
Monday, August 24, 2015

Pre- and Post-Kurgan Europe

The Principal Component Analysis (PCA) below is based on four sets of D-statistics. The second image shows what they are and how they affect the components. The datasheet is available here. If you don't know what EHG, SHG and WHG stand for, see here.

Note that the post-Kurgan Europeans are shifted east, towards the Bronze Age steppe groups (most of which are in fact classified as Kurgan cultures), relative to the pre-Kurgan Europeans. Coincidence? Certainly not. Interestingly, the West Asians show a similar shift to the east, although it's not yet clear who caused it and when.

In this analysis I used samples from the Allentoft et al., Haak et al. and Lazaridis et al. datasets, all of which are publicly available. The latter two are found at the Reich Lab site here.

Update 12/09/2015: Matt posted these graphs in the comments. The first graph shows Yamnaya-related ancestry proportions for a series of points along the Yamnaya-Middle Neolithic continuum, which can be used to estimate Yamnaya-related admixture in samples that cluster near these points.
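One way to read an admixture estimate off such a continuum is to project a sample's PCA coordinates onto the line between the two reference groups. The sketch below is only an illustration of that idea: the coordinates are made up, the function name is hypothetical, and linear interpolation is an assumption here, not necessarily how Matt's graphs were built.

```python
def admixture_fraction(sample, source_a, source_b):
    """Position of `sample` along the line from source_b (0.0) to source_a (1.0).

    Uses orthogonal projection onto the line, clamped to the 0..1 range.
    """
    axis = [a - b for a, b in zip(source_a, source_b)]
    offset = [s - b for s, b in zip(sample, source_b)]
    norm_sq = sum(c * c for c in axis)
    if norm_sq == 0:
        raise ValueError("the two reference points coincide")
    t = sum(o * c for o, c in zip(offset, axis)) / norm_sq
    return min(1.0, max(0.0, t))


# Hypothetical PC1/PC2 coordinates, for illustration only.
yamnaya = (0.10, 0.02)
middle_neolithic = (-0.06, -0.02)
sample = (0.02, 0.00)

print(admixture_fraction(sample, yamnaya, middle_neolithic))
```

A sample that sits well off the line would still get a number from this projection, which is why the estimate only makes sense for samples that cluster near the continuum.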

See also...

Smarter than the average bear


Mike Thomas said...

See what you make of it. I'll be curious, but I haven't looked extensively into R1b..

Shaikorth said...

Maju, that stuff you linked is about runs of homozygosity (RoH), not genomewide raw homozygosity. In the latter Basques and Sardinians are more homozygous than North Europeans (including Finns and Balts), while other South Europeans aren't (it seems Kristiina linked the genomewide homozygosity numbers already), as you can test yourself on the Human Origins or HGDP data if you want, the most homozygous will be Basques and Sardinians followed by Lithuanians etc. Basques also have the most long RoH's among the European samples in the study you linked, not just short ones.

Besides high homozygosity Basques also have low internal divergence as do Sardinians. Both in fact rank even lower than Lithuanians in that regard. This indicates very high in-population IBS sharing and not just recent endogamy which, like homozygosity, is also consistent with their isolate status. See:
(suppl. table 2)

There are correlations with latitude and NW Europeans are more heterozygous than NE Europeans, but that rule can't be generalized to isolates. You're right in that Basques are more heterozygous than East Asians, but so are all Europeans. This is also true when it comes to internal divergence, which is smallest in East Asians and Native Americans. But that's expected: diversity decreases when moving away from Africa.

Maju said...

@Grey: To me it seems apparent that the Anatolian IEs arrived in the region probably with Kura-Araxes (Maykop-related), along with maybe other non-IE groups (??), and that they were lurking at the Eastern edge of the Hatti country until they could grab power. Outright simple conquest is not that common, there's almost always diplomacy, mercenariate, more or less stable alliances with local groups, and other types of hubrys involved. The conquest of the Hatti probably involved several steps until the IE elite became consolidated. In turn much of the Hattian culture became part of the Hittite one and even Hattian language was still used for some time. We cannot just discern all the details of the process because some key data is missing.

"I don't understand why it's strange. How are the massive technological changes that occurred in this era: wheel, metallurgy, domesticated horses etc, going to spread?"

By normal international communication and exchange? In many cases it may well be the case indeed. First we cannot consider early pre-bronze metallurgy as anything complicated at all, so it was easy to reproduce. Allegedly some Bell Beaker communities or individuals may have attempted or even managed to establish some sort of quasi-monopoly on late copper metallurgy but earlier it was very apparently a much more informal and decentralized activity. Bronze tech very probably scattered after the IE invasion of the Balkans, where it is first attested well BEFORE them. These invasions may have caused a scatter of refugees, some of whom were surely expert coppersmiths and goldsmiths. In any case it's something we see spawning all around normally well before any IE intrusion (or in some areas after but without relation with them anyhow). I don't think there is any particular mystery to bronze metallurgical scatter: it happened just as if it was spread by mere contact and trade, with only rare exceptions in Siberia maybe, where it can be in a few cases related to Seima-Turbino invaders. Victims of persecution, less excellent smiths or others in search of opportunities would find patrons in the remote areas where bronze smelting was still in its beginnings. Those were qualified workers, well paid probably but not rulers nor conquerors nor had any particular influence other than serving their lords or their communities if still democratic.

As for horses they had at least two origins (Western Steppe and Iberia) and again they would just spread by contact often, being sold or even stolen as a commodity (just used as livestock often). Chariots (two wheeled war chariots, heavy four wheeled ones are not even probably IE by origin) were quite obviously spreading at a later date and again we see that the design was soon adopted by non-IE peoples like Egyptians, etc. out of mere convenience.

Even steel, which was initially kept a state secret by the Hittites, eventually just spread around rather in the opposite direction of conquests and very fast, so it was spread by merchants and smiths in search of a patron, apprentices traveling to foreign lands to learn new skills, spies even.

Onur said...

I compared RISE504 with RISE554 from Iron Age Afontova Gora, which according to Genetiker is the same N as IR1, and I see that they are similar. RISE504 is more South Asian and Chinese and RISE554 is more Siberian and Native American. Western Eurasian hunter gatherer ancestry is the same.

RISE554 was found in the same place as RISE553 (yDNA R1a1a1 without further subclade), and they were almost contemporary, c. 1000 BC. Kytmanovo RISE504 is more than thousand years younger. Maybe RISE553 and RISE554 were proto-Turkic speakers, although RISE553 seems to be related to Sintashta as he is the only one to have ENF of these three.

Well, to make firm suggestions about the location of Proto-Turkic speakers during the Late Bronze or Early Iron ages I would first want to see a broader range of sample results from the same era.

Aram Palyan said...


Those destructions also happened in the South Caucasus, starting from 2500 BC. I can't tell the exact scale of that process but it is obvious that after these events a new culture appears there.

But another explanation is possible. It is possible that the infiltration of new people started much earlier. At some point, when their number reached a critical mass, a power struggle starts with cities burning. After the successful seizure of power the Hittite empire starts.

Transcaucasia was the heartland of the Kura-Araxes, or Early Transcaucasian culture, which holds an important place in the culture history of eastern Anatolia. The transition from this Early Bronze Age culture to the more fragmented regional cultures of the Middle Bronze Age remains poorly defined. The transition is marked by a shift away from fairly autonomous village life, the appearance of evidence for enhanced social hierarchy, and the first use of tin-bronzes in Transcaucasia. Traditional chronology places the transition at the end of the third millennium B. C. However, radiocarbon evidence indicates a mid-third millennium date for the transitional cultures, thus aligning Transcaucasian developments more closely with those in eastern Anatolia and northwestern Iran (late Early Bronze Age) and in Ciscaucasia (Maikop). Transcaucasia seems to have continued to play an important interregional role even after the disappearance of the Kura-Araxes cultures.

Simon_W said...

@ Chad

I don't agree that the Afroasiatic expansions were the main factor that changed West Asia from the EEF-related state to the modern one. The only Afroasiatic expansions in West Asia were the Semitic ones. There were no Afroasiatic peoples in West Asia other than the Eastern Semites (Assyrians, Babylonians), the Western Semites (Aramaics, Phoenicians, Hebrews) and the Southern Semites (Arabs). And as this list shows, their center of weight was rather southern. Which makes sense given their connection with North and East African languages. Hence the Semitic expansions in West Asia in general should have increased the similarity to Saudis and Bedouins. The latter however are the best living proxy for the ENF people in the K8 model, they have rather low ANE. EEFs and Sardinians are somewhere inbetween Saudis/Bedouins and WHG. However, what changed West Asia most of all, was an ANE shift which shifted the whole populations eastwards. This can't be explained with Semitic expansions. And anyway, Semites didn't play a crucial role in Anatolia. There were Assyrian traders, sure, but this wasn't a mass migration.

Also, the Iron Age Armenian isn't Balkan-like, he's far from it. He's just shifted slightly towards Balkan populations, relative to the Bronze Age Armenians.

Matt said...

Shaikorth: There are correlations with latitude and NW Europeans are more heterozygous than NE Europeans, but that rule can't be generalized to isolates.

Yes, there's a few different measures you can look at for diversity within Europe. You guys probably know all this, but in case anyone else doesn't and I haven't garbled it too much (corrections welcome):

+ Homozygosity - On an individual SNP-by-SNP level, do members of the population tend to carry two of the same variant at a given SNP locus?

+ ROH - Runs of Homozygosity - Do individuals from a population have runs of multiple SNPs in a region across the genome in which they have two copies of the same variant? This relates to SNP level homozygosity, but weakly - long ROH relate more to recent close breeding, while outbreeding breaks up ROH.

+ HD - Haplotype Diversity - Considering a set of SNPs together, linked to other SNPs, in how many combinations are they found linked together in a population? IIRC this has some correlation to ROH and homozygosity, but is also a largely independent measure. Might indicate relatively small numbers of founders for a pop.

+ Private alleles - Variants found only in one pop or another. Tends to need whole genome sequencing to actually find these guys, as the common panels of SNPs sampled will miss them, as private alleles are at totally random mutation positions in the genome, not any of the neutral drift affected SNP loci sampled to get information about human population relationships. So the kind of samples we normally get do not tell us anything about this.
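For the last two measures in the list above, a minimal sketch may help. It assumes haplotypes are given as plain strings of alleles and variants as sets of identifiers. The diversity formula is a common textbook definition (Nei's unbiased estimator), which is not necessarily what any of the studies discussed here used, and the function names and inputs are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's estimator: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def private_alleles(variants_by_pop):
    """For each population, the variants seen in no other population."""
    result = {}
    for pop, variants in variants_by_pop.items():
        others = set()
        for other_pop, other_variants in variants_by_pop.items():
            if other_pop != pop:
                others |= set(other_variants)
        result[pop] = set(variants) - others
    return result


# Made-up inputs, for illustration only.
print(haplotype_diversity(["ACG", "ACG", "ATG", "GTG"]))
print(private_alleles({"pop_x": {"v1", "v2"}, "pop_y": {"v2", "v3"}}))
```

A population in which every sampled chromosome carries the same haplotype scores 0, and one in which every haplotype is different scores 1.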

Homozygosity probably has a link to farmer-vs-HG ancestry, as it seems like farmer populations were just more diverse on a SNP level, for whichever reason: admixture, Out of Africa patterns of serial bottlenecks, or higher long-term population size. So you would expect to find NW populations to be generally slightly more homozygous than NE, all being equal, as it looks like there are slight differences in HG vs farmer ancestry, although these are quite low and really these seem more north associated than W-E, as we've discussed upthread.

For ROH, as a measure that's not so relevant unless any population has a recent history of close breeding, which is more the case for small islands and hilltop villages and such, not large nations, even nations as large as the general population of Sardinia (some of the small villages and communities tapped by the HGDP and others for particularly isolated and unadmixed samples of Basques and Sardinians may be a little different).
When looking at haplotypes, haplotype diversity was lowest within Europe in NW Europe in Auton 2009 -

That seems plausible. But that's quite old (by fast moving standards), not sure what the current state of play would be with a potentially better sample set (NW here was basically Britain+Ireland, while NNE Europe was a combined sample of "Czech Republic, Denmark, Finland, Hungary, Latvia, Norway, Poland, Russia, Slovakia, Sweden, Ukraine", due to low levels of samples from each of those countries).

Low haplotype diversity *will* tend to increase IBD (identity by descent) / IBS (identity by state) of members of a population with one another, as linkage (haplotype) information is taken into account by these measures (although IIRC, again IBD / IBS are not pure measures of haplotypes, and that is something specific of its own).

However, comparing ancient and modern populations, this should not affect the closeness much, as haplotypes are broken up naturally over time, into totally unlinked sets of SNPs. IIRC again though, there is some debate over the rates compared to the theoretical expectation, as there is some sharing of longer haplotypes than would be expected between say La Brana and populations from the region where he lived.

In any case HD, like ROH, should not really affect measures based on unlinked SNPs (typical PCA or clustering based on unlinked SNPs) very much.

Maju said...

@Shaikorth: I meant to reply to you yesterday night but my connection got so horrible that it became impossible. Lost the whole text. Basically I meant that the signal of recent endogamy is something you see clearly in Muslim peoples notably and that nothing of the like is apparent among Basques or Sardinians at all, only a very weak signal that should be considered normal because of the well known relative isolation of these peoples.

Also re. East Asians, they are notable at the Medium ROH in a homogeneous way through the whole region (suggesting a regional bottleneck AFTER the one of Basques), otherwise the Basque figures are almost identical (in the short and long ROH scores) to those of the Dai (i.e. not notorious).

The table 2 of the study you link to is nearly impossible to read but does seem to suggest that Sardinians are indeed slightly less internally diverse than Lithuanians. However Lithuanians are relatively "southerner", so whatever: does not confirm your point.

Anyway, I finally found a reference I was looking for for NE peri-Arctic peoples of Europe, in which the raw endogamy is measured (ROH), and Finns from Helsinki, as well as all other Far North populations are extreme: with values (at the non-asterisk columns) that are at best almost double than those for Estonians and, at worst, six times higher. So if Sardinians are like Lithuanians or Estonians, they are still much much less endogamous than Far North (North of St. Petersburg) populations.

And this is indeed important, even if the analysis does not allow us to qualify the periods of endogamy in those remote areas of Europe.

Shaikorth said...

My point was about raw homozygosity and intra-population diversity, not RoH's. These are not the same thing. In raw homozygosity and reduced internal diversity, for which numbers have been posted already, Sardinians and Basques exceed Northeast Europeans, be they Balts, Finns, Slavs or even Komis.

Regarding endogamy, RoH-wise Sardinians and Basques are extreme, both more endogamous than Helsinki Finns (who are a mixture of West and East Finns, latter of which have more RoH), Orcadians and even the Ashkenazi and Sephardi populations! They are less endogamous than Kuusamo Finns, but that's a small isolate village, like Northeast Italian isolates, not an ethnicity of its own. Here's a direct comparison including Sardinians and Basques.

Re: Balts, Estonians and Komis, this study, as well as the one you linked, uses different samples for them than the ones publicly available from Estonian Biocentre and Human Origins set of Reich lab. The latter are much smaller samples and unfortunately the only ones available for public use (comparisons with ancient genomes, homozygosity and whatnot).

Shaikorth said...

Esko et al. also estimated inbreeding coefficients for a high number of European populations, Sardinians and Basques are right there with Kuusamo Finns. The Helsinki sample is in the same range as Estonians, discrepancy with average RoH numbers probably reflecting the fact that it's a mix of West and East Finns.

All this reinforces the point that isolates are exceptions to normal geographical clines of endogamy. Recent endogamy is a bit besides my original point about raw homozygosity though.

Maju said...

Larger ROH figures mean greater homozygosity, not less! So your graph actually shows that Finns(I) in the left graph are clearly more homozygous than Basques and Sardinians, and Finns(G) in the right graph are quite more homozygous than most Europeans (what was to be expected anyhow). Some pops. like Orcadians do not behave as in other previously commented data but rather appear quite homozygous here, about the same as Finns. I wonder why but anyhow just a minor note that does not alter the substance of our discussion. Obviously isolated Alpine populations, which are the focus of the study, are very homozygous.

Maju said...

"Recent endogamy is a bit besides my original point about raw homozygosity though".

It's not beyond the point IMO, because if in the last millennia there has been no particular endogamy all the argument is rendered invalid. The only thing that seems to stand in all that noise is some ANCIENT founder effect or bottleneck, and that is very different in effects like drift and all that.

Shaikorth said...

"Larger ROH figures mean greater homozygosity, not less! So your graph actually shows that Finns(I) in the left graph are clearly more homozygous than Basques and Sardinians, and Finns(G) in the right graph are quite more homozygous than most Europeans (what was to be expected anyhow)."

Larger RoH figures mean more recent endogamy. It is not raw genomewide homozygosity and you should not keep confusing the two. In raw genomewide homozygosity (figures posted by Kristiina) Basques and Sardinians exceed all Northeast Europeans.

Essentially the figure I posted means that on average Basques and Sardinians are quite endogamous (since their RoH exceeds Jews) on top of that. The very high inbreeding coefficient supports this.

Maju said...

No, Shaikorth, the raw length of ROH only means raw homozygosity. Only the details like those mentioned in Razib's entry (and its referenced paper), in which the various lengths of the segments are considered separately, can discern between recent and older generated homozygosity (this is because of the well known phenomenon of gradual fragmentation of chromosomal segments via recombination). But your graph did not refer to any particular segment length, just to overall or raw overall added length of the ROHs, i.e. raw homozygosity without any further qualification.

"Essentially the figure I posted means that on average Basques and Sardinians are quite endogamous (since their RoH exceeds Jews)"...

What?! Ah, OK, I realize now that the right figure is a zoom of the lower corner of the left figure. So, let's be clear then: Basques and Sardinians are more homozygous there than Finns (G) but less than Finns (I). Per the legend: "In population names: I, a more homogeneous sub-population; G, a more general sub-population". This can be indeed a source of confusion, also with Basques, because it will depend on what subpopulation they are using as sample.

Maju said...

Anyhow, raw ROH or raw homozygosity does not directly indicate "endogamy". Thanks to the article of Razib (and his source, which I did not check) we know that Basque homozygosity is mostly caused by an ancient bottleneck so the term "endogamy" does not apply. In other cases it is less clear for lack of such detailed data.

Shaikorth said...

No, Maju, raw length of RoH means the combined length of RoH segments, not raw homozygosity. Raw homozygosity is the percentage of SNP's that are homozygous. Conceptual example: "AG-CC-AG-TT" would not count towards RoH because no RoH segment is that short, but counts towards raw homozygosity.

So I restate: while the graph indicates Basque/Sardinian RoH is more than Finnish Helsinki sample (G), Orcadians and Ashkenazis but less than Palestinians, Italian isolate villages or the Finnish isolate village of Kuusamo (Finns I), it does not tell us raw homozygosity. For that, numbers have been posted before in this comment section. Inbreeding coefficient and RoH are both indicators of endogamy. In Razib's article Basques' long RoH numbers are the highest in Europe exceeding Orcadians, who are a relatively recent (~1000 years old) isolate by the way, so it's not just the short RoH that makes them stand out.

Of course the endogamy of isolates isn't necessarily the same thing as that of Arabs etc, at least in the case of FVG villages, Kuusamo and islands like Orkney and Sardinia it's more a matter of small number of founders and isolation causing accumulation of RoH and an increased inbreeding coefficient over time. The ultimate effect on genome is similar which is why the term is used in this context.
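The difference between the two quantities in the "AG-CC-AG-TT" example above can be sketched in a few lines. This is a toy illustration, not how any RoH caller actually works: real tools measure runs in physical length with far larger thresholds, and the `min_run` parameter and function names here are assumptions made for the example.

```python
def raw_homozygosity(genotypes):
    """Fraction of SNPs whose two alleles are identical."""
    homozygous = sum(1 for g in genotypes if g[0] == g[1])
    return homozygous / len(genotypes)


def roh_snp_count(genotypes, min_run=3):
    """Total number of SNPs inside runs of at least `min_run` consecutive homozygous SNPs."""
    total = 0
    run = 0
    for g in genotypes:
        if g[0] == g[1]:
            run += 1
        else:
            if run >= min_run:
                total += run
            run = 0
    if run >= min_run:  # close a run that reaches the end of the sequence
        total += run
    return total


scattered = "AG-CC-AG-TT".split("-")      # the example from the comment above
clustered = ["AA", "CC", "GG", "AG", "CT", "AC"]

print(raw_homozygosity(scattered), roh_snp_count(scattered))
print(raw_homozygosity(clustered), roh_snp_count(clustered))
```

Both toy sequences are half homozygous, yet only the second contains a run long enough to count, so the two measures can move independently.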

Grey said...


"By normal international communication and exchange?"

Yes, which would include artisan / trader families from the origin region moving along trade routes.

Normally this wouldn't have much of a demographic impact (and in most places R1b populations are small minorities) but under specific circumstances it might depending on the trade good.

The reason for considering it is it might solve the riddle of some of those early breakaway IE languages.

Maju said...


"No, Maju, raw length of RoH means the combined length of RoH segments, not raw homozygosity. Raw homozygosity is the percentage of SNP's that are homozygous".

It is the same thing. Think about it please.

The combined length of homozygous segments is an absolute number and the other is a relative (percent) number but, as human chromosomes have a fixed length, one equals the other.

It's like saying (a) 100,000 EU citizens and (b) 20% of EU citizens. Same thing.

Maju said...

Erratum: 100 million, not 100,000 (ahem).

Shaikorth said...

They're not the same at all. A RoH segment has a number of consecutive homozygous SNP's, raw homozygosity is the percentage of individual homozygous SNP's. There are homozygous SNP's outside RoH segments. This is very basic stuff, no reason to keep confusing the two.

Maju said...

@Shaikorth: if there are discernible homozygous SNPs outside of ROH segments, then we are talking of the shortest possible ROH segment: the "atom" of genetics. If they are so extremely dispersed through the genome that they do not even constitute segments anymore, they can only be considered hyper-mega-short ROHs: a residue from extremely ancient bottlenecks.

I don't know for sure what percentages of the homozygosity scores they represent, probably not too large, but in any case they do not represent recent endogamy at all. For most purposes they should be considered noise, not relevant information.

Shaikorth said...

Of course genomewide homozygosity is relevant. It tells us how drifted and diverse a population is (Native Americans have the highest genomewide homozygosity etc.). In Europe Basques and Sardinians have the highest homozygosity and lowest diversity isolate villages excluded, being more extreme in these regards than the Komi sample of Estonian Biocentre for instance. This can be verified by comparing the genomewide homozygosity and within-population IBS sharing of HGDP populations, 1000genomes set, EBC's public data and the Human Origins set. For those interested, this is also why MA-1 gets extreme drift parameter in Treemix and such, low coverage makes the genome appear extremely homozygous, much more so than Karitiana and certainly much more than the boy actually was when alive.

RoH segments are consecutive homozygous SNP's and indicate more recent endogamy, and individual homozygous SNP's aren't RoH segments. I won't bother with explaining that again here since my original point was about genomewide homozygosity.

Maju said...

@Shaikorth: the original point was not made by you but by Kristiina and was this:

"Maybe it is the isolation that has slowly eliminated other y-lines in small communities without much foreign input."

For that it is very different if we are talking of recent or ancient isolation, particularly because what she was implying is that it was recent (post Iron Age) isolation which has caused the peak in R1b among Basques. This drift hypothesis does not work in light of the RoH data: if anything it is an ancient bottleneck issue.

Shaikorth said...

Two points: I did not say the R1b-related founder effect in Basques needs to be post Iron Age, just post I-E arrival, and Basques have kept being an endogamous population post-Iron Age as well, the study comparing HGDP pops says they have more long RoH's than Orcadians so there is data support.

High homozygosity and low diversity is a good indicator of longer-term population size, but it has other effects beyond the aforementioned Treemix issues as well, for instance it makes populations more likely to create their own PCA dimensions on genotype-based PCA's.

Maju said...


1. Post IE arrival in this part of Europe is Iron Age. There were no IEs of any sort before the late Urnfields/early Hallstatt so far west. Nor so far south (i.e. Catalonia, Languedoc) before the latest part of the Bronze Age (mainline Urnfields).

2. "it has other effects beyond the aforementioned Treemix issues as well, for instance it makes populations more likely to create their own PCA dimensions on genotype-based PCA's".

I'm aware: not just in PCAs but also in ADMIXTURE, etc. However it is not the same if that is caused by recent endogamy or by an ancient bottleneck. In the latter case, it would seem more justified because after all we are looking for that kind of ancient branching, right?

Maju said...

Or to be more clear:

1. The first Indoeuropeans of any sorts (plausibly Celts at least partly) appear in the upper Ebro c. 1000 BCE, that is some 3000 years ago, at the beginning of the Iron Age, still carrying an Urnfield cultural package but later absorbing Hallstatt influences as well. From the North they only appear since La Tène, what is significantly more recent. Incidentally the Western parts of France, indoeuropeanized only at about the same time as the Atlantic Islands (La Tène), are also high in R1b (Western subclades, mostly S116). We are talking of extremely late indoeuropeanization that only happened a few centuries before the Roman conquest.

2. You are conflating data and twisting the argument beyond recognition to make things appear what they are not: trying to make what looks very much like an ancient (Neolithic??) bottleneck pass as recent endogamy.

Shaikorth said...

There is no conflation here. Basques have both low diversity and high raw homozygosity, indicating long-term small population size and drift, and also long RoH segments (compared to the Orcadian isolate for instance), indicating they've been an endogamous population until recently. As for the R1b, it could have accumulated at any point after the arrival of the IEs. To confirm things like non-IE Neolithic R1b of L151-type (YFull MRCA less than 5000 years ago), never mind its descendants like S116 near the Atlantic coast, we'd need ancient DNA. I'll change my mind about IE-relatedness of S116 happily if it's found in Neolithic populations of West Europe, but I expect that what R1b they will find belongs to non-L151 clades, perhaps V88 or other more uncommon R1b types that Sardinians have.

If you want to discuss Basques' R1b bottleneck and its dating specifically, I'd like to know subclade distribution under S116. YFull will be of more help then.

Kristiina said...

Maybe I had better not continue, but anyway, Maju, I did not imply that R1b is recent (post Iron Age) among Basques. I said previously that maybe L51 was involved in the introduction of agriculture to the Western Mediterranean. Instead, I wanted to say that maybe Basques were yDNA-wise more diverse before but drift has resulted in the accumulation of R1b-L51. In any case, the isolation of Kuusamo Finns is very recent, as they have been in that village only for c. 300-400 years, and there may have been a few Saamis in that area before them. The population density is not very high in that area.

It is said that in some mammals, there are only a few fathers. Wikipedia tells us that (and this is in fact quite amusing) “In hierarchical social animals, alphas usually gain preferential access to food and other desirable items or activities, though the extent of this social effect varies widely by species. Male and/or female alphas may gain preferential access to sex or mates, and in some species only alphas or an alpha pair is permitted to reproduce.

Alphas may achieve their status by means of superior physical prowess and/or through social efforts and building alliances within the group.

The position of alpha also changes in some species, usually through a physical fight between a dominant and subordinate animal. Such fights may or may not be to the death, with relevant behavior varying between circumstance and species.”

This kind of behaviour is typical of chimpanzees, which are our close relatives.

Maju said...

@Kristiina: appealing to chimpanzees (and forgetting the equally close bonobos, who are matrilocal and almost matriarchal) in order to "explain" human genetics left me totally open-mouthed, sorry. I did not expect that at all.

Mind you that a chimp alpha male is built largely on his clique of beta males who also get a share in the spoils. In fact even "gamma" males have offspring, just that behind the tree when the big ugly guy is not looking. That's why their mating acts are short (penile spine), their balls big, their penises short, totally unlike humans (female genitals are very different too). There are many, many books and studies dedicated only to explaining that humans are not chimps at all when it comes to sex and that human sexuality is mostly oriented to somewhat stable couples and not mere quick mating when the heat comes, leaving raising children only to the females - and, very loosely, the group of males overall, because any baby could be yours, and that's a reason why females mate with nearly everyone, so all males feel attached to the newborn equally.

The pure "harem" style of alpha-maledom does not belong to chimps, nor macaques (whose society may be comparable)... you have to go to strict vegetarians like gorillas or geladas. However even these patriarchs are still all the time fearful of losing females to competitors, either one by one (gorillas) or in bulk (geladas). And it's the girls who decide invariably.

Anyways, all that, be them chimps, gorillas or the always ignored bonobos (who have the most fun and are the most sensitive ones) are not humans and their sexuality is not quite like ours. If anything it'd be the bonobos the ones closer, because, like us, their females do not openly show their heat, so they can have good relations with every male (sex for them is like for us smiling or shaking hands, almost) and then maybe a bit more picky when getting pregnant without anybody noticing but herself.

In human societies the couple is always fundamental, even if it is often unstable. Sequential monogamy is the general norm, although it's true that occasionally polyamory happens too. Harems proper only exist in clearly hierarchical patriarchal societies and affect only a tiny fraction of the population: it's not standard in any way but limited to very rich or powerful men. Nothing in the anthropological literature suggests that Basques had that kind of anomalous marriage structure but rather the opposite: Humboldt, who documented the matter before dogmatic Christianism took over the country with the Carlist Wars, talks of a very modern way of relations, with open pre-nuptial relations followed by freely chosen stable monogamous marriages.

Kristiina said...

I put that text there because it is amusing, and comments on this blog also often testify to that kind of idea. Can't you see an analogy between IE replacement theories in a huge area ranging from Spain to India and "gaining preferential access to food and mates and other desirable items or activities in a hierarchical society"? Many of you claim that Yamnaya R1b and Corded Ware R1a fathered most of the Western Eurasians in only 2000 years' time.

I know that humans have their own pattern of relationships between males and females which is unique to us, but still it looks like drift always eliminates ylines in a group if there is no foreign input, and sometimes, if there is foreign input, it works to replace previous yDNA.
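The point about drift removing Y lineages from a closed group can be illustrated with a toy simulation. This is only a minimal sketch of a neutral Wright-Fisher model, assuming a constant number of men, no migration and no selection; the function name, group size and number of starting lineages are hypothetical and not taken from any dataset discussed here.

```python
import random

def drift_to_one_lineage(n_men, n_lineages, seed=0):
    """Neutral Wright-Fisher drift on paternal lineages.

    Every man in the next generation picks his father at random from
    the current generation (assumption: constant size, no migration).
    Returns (generations until one lineage is left, surviving lineage).
    """
    rng = random.Random(seed)
    # Start with the lineages spread as evenly as possible.
    fathers = [i % n_lineages for i in range(n_men)]
    generations = 0
    while len(set(fathers)) > 1:
        fathers = [rng.choice(fathers) for _ in range(n_men)]
        generations += 1
    return generations, fathers[0]

if __name__ == "__main__":
    # Hypothetical group: 50 men carrying 10 different Y lineages.
    gens, survivor = drift_to_one_lineage(50, 10, seed=1)
    print(f"one lineage left after {gens} generations (lineage {survivor})")
```

In a model like this a single lineage always ends up fixed if the loop runs long enough, and smaller groups get there faster. Real populations add migration, patrilocality and unequal reproductive success on top, none of which this sketch attempts to model.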

Maju said...

Patrilocality explains many of those cases on its own: women's lineages are constantly reshuffled, while men's are much less commonly so. In patrilocality, which seems very common, communities are structured around groups of brothers and paternal cousins, who obviously share the same lineage, barring the oddball adoption. For whatever it's worth, chimpanzees are also patrilocal.

Additionally it's clear that technically speaking the limit of children a woman can have may be around 30 (in extreme cases: I remember a woman from Peru who claimed to have birthed more than 30), while the extreme cases of men can be well over a hundred (obviously with many different women) and in pure theory there is nearly no limit (assuming an indefinite number of fertile consorts). But in practice the limit is probably around 150 or 300, judging by the most extreme known cases (who were all very powerful men, and exceptional in this matter even among their peers of all times). So there is indeed a potential for gender bias on this matter and some of it is probably happening continuously, even in monogamous social environments via infidelity. But in general it is a slow process of accumulation of this gender bias in drift and I understand that patrilocality is much more responsible for the phenomenon you describe: women move around, men stay in the clan. It's the clan rather than the individual man that creates the effect. Of course that does not happen so much today, because kinship has now become "eskimo" (nuclear families, ambilocality, etc.), but it used to happen in the past, also favored by smaller population sizes, which favored founder effects (sometimes confusingly described as "bottlenecks").

capra internetensis said...

Matrilocality is common enough to be worth considering though. In North America there were many matrilocal Neolithic or sedentary Mesolithic societies with matrilineal clans. They were by no means peaceful (quite the opposite!) or matriarchal, but they did tend to have a relatively higher status for women. In northwestern Canada (where I am from) free women were treated like dirt among the mobile foraging Athapaskans, but fairly decently among the complex Northwest Coast peoples (though the latter had a lot more slaves). There are still paternal ties, of course, often cemented by cross-cousin marriage.

Austronesian languages in Eastern Indonesia and Melanesia are correlated with female rather than male lineages, interestingly.

Many North American societies were also big on adopting members of rival tribes into their societies - especially children and young women, of course, but sometimes even adult men. Sometimes as semi-slaves, but typically as full members. Because population growth rates were so slow adopting rivals was a very effective way of increasing the manpower of the tribe. There is some evidence of this kind of thing in the Old World too - apparently the Slavs did the same thing.

Maju said...

Very much in agreement, Capra. However matrilocality seems mostly a Neolithic development and tends to be rarer than patrilocality, at least that's my impression.

I agree even more with what you say about generalized adoption, and that also happened even with whole tribal units (the Iroquois are known to have offered defeated tribes the choice to join their league or leave the land). We are actually considering a more "advanced" historical stage when we go into the Chalcolithic, which is the stage that corresponded for example to the major pre-Columbian civilizations, among others. Naturally when the Incas or the Aztecs expanded their empires, they did not kill or expel everybody else, not at all: they could not, and it did not make any economic sense; rather they made unequal alliances with other tribes as vassals or semi-autonomous subjects. We see much the same with the Roman expansion, etc. And it is likely that such a thing happened in all or most farmed lands even much earlier, mutatis mutandis.

capra internetensis said...

@Mike Thomas

Here's what I know about R1b - which is enough to say we don't really know that much.

R1b-M343 (P25 is an unstable marker, known to be bad for 10 years, and it isn't clear whether the R1b1-L278 level even exists). Split ~22 000 years ago.

I. R1b-PH1165 - Bhutan, Tajikistan, India, Xinjiang. We know almost nothing about this.
II. R1b-L754 - unites V88 and L389. Split ~17 000 years ago?

A. R1b-V88 – most common in Central/North/Western Africa (especially around Lake Chad), also found in the Levant (especially around the Dead Sea), southern Italy, European Jews, rarely elsewhere – an Iberian Neolithic man ~7000 ybp - most of this has not been resolved to subclades, so its structure and distribution is poorly understood. Split 7300 (5800-9100) years ago according to Y-Full, but I doubt they have an adequate sampling.
1. One (the more common) type of Sardinian V88 is marked by PF6361 - it is almost certainly equivalent to (or at least overlapping with) the R1b-M18 found in Sardinia, Corsica, and Lebanon.
2. The other type is R1b-V35, and it is a sister branch to one known African V88 sample. However, other African V88, including the common subclade V69, is not well resolved.

B. R1b-L389 - P297 and miscellaneous. Split ~16 000 years ago?
1. R1b-L389* - Puerto Rico, Italy, Peru, East European Jews, West Asia somewhere, probably Spain, Transcaucasus, Levant. Appears to be rare or absent in North Africa. Again not much is known.

2. R1b-P297 - split ~13 000 years ago?
a. R1b-M73 - Turkmen, Hazara, Altaians, Nogays, Uyghurs, Kazakhs, Bashkirs, Mongols, etc. Also sporadic occurrences in N China, Tibet, Middle East, Europe. The Samara forager 7500 ybp had (pre-?)M73. TMRCA 7300 ybp.

b. R1b-M269 - TMRCA ~6400 ybp.
i. R1b-PF7558 - it seems that most if not all M269*(xL23) falls into this clade. Balkans (prob. most common here), North Africa (esp. Egypt), Turkey, Iran, European Jews, Bashkirs; also a little in Italy, Northeast Caucasus, Central and Eastern Europe, and the Levant. TMRCA ~5000 ybp.
ii. R1b-L23 - L23* has been reported in a Yamnaya man and IIRC a modern Komi. TMRCA ~6200 ybp.
a. R1b-Z2103 - most common south of the Caucasus, around the Urals, in Anatolia, and in Dagestan, but also found at reasonable levels in Eastern Europe and with low frequencies all the way to Western Europe, Arabia, Pakistan, and beyond. The dominant lineage of Yamnaya men, c. 5000 ybp. TMRCA ~6200 ybp. Not getting into subclades, but basal branches can be found in Dagestan, Pakistan, and other far flung places. Also you can go three levels down with the same TMRCA of 6200 ybp. Other upper-level clades have TMRCAs around 4000-5000 ybp.
b. R1b-L51 - L51(xL11) is found throughout Europe at low frequencies, but also rarely occurs in Iran and Turkey. Some or all of this is R1b-Z2111, TMRCA 5100 ybp. Almost all L51 falls under L11. TMRCA of L51 is 5800 ybp, of L11 4900 ybp. From there you can go down several levels in numerous upper-level branchings around 4900-4500 ybp.

So there was a gradual series of splits, which tell us very little, up until the rapid expansion of Z2103, which was coincident with the original break up of L23 and just after that of M269. But the massive expansion of L11 and its subclades (and the break-up of PF7558, Z2111, and some Z2103 subclades) did not occur until ~1000+ years later.

R1b-M335 - Turkey, Germany, Yunnan (Hui), Kashmir - very rare, position unclear, it might be under PH1165 or L754.

There is also other R1b* everywhere from Turkey and Iran to Tibet and even Bali, but especially in Central Asia. Little or none in Africa, Siberia, and the Caucasus, however.

Maju said...

We don't know much, Capra, but one thing we don't know for sure is whether those or any other TMRCAs are any more real than any arbitrary figure I can come up with. Don't feed us guesstimates that have nearly the same scientific value as dating the world based on biblical genealogies.

And something we do know and that you do not mention is the structure under L11, which you are totally ignoring. Actually it seems that there are two types of discourse re. R1b: those who focus on R1b in bulk and trivialize M412/L11 (incl. the vast majority of people with this lineage) as a mere marginal branch, and those like myself who focus on the subclades of L11 and think of everything upstream as deep and obscure UP stuff, not too informative, and almost as little related to L11 as anything else under F or K2 is.

Obviously the truth is somewhere in between, but one thing is clear: your TMRCAs are not acceptable upfront.

capra internetensis said...


I was replying to Mike T, who was talking about the earlier structure of R1b. He left off a couple of the lesser-known but no less important early branches. I agree that the structure of L11 (and Z2103) is important in itself, but that is not what we were talking about.

I don't intend the TMRCAs from Y Full as gospel, but as relative points indicating the quiet periods and major expansions in the tree. That is why I only gave the point estimates (which are mathematically certain to be inaccurate) and not confidence intervals or disclaimers about changing mutation rates. When there is no star-like (multi-level) expansion the coalescence dates have little meaning anyway, they are just a function of chance and population size (unless you are doing some kind of complicated coalescence simulation with adequate demographic data, which obviously I am not attempting).

I do think the dates are probably in the right ballpark but they may well be wrong. If you are right and they are no good then we will find out soon enough, because we will find ancient DNA with markers dating much earlier than they ought to. If not - well, people will continue to use them and you will have to decide when to change your mind.

Maju said...

I was just reconsidering this PCA, as well as other similar ones that have been published in the last two years, in the context of another discussion on the origin of Basques specifically. And I could not avoid going back to the issue of the distortion between the PC1 and PC2 proportions, because if the proportions were made realistic (vertical axis should be x32, right?), almost only PC1 matters. And what happens then is that Basques, French and English (among others) almost overlap, while Spaniards become much more distant from Basques than it appears in the published version.

It also becomes apparent that Basques, and by extension all those that fall in the same horizontal or quasi-horizontal axis of PC1 scores (French, English, etc.), have a pre-IE baseline that is very similar to Spain_MN (Chalcolithic El Portalón). Instead Spaniards and Italians have a pre-IE baseline that is very similar to Spain_EN (i.e. Cardium Pottery early farmers from Catalonia, etc.), Ötzi and even modern Sardinians.

Of course, I'm aware that this is just one PCA and that there are others but, focusing on this one, these are tentative conclusions that I would like to extend for further consideration.

Davidski said...

You're still confusing two issues here.

- the high level of similarity between Middle Neolithic Western/Central Europeans and the Eastern Europeans who replaced/absorbed them, which is what the low relative value of PC2 shows

- the fact that Middle Neolithic Western/Central Europeans were in large part replaced by Eastern Europeans during the Eneolithic/Bronze Age, which is what stretching PC2 to fit geography demonstrates

So what you need to understand is that the fact that Middle Neolithic Western/Central Europeans were very similar to Eneolithic/Bronze Age Eastern Europeans doesn't mean they weren't in large part replaced by them.

Maju said...

Don't get me wrong, David, I'm not trying to ignore PC2 or diminish the importance of "Kurgan" admixture (that would be another discussion and we already went through that part). I just mean to emphasize important details about PC1 and the very large differences between the various pre-Kurgan baseline populations, which are seemingly reflected in modern ones too. Regardless of the exact amount of Kurgan-like admixture, it's obvious that the pre-admixture baseline was different in the West-North than in the South. There is even a relatively thick empty horizontal band in your PCA between those two baseline sectors.

Matt said...

Maju, not sure if you're still reading this topic, but I thought it might be interesting to post up a clustering perspective on these stats -

As clusters should represent the differences without any question of how each PC is being weighted visually.

Choice of 5 clusters for the K means on the above is pretty arbitrary, here is K means 2, 3, 4 -
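For readers who want to try the same kind of clustering on the D-stat datasheet, here is a minimal sketch of K-means in plain Python. It is an illustration only: it is not the Past3 implementation, the farthest-point starting rule is just one assumption that keeps the result repeatable, and the six four-stat rows below are made-up placeholders rather than values from the datasheet.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means on equal-length tuples of numbers.

    Assumption: deterministic farthest-point initialisation, so the
    same input always gives the same clusters. Returns (labels, centroids).
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next start point: the one farthest from all chosen centroids.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

if __name__ == "__main__":
    # Hypothetical rows: one tuple of four D-stats per population.
    stats = [
        (0.010, 0.012, 0.020, 0.018),
        (0.011, 0.013, 0.021, 0.017),
        (0.030, 0.028, 0.005, 0.004),
        (0.031, 0.029, 0.006, 0.005),
        (0.020, 0.021, 0.012, 0.011),
        (0.019, 0.020, 0.013, 0.012),
    ]
    for k in (2, 3, 4, 5):
        labels, _ = kmeans(stats, k)
        print(k, labels)
```

Because the distances are computed on the raw stats, the clusters do not depend on how any PCA axis is stretched when plotted. The choice of K and of which populations to include still matters a lot.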

Maju said...

@Matt: I'm still subscribed, yes. And I do appreciate a lot your statistical data. The 4 cluster data (or even the 3 clusters one), strongly supports my notion of two distinctive baselines for pre-IE Europe and my impression that modern Europeans still largely cluster within those two pre-IE baseline sets.

On one side there is cluster 2, which we can call Atlantic Neolithic, which gathers Basques, French, Northwestern and Central Europeans. On the other there is cluster 3, which I'll call EEF or First Neolithic, which includes other Iberians, Italians and southern Balkan peoples. It was obviously once important in Central Europe (LBK) but not anymore; on the other hand the EEF component may have made some gains in Iberia.

The Kurgan component (cluster 4) looks somewhat different and unable to gather modern Europeans under its wing. This does not mean it was not influential but surely not as dramatically influential as suggested by Haak et al.

As for the Vucedol (Hungary_BA) affinity with Basques I'm still very perplexed. I'd be more comfortable if it were from the Baden culture, but Vucedol?!

Would you care to write an entry for my blog on these stats? I'd love to have them publicly referenced. It does not need to be anything overly complex, just stating the "materials and methods" would be enough to provide the necessary context, although feel free to extend. If so, drop me a line at lialdamiz[AT]gmail[DOT]com.

Matt said...

Thanks for the offer, really appreciate that. Just a personal thing but not sure I'm totally confident doing that. Ultimately these are just putting graph and stat functions from the Past3 software over the dataset David's produced. I will think it over though.

Still, with those K means clusters here's a couple of goes at producing them laid over. The clustering is quite sensitive to the populations included though, although I think your ideas make sense:

PCA of all four stats -

And bivariate distributions between pairs of EHG:WHG stats and pairs of averaged MA1-EHG:WHG-SHG stats : / IMO this is clearer than the PCA about what each axis really represents and the scale of each axis (unlike with the PCA where there is the question of how to properly scale the different axes), and is OK when these variables have the kind of patterns they show.

Thinking more, the patterns are very close to what is seen in the genotype / IBD based PCA, but surely must have been reinforced by correlated regional drift patterns between different groups of Neolithic and post-Neolithic populations (for one example as might be observed, the Spanish populations here seem less clustered and more overlapping with other South European populations than is usual in genotype PCA, because it is only giving weight to how related they are by these stats to the HGs and not how related to one another they are).

Maju said...

Well, Matt, I can only insist and offer my technical support as "editor". The materials are interesting and should provide people with yet another interesting viewpoint or two on this complex issue, which we are all struggling to grasp. It should encourage meditation and debate.

Davidski said...


Spain_MN isn't a Chalcolithic sample, it's the Middle Neolithic sample from Haak et al.

Hungary_BA isn't Vucedol, it's Vatya and other Hungarian EBA and MBA cultures.

You need to get these things right, otherwise you just end up confusing yourself.

And how did you work out that Matt's cluster analysis contradicts the estimates of Yamnaya-related ancestry proportions in Haak et al?

Maju said...

Haak et al. and some other authors following the same convention call the early and middle (pre-CW or pre-BB) Chalcolithic "Middle Neolithic". For example Baalberge, but actually every single MN sample is early Chalcolithic. It's an English-inspired nomenclature that clashes with the well established conventions of mainland Europe. The traditional English view was that there was no Chalcolithic in Britain and Ireland because they had not found any copper objects (which is not a condition for continental prehistorians, who usually define Chalcolithic based on social complexity, not metals) but anyhow now we know there was also at least some copper in "Neolithic" Britain. I stick to the continental nomenclature, you do as you wish.

The samples are indeed from Haak & Lazaridis, from a site called La Mina. Not sure where it is but 41N, 2W should be around Soria, i.e. not too far from Atapuerca conceptually. They are dated 3900-3500 BCE, which IMO is Chalcolithic per the modern chronologies or at the very least LATE Neolithic. Wikipedia considers the European Chalcolithic to begin c. 3500 BCE, so I guess LATE Neolithic can be used in this case but not "middle" (unless you stick to the English conventions, which is what these authors apparently did).

Matt said...

Hmmm.... I don't think this is really much in contradiction with Haak as such, exactly.

I think what this way of looking at things does is emphasise what Haak did not emphasise as much - that the process of HG admixture in both East and West Europe was a progressive increase over time.

This of course was there in Haak: the paper did talk about shifts in relatedness to HG in Europe before Yamnaya, and their modelling did tend to need fairly large slices of WHG ancestry to make modern North-Central Europeans work. However the modelling may have looked like it was de-emphasising the changes in MN Europe by setting the WHG and LBK related ancestry as separate components, when in reality they may have long since progressively admixed, which gave a visual impression of Yamnaya as the single largest Neolithic and post-Neolithic ancestral population (even though there is an experimental reason for this), etc.

In terms of difference, I do think with these stats though you would probably expect a bit more of a substantial estimate of WHG or a different pattern of WHG ancestry in MN / pre-Yamnaya Chalcolithic European populations than the Haak modelling seems to have produced, just comparing the positions here with Haak's Figure 3 (the Baalberge / Spain_MN difference seems as dramatic as the Spain_EN - Baalberge one, for ex).

Also if you were modelling modern Europeans' positions on these PCA as a mix of the ancient populations, to try and get the right mix to fit their position, then you would also expect slightly higher shares of MN populations than the Haak modelling has produced and lower shares of Yamnaya - For instance compare the qpAdm Haak modelling and it estimates around 48%ish Yamnaya in Czechs or 40%ish in English while this would estimate more like 35% Yamnaya. Or in Basques the Haak modelling gives around 22% while this would estimate more around 10% for Spanish Basque.

Some of this would come down to language and interpretation. If the Haak paper talks about a massive migration as a title, then actually shows that this happened through an 80% replacement that went down to around 40-48% replacement in Central Europe, and then something else (if correct) might revise down to 35% (with an equal or greater autosomal change in the opposite direction in Russia shown by the new data from Sintashta), then, actually, not much has really changed in terms of the numbers.

They are very comparable I think, once you remove some of the questions of emphasis. It depends on the different merits of using the outgroup populations to "see" the levels of ancestry via qpAdm, with the confound that connections through these outgroups could maybe be noisy or uninformative even with a lot of stats vs directly comparing with just the HG affinities alone (which potentially has other confounds).

Davidski said...


This is a nice graph.

How did you work out the ancestry proportions?

Matt said...

I just tried guessing at the different proportions to get some examples close enough to the modern day pops to intuit the general trend. The 50:50 Baalberge:CW mix surprised me by being close to on the button for where the English dot was. The combinations generally ended up pretty close, although there is a little difference in PC3 (1% of variance, where PC1 is 88.2% and PC2 10.3%) where the real Europeans are a bit more MA1 and WHG related than the proxy combinations and less EHG related, but it's a really small dimension in comparison.

In theory someone could use a residual fitting / best fit function on either the PCA dimensions or the raw D stats to try and fit, which would be more rigorous? Or just draw lines between the points for populations (Yamnaya/CW to MN) on the PCA and use that to guide estimates. I guess you could more or less fit many of the West Asians as Druze / Iraqi_Jew plus Yamnaya that way as well (although they might not fit well in dimensions that wouldn't show up in these stats?).
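As a rough idea of what such a best-fit function could look like for a two-way mix, here is a minimal least-squares sketch. It is not qpAdm and not how the proportions above were obtained (those were guessed); it assumes the stats or PCA coordinates behave roughly linearly under mixing, and the function name and the toy coordinates in the usage part are hypothetical.

```python
def fit_two_way_mix(target, source_a, source_b):
    """Least-squares share of source_a in a two-way mix.

    Models target ~ alpha * source_a + (1 - alpha) * source_b, where each
    argument is a sequence of coordinates (D-stats or PC scores).
    Assumption: the coordinates mix linearly. alpha is clipped to [0, 1].
    Returns (alpha, residual distance between target and fitted point).
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two sources are identical")
    # Project the target onto the line running from source_b to source_a.
    alpha = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
    alpha = max(0.0, min(1.0, alpha))
    fitted = [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]
    residual = sum((t - f) ** 2 for t, f in zip(target, fitted)) ** 0.5
    return alpha, residual

if __name__ == "__main__":
    # Toy 2D coordinates, purely illustrative (not read off the real PCA).
    steppe_like = (1.0, 0.0)
    mn_like = (0.0, 0.0)
    modern = (0.35, 0.1)
    alpha, residual = fit_two_way_mix(modern, steppe_like, mn_like)
    print(f"share of first source: {alpha:.2f}, residual: {residual:.3f}")
```

A large residual would mean the target does not sit near the line between the two sources, i.e. the two-way model is a poor fit in the dimensions supplied. More than two sources would need a proper constrained least-squares solver rather than this projection.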

I did sort of mess up in my last post with the Basque Spanish estimate by confusing their dot with the Hungary_BA one, so the Basque Spanish would fit more like 17:83 Yamnaya:Spain_MN on these measures.
