Sunday, August 28, 2016

Ancient vs modern day West Eurasian variation

The Principal Component Analyses (PCA) with ancient samples that I post on this blog are amongst the most accurate and best examples of their kind that you'll see anywhere. That's not just wishful thinking; it's a fact.

My PCA don't suffer from projection bias or shrinkage, which handicap the PCA in many ancient DNA papers, and they're run only on observed (rather than imputed) genotypes.

However, even my PCA are far from perfect, because they're based entirely on present-day variation. In other words, I still project the ancients onto eigenvectors computed with modern day reference samples. I guess that's the equivalent of putting the cart before the horse, when originally the horse may have been a donkey, or something like that.

Nevertheless, it's the only sensible way to plot heavily degraded ancient samples with a lot of missing data. But it does often leave me wondering whether the output says anything useful about the ancient world.
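To make the projection step concrete, here is a minimal sketch in Python with NumPy. It is not the pipeline used for the plots on this blog: the function names `fit_modern_pca` and `project_sample`, the 0/1/2 genotype coding, and the NaN-for-missing convention are all assumptions made for illustration. The eigenvectors are computed from the modern samples only, and each ancient sample is then placed by least squares using just the SNPs it actually has.

```python
import numpy as np

def fit_modern_pca(G_modern, n_components=2):
    """Fit PCA on modern samples only.

    G_modern: (samples x SNPs) array of 0/1/2 allele counts, no missing data.
    Returns the per-SNP mean and the top SNP loadings (components x SNPs).
    """
    mean = G_modern.mean(axis=0)
    centered = G_modern - mean
    # Rows of Vt are orthonormal SNP loadings, ordered by variance explained.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:n_components]

def project_sample(g, mean, loadings):
    """Project one sample onto existing PCs using only its observed SNPs.

    g: (SNPs,) array of allele counts with NaN marking missing genotypes.
    Solves a least-squares problem restricted to the observed sites.
    """
    observed = ~np.isnan(g)
    A = loadings[:, observed].T          # (observed SNPs x components)
    b = g[observed] - mean[observed]     # centered observed genotypes
    coords, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coords
```

Solving over the observed SNPs, instead of taking a plain dot product with missing sites filled in at the mean, is one way to avoid pulling samples with a lot of missing data toward the origin of the plot.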

Thanks to the recent release of a lot of fairly high quality ancient genotype data from West Eurasia (most of it freely available at the Reich Lab website here), I can now test how well my trademark PCA of ancient West Eurasia reflects reality.

Below are two PCA featuring ancient composite samples. The first PCA is based on ~650,000 SNPs, with 100% call rates in each of the composites. For the second PCA I pruned the markers to correct for linkage disequilibrium (LD), and also made sure that about half of the SNPs were from transversion sites, which are less likely to be affected by postmortem damage. That left ~125,000 SNPs of, hopefully, relatively high quality.
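A minimal sketch of how transversion sites can be picked out, in Python. The tuple layout (SNP id, first allele, second allele) and the function names are hypothetical, the sketch does no LD pruning, and it keeps only transversions rather than aiming for the roughly half-and-half mix described above. The idea is that postmortem deamination shows up as C-to-T changes (and G-to-A on the opposite strand), which are transitions, so A/G and C/T SNPs are the ones most exposed to damage.

```python
# The two transition pairs: purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def is_transversion(allele1, allele2):
    """Return True if a biallelic SNP with these two alleles is a transversion."""
    return frozenset((allele1.upper(), allele2.upper())) not in TRANSITIONS

def keep_transversions(snps):
    """Filter a list of (snp_id, allele1, allele2) tuples down to transversions."""
    return [snp for snp in snps if is_transversion(snp[1], snp[2])]
```

For example, `keep_transversions([("snp1", "A", "G"), ("snp2", "A", "C")])` keeps only the second entry, because A/G is a transition.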

Obviously, the plots are very similar, which makes me wonder whether there's any point thinning the markers when running decent quality ancient sequences? The datasheets are available for download here and here.

Now, below is a recent example of my PCA of ancient West Eurasia. It's almost identical to the plots above. This is very cool, and also very important, because it means that my strategy for running PCA with ancient samples produces solid and relevant results.

Interestingly, on closer inspection, the distance between the western and eastern Neolithic farmers on the first two plots appears bloated. Conversely, the distances between the northern Hunter-Gatherer (HG) samples are somewhat reduced. Any ideas why?

Update 31/08/2016: Open Genomes generated a 3D plot based on a new PCA datasheet that I posted in the comments. Click on the image below to check it out.

Update 01/09/2016: I added present-day samples to the PCA. Very happy with the outcome. The relevant datasheet is available here.

See also...

Ust'-Ishim man x2


Davidski said...

I'll also post a plot with both the ancient composites and modern individuals, but I first gotta turn the modern samples into pseudo-diploid sequences like the ancient composites.

Unknown said...

Cool stuff! Here is a somewhat radical idea for an analysis: Multidimensional scaling with weights permits a PCA-like analysis (particularly in 2 dimensions) for both the ancient and modern samples at the same time. It will give weights to the two or more dimensions for each set of data, as well as offering 2-D plots in the same analysis. I've used SPSS for years to do just these analyses, but I'm sure other programs are out there as well. Any chance you could do a compound MDS analysis on the ancient vs the modern samples?

Chad said...

Any chance you could add in a merged Boncuklu and Tepecik sample? Possibly a Natufian one too?

Davidski said...

Can't run Boncuklu or Tepecik like this. Nowhere near enough data. I can probably run Natufians though.

I'll post a new datasheet soon with as many ancient composites as possible, and a bunch of modern samples, once I reduce them to pseudo-diploid sequences.

I encourage everyone to post the results of their experiments with the datasheet here.

Project "Magnus Ducatus Lituaniae" said...

Did you mean to say "pseudo-haploid" instead of "pseudo-diploid"?

Davidski said...

I reckon I mean pseudo-diploid. I first saw Nick Patterson using the term "pseudo-diploids" in some e-mails to refer to the ancient samples that Broad MIT/Harvard are generating with their enrichment capture methods.

Project "Magnus Ducatus Lituaniae" said...

I see your point. Patterson was talking about calling pseudo-diploid haploid genotypes from pileup; otherwise genotype-calling software would produce haploid calls (due to the uncertainties of genotype calling in ancient DNA).
However, the genomes of modern (living) individuals are generally diploid. So, in other words, do you want to randomly pick one allele at each site and make a pseudo-diploid individual from a real diploid individual?

Project "Magnus Ducatus Lituaniae" said...

"This is ‘pseudo-haploid’ data resulting from a random allele calling procedure, except for Loschbour, Ust_’Ishim and Stuttgart, which are diploid. Contemporary populations are from Lazaridis et al., 2014 (82) (labeled ‘Lazaridis’), and consist of diploid data. Finally, samples labeled ‘PAT_gt’ are samples we called with our genotype caller described in Supplementary section. We see the same trend of reduction in Neanderthal ancestry with time as observed by Fu et al. 2016 (Fig. S22). We obtain good agreement of inferred ancestry proportions between samples that are both in the original Fu et al. dataset and were re-called with our genotype caller, including the ‘pseudo-haploid’ Fu et al. samples without heterozygote sites."

Davidski said...

OK, that's interesting, although I'm not sure why they're referring to diploid sequences made from single haploid sequences as pseudo-haploid, when these sequences are in fact fake (ie. pseudo) diploid sequences?

In any case, it looks like we're talking about the same thing.

Unknown said...

Have you done any D-stats with both the true diploid and the pseudo-diploid (one strand mirrors the other) versions of the same ancient sample, to see whether D changes from one to the other? That's what happens with IBS, where the pseudo-diploid version of the ancient scores lower similarity to the modern than the true diploid version.

Davidski said...

Formal stats don't care whether the sample is diploid or pseudo-diploid. They produce essentially the same results either way.
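A sketch of why that may be so, using one common definition of the D-statistic over per-SNP allele frequencies w, x, y, z in four populations. The function name is hypothetical and this is not Admixtools code; it only illustrates the formula. Each population's frequency enters both the numerator and the denominator linearly, so replacing a diploid sample's frequency (0, 0.5 or 1) with a randomly picked allele (0 or 1) leaves both sums unchanged in expectation.

```python
import numpy as np

def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (arrays of equal length).

    Numerator:   sum over SNPs of (w - x) * (y - z)
    Denominator: sum over SNPs of (w + x - 2wx) * (y + z - 2yz)
    """
    w, x, y, z = (np.asarray(a, dtype=float) for a in (w, x, y, z))
    numerator = np.sum((w - x) * (y - z))
    denominator = np.sum((w + x - 2 * w * x) * (y + z - 2 * y * z))
    return numerator / denominator
```

A single pseudo-diploid individual simply contributes frequencies of 0 or 1 at every site. Z scores still need a block jackknife over the genome, which this sketch leaves out.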

But I wouldn't recommend mixing diploid and pseudo-diploid samples in datasets used for IBS.

If you want to run IBS tests on pseudo-diploid ancient samples, and also accurately compare them to modern samples, then turn the modern samples into pseudo-diploid sequences as well.

But this has to be done carefully, with each individual's alleles sampled randomly to create the pseudo-diploid tracks. That's because if you sample the same alleles in a whole bunch of samples, you'll create patterns in samples from the same or even similar groups, and you don't want that.
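A minimal sketch of that reduction step in Python with NumPy, assuming genotypes coded as 0/1/2 copies of one allele with -1 for missing (the coding and the function name `pseudo_haploidize` are assumptions, not the format of the datasheets). The point it illustrates is that every heterozygous site gets its own independent coin flip in every individual, so no two samples are forced to share the same picks.

```python
import numpy as np

def pseudo_haploidize(genotypes, rng):
    """Turn diploid genotypes into pseudo-diploid (all-homozygous) genotypes.

    genotypes: (samples x SNPs) integer array, 0/1/2 allele counts, -1 = missing.
    rng: a numpy.random.Generator, e.g. np.random.default_rng(seed).
    Homozygous and missing calls are left alone; each heterozygous call (1)
    becomes 0 or 2 with equal probability, drawn independently per sample and site.
    """
    g = np.asarray(genotypes)
    out = g.copy()
    het = g == 1
    # One independent draw for every cell of the matrix; only het cells use theirs.
    picks = rng.integers(0, 2, size=g.shape) * 2
    out[het] = picks[het]
    return out
```

Picking, say, always the first allele listed in the file would be the non-random shortcut that the warning above is about.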

Unknown said...

By mirrors, I meant missing calls on a strand are set to the same value as the corresponding call on the other strand for that position, making the entire genome homozygous throughout.

Unknown said...

Thanks, I sort of suspected it, and good point with IBS

Davidski said...

Yes, I know.

I split several high coverage ancient genomes into halves that I then turned into pseudo-diploid sequences and ran them with Admixtools to see what would happen. I saw very similar stats and Z scores.

Karl_K said...

I think people usually just call it pseudo-haploid when they just take one of the alleles, and if a genome is low coverage, then it is pseudo-haploid at nearly all sites with data anyway.

"If you want to run IBS tests on pseudo-diploid ancient samples, and also accurately compare them to modern samples, then turn the modern samples into pseudo-diploid sequences as well."

This technique might prevent random bias when dealing with large groups, but it essentially could remove genuine relationships in the process.

By following that method, you are only comparing regions where two samples are both homozygous and the same. Any modern or ancient sample with a lot of heterozygosity will then make fewer matches with anyone else.

There is no good way around it though, except for getting higher quality ancient sequences.

The best modern samples to use with ancient samples would be genuine haploid sequences. Then you won't lose any information by random allele picks.

Davidski said...

The PCA with both the ancient composites and reduced modern individuals looks really good. The Kalash sample clusters almost half way between Iran_EN and Yamnaya. I'll post that tomorrow.

IBS and rare alleles results also look pretty good. So far, from the modern samples that I reduced, the Brahui top the IBS/rare allele charts with Iran_EN.

Matt said...

Davidski: Any ideas why?

1. Just different composition of balance between Neolithic vs HG influenced samples alone? The typical PCAs have lots more samples with quite high levels of HG ancestry relative to this one. Would including separate "bins" for the Hungary_HG, and including Samara_Eneolithic change matters?

2. Much more shared drift between the HGs than between Iran_EN and Levant_N. With the formal D-stats, it doesn't seem like Iran_EN and Levant_N share that much drift, whereas IIRC EHG and WHG share more.

For comparison, using the same composition of samples from the set of outgroup D-stats and then running off a PCA:

No WHG / EHG / SHG in that, so extending clines from these - (or with the whole set of West Eurasians from that stats sheet - The WHG / EHG / SHG would seem like you would expect they would be closer on this than the Levant and Iran Neolithics, but not very dramatic.

Alternative: Running a Linear Discriminant Analysis* on the PC data from Dataset 2 suggests that the strongest PCs in that dataset are combinations of PC3, PC4, PC8: So perhaps more of this is going into those PCs due to high drift / error in some samples hijacking lower PCs. The Levant_N, Caucasus_HG and Iran_EN are also highly differentiated in PC3 and PC4. Armenia and Iran Chl are very differentiated by PC5 and PC6.

* It's a feature in PAST3 where you can place different sample rows into groups, and then the mathematical technique identifies which variables separate the groups. I used EHG, WHG, SHG and "Other" as the groupings. As an aside, I also had a go with using this plus clustering grouping to see if it generated different patterns than PCA...

Matt said...

Addendum to last post: Same thing with those D-stats with Correlation mode PCA (which discounts differences in the actual magnitude of differences in the stats and just look at correlations between them) -

The angles between the EHG, ANE and Villabruna stats are closer than between Natufian and Iran_Hotu or Boncuklu and Satsurblia, reflecting that shared drift to the EuroHG-Siberian continuum seems to correlate together more among these samples.
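For readers unfamiliar with the two modes, here is a minimal sketch of the difference in Python with NumPy. It is not what PAST3 does internally; the function name `pca_scores` and the rows-are-samples, columns-are-stats layout are assumptions. Covariance mode only centers each column, so columns with larger magnitudes dominate the PCs. Correlation mode also divides each column by its standard deviation, so only the pattern of co-variation between columns matters.

```python
import numpy as np

def pca_scores(X, mode="covariance", n_components=2):
    """PCA of a (samples x variables) table in covariance or correlation mode.

    Returns (scores, explained) where scores are the sample coordinates on the
    top PCs and explained is the fraction of total variance for each of them.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                 # center every column
    if mode == "correlation":
        Z = Z / X.std(axis=0, ddof=1)      # also put columns on a common scale
    elif mode != "covariance":
        raise ValueError("mode must be 'covariance' or 'correlation'")
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return U[:, :n_components] * S[:n_components], explained[:n_components]
```

Multiplying one column of the input by 100 leaves the correlation-mode result unchanged, but lets that column take over PC1 in covariance mode.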

Davidski said...

I can certainly add another Western_HG point, but not sure if Hungary_HG alone can make it on this plot.

Matt said...

OT, interesting article re: early farming in Balkan region -

"The study of dental calculus from Late Mesolithic individuals from the site of Vlasac in the Danube Gorges of the central Balkans has provided direct evidence that Mesolithic foragers of this region consumed domestic cereals already by c. 6600 BC, i.e. almost half a millennium earlier than previously thought."

More "cereal trade before farming", much like the wheat before farming archaeology last year (this is disputed: Or cultivation...?

Davidski said...


Here's a plot with extra hunter-gatherers.

Matt said...

That's quite a big change in the relative compression of those samples together, if that's all that altered.

On a related thing, there seems like a bit of a difference here (in common with the projected PCAs) from the D-stat based PCAs I've been running. On those D-stat PCAs, Iberia_MN_Chl presents an acceptable mixing population with Yamnaya to fit populations like Bell Beaker. While here it seems like you would need a mixing population that's further displaced towards WHG than Iberia_MN_Chl (something like the same distance Hungary_N->Iberia_MNCHl again). The relative distances otherwise look pretty consistent.

(If you do have time, would you be able to run off the following?: This is just to complete the D-stats I have for the Euro HGs as well).

Davidski said...

Yeah, it's a big change. But I need more samples to make things more stable. Adding modern individuals helps a lot. I'll post a new PCA with a whole bunch of extra samples tomorrow, once I have all of the relevant data edited accordingly.

Here are those D-stats...

Chad said...


It does seem there needs to be another pop. Yamnaya, plus extra WHG would make sense west of the Don, with the extra Caucasus stuff in the Kalmykia samples.

Matt said...

@ Chad, maybe that would account for it!

@ Cheers Davidski.

Some PCA plots with those stats:

Covariance plots: (distances to Euro_HG are beyond what I would've guessed)

Correlation plots:

Covariance (just ancient for comparison):

Correlation (just ancient):

Neighbour joining clustering:

Leaving out some columns: (It does look like, when I leave out the columns representing Mesolithic / Epipaleolithic Middle Eastern populations, you could compress all of the Early Neolithic people as mixes of EHG / WHG and a single Middle Eastern point and preserve the rough shape, if not distances).

Ust Ishim vs Dai stat does (noisily) recreate some of the basic features of West Eurasia plot: Structured differences between Ust Ishim vs Dai for WHG / EHG? Therefore difficult to recreate Basal Eurasian from either to a very precise degree.

FrankN said...

@Matt, Dave: "On those D-stat PCAs, Iberia_MN_Chl presents an acceptable mixing population with Yamnaya to fit populations like Bell Beaker. While here it seems like you would need a mixing population that's further displaced towards WHG than Iberia_MN_Chl."
Would it be possible to create a Michelsberg Phantom, say 60% LBK, 35% Iberia_EN, 15% Loschbour, to see if it fits as such mixing pop (maybe for the additional WHG play around a bit with Motala instead of Loschbour)? Otherwise, Baalberge should be a reasonably good, though eastwards (LBK)-shifted Michelsberg proxy. Possible to include it into the PCA, or does it lack sufficient markers?

Open Genomes said...

The 3-D version of the Ancient West Eurasia 2 PC Plot:

(Mouse over to see the labels for the period and region, and the data label will be just to the right of the box.)

Unknown said...

@Davidski, Open Genomes and other Bioinformatic Wonks
The 3-D PC plot is striking. Can you take modern populations and, using the first three components of these ancient samples, find their 3-D PC values for the restricted set of markers that Davidski uses to generate the aDNA PC analysis and his found 3-D components? Then, one could generate the convex hull of the tetrahedron of Iran Neolithic, Jordan Levantine PPNB, European HG and Yamnaya/Steppe and place modern samples within these extreme points. A simple admixture program could then estimate each modern sample as a linear combination of these four extreme/source populations.
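A minimal sketch of the linear-combination idea in Python with NumPy, assuming exactly four source populations in three PC dimensions. The function names and any coordinates fed to them are hypothetical. With four points in 3-D, the weights that reproduce a target point and sum to one are unique (barycentric coordinates), and the target lies inside the tetrahedron only if all four weights are non-negative. Positions on a PCA are not formal admixture proportions, so the weights are at best a rough guide.

```python
import numpy as np

def mixture_weights(sources, target):
    """Barycentric weights of a 3-D point relative to four source points.

    sources: (4 x 3) array of PC1-PC3 coordinates of the source populations.
    target:  (3,) array of PC1-PC3 coordinates of the sample to model.
    Solves sum_i w_i * source_i = target subject to sum_i w_i = 1.
    """
    S = np.asarray(sources, dtype=float)
    A = np.vstack([S.T, np.ones(len(S))])            # 3 coordinate rows + sum-to-one row
    b = np.append(np.asarray(target, dtype=float), 1.0)
    return np.linalg.solve(A, b)

def inside_hull(weights, tol=1e-9):
    """True if the point lies inside (or on) the tetrahedron of the sources."""
    return bool(np.all(np.asarray(weights) >= -tol))
```

A sample with a negative weight falls outside the tetrahedron, which would hint that the four chosen sources can't fully account for its position on these three PCs.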

Samuel Andrews said...

"On those D-stat PCAs, Iberia_MN_Chl presents an acceptable mixing population with Yamnaya to fit populations like Bell Beaker. While here it seems like you would need a mixing population that's further displaced towards WHG than Iberia_MN_Chl"

Once Neolithic DNA from east of Germany and west of Russia is sequenced, it should tell us who a big percentage of the EEF/WHG ancestors of Corded Ware and Bell Beaker were. Sneak peeks at Neolithic Ukrainian mtDNA were recently posted, and the team is planning to get genomes. The handful of results they published are, if anything, more similar to EEF than Yamnaya.

a said...

Nice; thanks Davidski-O.G. - [Reich Lab et al] for taking the time and effort in doing a great job. A few years ago this would not have been possible. Seeing the placement of our ancestors and relatives shrinks the ancestral populations in relative position within each other. Thanks to the advancements in technology and science, the future looks absolutely amazing.
I just had a little extra Nalewka Babuni [naturally, strictly for medicinal purposes], and tried looking at the plot in 3D. Trying to imagine what it would look like with a set of Oculus Rift.
Can't wait to see what the future holds in store for the genetic/science community, just amazing!

Nirjhar007 said...

Cool looking plot, thanks.

Aram said...

In the ancients PCA, Kura-Araxes looks like a mixture of Armenia_Chl and Iran_Chl. Or at least something closely related to Iran_Chl.

CHG+Barcin_N is also a good fit, but it's not realistic to imagine such an admixture.

Aram said...

Who are the most eastern SC Asian populations on your PCA with modern samples? The ones at the top of the image.

Davidski said...

Brahmins from Uttar Pradesh and Kshatriya, also from Northern India.

Rob said...

@ FrankN

About Michelsberg: is it correct that there's a couple hundred (at least) year gap over much of the territory of the Michelsberg culture and following horizons ?

Curiously, L51 today seems centred on that broad region

Matt said...

I notice the datasheet has eigenvector scaling, where each PC is not scaled according to its % variance. Is it possible to put one without that scaling?
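If the datasheet columns are unit-length eigenvectors, one way to get coordinates that reflect each PC's share of the variance is to multiply every column by the square root of its eigenvalue. A minimal sketch in Python with NumPy, assuming the eigenvalues from the PCA run are at hand (the datasheet itself may not list them); the function name is hypothetical.

```python
import numpy as np

def rescale_by_eigenvalues(eigvecs, eigenvalues):
    """Scale unit-length eigenvector columns by the square root of their eigenvalues.

    eigvecs:     (samples x PCs) array, one column per PC.
    eigenvalues: (PCs,) array with the eigenvalue of each PC, in the same order.
    """
    return np.asarray(eigvecs, dtype=float) * np.sqrt(np.asarray(eigenvalues, dtype=float))
```

After this rescaling the spread of the samples along each axis is proportional to that PC's eigenvalue, so distances on the plot are weighted toward the PCs that explain more variance.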

Some correlations of the PCs I got from the Dstats for AG3, Boncuklu, Dai, EHG, Iran_Hotu, Natufian, Satsurblia, Ust Ishim, Villabruna, compared to these: The PC1 and PC2 from both correlate quite well: (slight differences 65.3% and 23.5% respectively from the D-Stats variance). While the equivalent of PC3 and PC4 in the datasheet are swapped in order: (6% and 3.5%). These two are a PC distinguishing Caucasus and Yamnaya from Iran_EN and others, so it is maybe interesting that it seems like the Yamnaya variance is slightly more prominent in the ancient datasheet and the Caucasus factor less (there was a lack of any explicit Yamnaya D-stat in the D-stats sheet).

Davidski said...

Not sure. Haven't tried that yet.

I'm running behind with this thing, but will post a bunch of stuff tomorrow.

batman said...

@ Matt

There are a few examples of grinding flour and making porridge that are a lot older than "expected". Starch grains attributable to oat were recorded in Paglicci 2014.

This and similar indications have greatly expanded our understanding of food-plants used for producing flour in Europe during the Paleolithic; "explaining the origins of a food tradition persisting up to the present in the Mediterranean basin."

Another note is the use of barley to make beer, already 8,500 years BP in Scotland:

"Archaeology has given us many glimpses into the prehistoric affinity for alcoholic beverages, which were nothing more than crude mixtures spontaneously fermented by wild yeast and bacteria. In present day Scotland, such a discovery was made on the Isle of Rhum, north of Edinburgh, in 1985. A Pictish Neolithic crock, dated to 6500 BC, was determined to contain the residue of barley and oats, heather and/or heather honey, and assorted indigenous plants."

Since Gotherstrom/Malmstrom 2009 and Malmstrom/Skoglund 2012 there's been a debate among archaeologists criticizing geneticists for adopting the old terms and the (consequent) understanding of a clear-cut and continuous separation between "hunter-gatherers" and "farmers".

The transition from foraging to agriculture seems to be more 'gradual' than most 'mainstream' interpretations have claimed since V. Gordon Childe's 'revolutionary thesis' from the 1930s, which held the spread of agriculture to be the sole result of an "agricultural package out of Anatolia". Several new studies have shown this model to be flawed:

Then, again, there is this study, where old grinders are accompanied by analyses of dental plaque, questioning whether the age of AMH in Europe is more than 50,000 years.

batman said...

@ Davidski, Open Genomes et al.

This model really seems to make sense.

Compared to the archeological and historical records available we even have to deem it 'plausible'.

Great work. Congratulations. And thanks.

Simon_W said...

That 3D PCA by Open Genomes looks amazing. There seem to be 5 poles: WHG, EHG, CHG, Iran_EN and Levant_Neolithic. Though Barcin_N isn't quite on a cline from Levant_Neolithic to WHG, but slightly displaced in one dimension, suggesting that Barcin_N and Levant_Neolithic rather make up a shared pole with an extreme point in a yet unsampled pop. CHG doesn't look like a true extreme pole either, and it doesn't look like a good representative for the pop that mixed with EHG to produce the EMBA steppe pops. It just looks like the best proxy we have so far - Iran_Neolithic is right out. Looking at that extreme position of Afanasievo I get the hunch that a yet unsampled pop from central Asia might be the explanation. CHG + EHG looks particularly off for Afanasievo, at least in the 3D PCA. And yes, it's also visible here that the cline from Yamnaya over Corded Ware and Bell Beaker doesn't lead to Iberia_ChL, but to an unsampled pop with more WHG.

Davidski said...

Here's the datasheet with the present-day samples included.

Rob said...

Of potential interest : Eneolithic aDNA from Lake Baikal

Nirjhar007 said...

Chinese aDNA

Nirjhar007 said...

And Japanese aDNA :

A partial nuclear genome of the Jomons who lived 3000 years ago in Fukushima, Japan

Hideaki Kanzawa-Kiriyama, Kirill Kryukov, Timothy A Jinam, Kazuyoshi Hosomichi, Aiko Saso, Gen Suwa, Shintaroh Ueda, Minoru Yoneda, Atsushi Tajima, Ken-ichi Shinoda, Ituro Inoue and Naruya Saitou

The Jomon period of the Japanese Archipelago, characterized by cord-marked ‘jomon’ potteries, has yielded abundant human skeletal remains. However, the genetic origins of the Jomon people and their relationships with modern populations have not been clarified. We determined a total of 115 million base pair nuclear genome sequences from two Jomon individuals (male and female each) from the Sanganji Shell Mound (dated 3000 years before present) with the Jomon-characteristic mitochondrial DNA haplogroup N9b, and compared these nuclear genome sequences with those of worldwide populations. We found that the Jomon population lineage is best considered to have diverged before diversification of present-day East Eurasian populations, with no evidence of gene flow events between the Jomon and other continental populations. This suggests that the Sanganji Jomon people descended from an early phase of population dispersals in East Asia. We also estimated that the modern mainland Japanese inherited <20% of Jomon peoples’ genomes. Our findings, based on the first analysis of Jomon nuclear genome sequence data, firmly demonstrate that the modern mainland Japanese resulted from genetic admixture of the indigenous Jomon people and later migrants.

Shaikorth said...

Based on the treemix runs (there are several in the supplements with MA-1 and Ust-Ishim included) and D-stats, Jomon does look like it has ancient ancestry similar to modern East Asian agriculturalist-derived populations, but is very drifted. They suggest later Ulchi/Nivkh related geneflow into Ainu.

Matt said...

Nice that this Jomon paper has been published. Early version has been online in the form of Kanzawa-Kiriyama's thesis for a while. Hopefully this will mean this data is in the public sphere, at least for big labs, but I don't know with this lab, since they might be quite careful with controlling it.

"We used TreeMix for estimating Jomon ancestry proportions in this study, and the frequency was 12% as shown in Figure 4."

Comparatively, seems like that would be lower than WHG contribution to Middle Neolithic, but then technological gradient would be higher at Yayoi time, plus later people than Jomon may have already absorbed some continental EA influence? Or if closer to 20% much like an analogy to MN_Europeans picking up WHG.

@ Shaikorth - Quite hard I think to tell the drift from treemix though - all the ancient samples tend to form these very long branches.

I'm mildly disposed to think that what we will find in East Asia will be a situation like West Eurasia, with small ancient populations that share much more drift with each other, as you could measure via Outgroup f3 stats, than modern ones do. Like how Boncuklu and CHG seem to be quite tightly related compared to modern people from those regions. If we had more than one Jomon sample to check stats like Outgroup f3, we might find this.

But unlike a simple reverse of the West Eurasian case, that these populations will probably tend to be similar in their relationship to the West Eurasian outgroups. No analogy there to the West Eurasian components split order BE->WHG+ANE, and then either WHG+ANE drifting with early East Asians or instead later ANE contributing to East Asians (I'm not sure both are still necessary!), and variance in levels of these in later pops.
With this being less true in SE Asia where there may be populations more related to ASI (contributes to South Asians) and NNE Asia where there is ANE influence, and also in a different way less true in Tibet where archaic high adapted population may matter.

ArtemisVentus said...

Y-chromosomal haplogroups were obtained from male individuals in the four cemeteries. Individuals from Lokomotiv and Shamanka II were found to possess haplogroups K, R1a1 and C3, and individuals from Ust’-Ida and Kurma XI were found to belong to haplogroups Q, K and unidentified SNP (L914). For those individuals belonging to haplogroup Q, further experimentation to examine sub-haplogroups of Q revealed that these individuals belong to sub-haplogroup Q1a3. There was significant heterogeneity in the males from the Lokomotiv cemetery when compared to the other three cemeteries. Furthermore, the Y-chromosome results showed a discontinuity between the EN and the LN-EBA populations of Lake Baikal. Combining the maternal and the paternal results from the prehistoric populations of Lake Baikal suggested a patrilocal post-marital residence pattern, where females moved to their husbands’ birthplace after marriage.

Open Genomes said...

Thanks David for highlighting our 3-D Plot in your post.

Here are three PC plots of the Ancient West Eurasian alongside modern samples.
Since modern samples with recent non-West Eurasian admixture are included, it seems like a good idea to create plots for the first four dimensions:

Ancient and Modern West Eurasians Dimensions 1-2-3
Ancient and Modern West Eurasians Dimensions 1-2-4
Ancient and Modern West Eurasians Dimensions 2-3-4

Do these reveal something that the first three dimensions alone don't?

FrankN said...

Nirjhar - thx, very interesting!

Rob: "About Michelsberg: is it correct that there's a couple hundred (at least) year gap over much of the territory of the Michelsberg culture and following horizons ?"

Yes and no. Along a good part of the Rhine, there are hardly any archeological findings for the post-Michelsberg period, and some follow-up cultures (Seine-Oise-Marne west of the Rhine, Wartberg in the Upper Weser Basin) seem to only have started around 3400 BC, which may imply a regional hiatus of 1-2 centuries. OTOH, there seems to have been a relatively seamless transformation to subsequent cultures in the Paris Basin, S. Belgium/ W. Rhineland (Stein Group), on the lower Meuse (Hazendonk-Vlaardingen) and in NW Germany/ NE Netherlands (Western Funnelbeakers).

However, while post-Michelsberg settlement finds get extremely sparse along the Upper and Middle Rhine (to be understood in a wider sense, the phenomenon reaches from the Saar to the Neckar), already Michelsberg settlement finds weren't that common. This may relate to either of (i) possible settlement concentration in the enclosures, (ii) a substantial pastoral element and "light" housing types that leave little archeological traces, and (iii) substantial soil erosion in low mountain areas that has e.g. significantly affected Michelsberg sites around Heilbronn.
Pollen diagrams show a continuously high degree of landscape opening after the end of Michelsberg. So while we can certainly say that Michelsberg's "urban component" (enclosures, long-distance trade) ceases to exist along the upper/middle Rhine, this is much less certain for the rural aspects (small-scale agriculture, cattle pastoralism). In that context it is noteworthy that contemporary to the end of Michelsberg, part of the Late Michelsberg package (enclosures, gallery graves) makes its entry into Britain, and into the Funnelbeaker culture (e.g. the large Halle-Dölauer Heide enclosure with Bernburg Culture gallery graves).

Why the lands around the Rhine "de-urbanised" is still under discussion. One factor seems to be climate change: The end of Michelsberg coincides with the end of the Holocene Climate Optimum. While still warmer than today, the break seems to have been deep and abrupt, in relative terms more severe than the "Little Ice Age". The Western Alps seem to have been hit especially hard by a colder yet more humid climate - during the first half of the 4th millennium BC, glaciation expanded rapidly, and lake water levels (e.g. Lake Constance) rose by several meters. Pile-dwelling cultures were strongly affected; there is e.g. a 3600-3200 BC settlement hiatus in the Federsee basin (N. of Lake Constance), and Lake Constance was depopulated around 3450.
Advancing glaciation cut off access to Mt. Viso (Piemont) Jadeite, which had been a major Chasseo-Michelsberg trade commodity, distributed as far as Brittany and Denmark, and most likely affected passage over the Western Alps in general. Hence, trade along the Rhine should have shrunk, with fewer customers in the Western Alpine forelands and more difficult or lost access to Alpine commodities.
Moreover, the Michelsberg economy was strongly based on salt boiling (Heilbronn, Bad Nauheim, Glauberg, Soest etc.), which seems to have caused substantial deforestation. The period's more humid climate would have enhanced soil erosion and seasonal flooding, and have affected passage along Rhine, Mosel, Neckar etc.
Interestingly, it is the Michelsberg enclosures related to stone extraction that seem to have lasted longest, e.g. Mayen (basalt querns) until 3500 BC, or the famous Spiennes/BE flint mines that were continuously used over the Michelsberg-SOM (or Stein Group?) transition. Possibly, the eastwards expansion of Late Michelsberg was already driven by wood shortage around traditional salt boiling sites, and attraction to salt-rich Westphalia / Lower Saxony.

Matt said...

Genes mirror migrations and cultures in prehistoric Europe - a population genomic perspective

A review paper I think so maybe not anything new for us, datawise. May be some interesting new interpretations and questions...

Davidski said...


"Do these reveal something that the first three dimensions alone don't?"

I'm guessing they should, although I'm not sure what that might be yet.

Open Genomes said...

If anyone doesn't see some important differences between the plots in 4 dimensions, have a look here at these screenshots, and look closely at the position of CHG relative to the Iranian Neolithic and EHG:

PC 1-2-3
PC 1-2-4
PC 2-3-4

Notice that in dimensions 1-2-3 CHG clusters with Yamnaya and EHG.
In dimensions 2-3-4, CHG clusters with EHG.
However, in dimensions 1-2-4, CHG clusters with Iran Neolithic, and is quite distant from EHG and Yamnaya.

It seems to me that the varying position of CHG relative to EHG (and its Steppe successor Yamnaya) and Iran Neolithic indicates that CHG is a mix between a "proto-Iranian-Neolithic" population, and EHG. This is exactly what we've seen in some K=13 analysis of these ancient and modern samples.

Among the Caucasus Hunter-Gatherers, Satsurblia of course predates farming and is roughly contemporary with the Natufians of the Levant. Kotias however, is just about 500 years earlier than the Iran Early Neolithic Ganj Dareh, Tepe Abdul Hosein, and Wezmeh Cave samples, and contemporary with the Boncuklu farmers from Central Anatolia near (later) Çatalhöyük, and also about the same time as the Levantine PPNB.

The CHG admixture with EHG is found in both Satsurblia and Kotias, and so it obviously predates the Neolithic. Satsurblia looks rather homozygous, like a classic hunter-gatherer. However, Kotias is quite heterozygous, and looks something like a mix of Satsurblia and "another population".

The same is true for the Natufians, who appear to be a mix of local Near Eastern hunter-gatherers and North African hunter-gatherers. (They are in Y haplogroup E1b1b1b2a-M123, which originated in Africa around 19,200 BP, during the LGM.) The Levantine PPNB I0867, however, has "extra" input from a Near Eastern population, and this exact same input seems to be found in the Iran Late Neolithic Seh Gabi SG2 sample.

Open Genomes said...

Also realize that Iran Early Neolithic Ganj Dareh GD13A/I1290 is mtDNA X2, and Northwest Anatolian Neolithic Barcin Bar31 is X2m and in Y-DNA G2a2b-L30*, Bar99/I1098 is mtDNA X2d2, and Menteşe I0723 is X2m2.
Early Central Anatolian Neolithic Boncuklu Bon002 who is contemporary with CHG Kotias is mtDNA K1a and Y-DNA G2a2b2b1a1-PF3378, Barcin I1583 is mtDNA K1a2 and Y-DNA G2a2a1a2-L91, and other samples from Barcin and Menteşe are in mtDNA K1a4, K1a1, K1a2, K1a6, K1a3a.

Interestingly, Levantine PPNB I0867 from Motza Israel (just west of Jerusalem) is Y-DNA H2-M282* and mtDNA K1a4b.

One of the more striking correspondences is between Iran Late Neolithic Seh Gabi SG2, who is mtDNA K1a12a and Y-DNA G2a1a-Z6553, and Central/Southeast Anatolian Neolithic Tepecik-Çiftlik Tep002, who is also mtDNA K1a12a. G2a2a-PF3146* was found in Tepecik-Çiftlik Tep001, who was also mtDNA K1a.

Iran Early Neolithic Wezmeh Cave WC1, a few hundred years after Kotias and Boncuklu, was mtDNA J1d6 and Y-DNA G2b2a-Z8022. WC1 was definitely a farmer who ate a diet that consisted mostly of grain.
Iran Early Neolithic Ganj Dareh I1945/GD16, a few hundred years after WC1, was mtDNA J1c10 and Y-DNA R2a-M479. Barcin I0744 was mtDNA J1c11 and Y-DNA G2a2b1a-P303*.

Both Tell Halula and Tell Ramad have mtDNA K. (Only the mtDNA HVR1 region was sequenced so it's difficult to determine the subclade.)

Iranian Early Neolithic Tepe Abdul Hosein AH4, also roughly contemporary with Kotias, was mtDNA T2c. Barcin I1099 was mtDNA T2b and Y-DNA G2a2a1a2-L91, and Barcin I1101 was also mtDNA T2b.

Isn't there an obvious pattern here?

mtDNA K1a, X2, J1, and T2 along with Y-DNA G2-P287 are strongly associated with farmers in Iran, Anatolia (and even the PPNB Levant if we include Y-DNA H2-M282), and Neolithic Europe.

On the other hand, the Caucasus Hunter-Gatherer Kotias was mtDNA H13c, Yamnaya I0370 was H13a1a1a and R1b1a2a2-Z2103, Bell Beaker from Quedlinburg I0112 QUEXII-6/QUEXII-3 was mtDNA H13a1a2c, and Poltavka I0374 was H13a1a and R1b1a2a-L23.

Of course Mesolithic European Villabruna (roughly contemporary with Satsurblia) was R1b1a-P297. (Yes, not R-L389! He is derived for two R-P297 equivalents.)

mtDNA H13 at least seems to be associated with the CHG and Bronze Age Steppe peoples.

So some of the variation we see at higher dimensions, as well as minor principal components (particularly K36 "Near Eastern", "East Med" and "Arabian") can be explained by Mesolithic and Early Neolithic "ghost" populations that haven't been sampled yet. There are at least two such Near Eastern populations (leaving aside the North African input to the Natufians):

• An Early Neolithic (PPNA?) "Basal Eurasian" population, mtDNA K1a, X2, J1, and Y-DNA G2

• A CHG-EHG (and possibly WHG/Villabruna)-related population, mtDNA H13 (at least) and R1b1a-P297.

There seems to be yet another ancestral population:

• An ANI-related population that may be mtDNA R2 (yes, mtDNA R2), and Y-DNA R2 (Tepe Abdul Hosein) and J2a (like the Iranian Mesolithic Hotu Cave Hotu IIIb.)

What we now have to do is find some actual tag SNPs and IBD segments that track these hypothetical ancestral populations.

Doesn't this make sense now?

Right there, with Kotias' mtDNA H13, we can see Kotias' relationship with the Steppe people which we see on the higher-dimension PC plots.

The things you can see by examining the higher-order dimensions in PC plots ...

Open Genomes said...

To help find the novel tag SNPs for this so-called "Basal Eurasian" Early Near Eastern Neolithic (PPNA?) population, I did a reanalysis of the two highest quality ancient Near Eastern whole genome sequences, the two that were used by Broushaki et al. (2016) for their Eurasian IBD analysis:

• WC1 from Wezmeh Cave, Iran, 7455-7082 calBCE (9465-9092 BP) Y-DNA G2b2a-Z8022 mtDNA J1d6
115105 biallelic SNPs in 23andMe format with read depths
Corresponding Plink files for WC1
Gedmatch: M392829

• Bar8 from Barcin, Northwest Anatolia 6212-6030 BCE (8222-8040 ybp) mtDNA K1a2
1146103 biallelic SNPs in 23andMe format with read depths
Corresponding Plink files for Bar8
Gedmatch: M711494

All SNPs are in positive orientation, and are identified by rsIDs except for approximately 13,000 Axiom Human Origins SNPs that only have "AX-" IDs.
The read depths will enable anyone to filter the SNPs to help eliminate random noise from DNA damage. (To be considered heterozygous, the minor allele must be at least 25% of the reads. A read depth of 5, for example, requires the minor allele to have more than one corresponding read.)
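The 25% rule above can be sketched as a small function. This is only an illustration: it takes the two allele read counts directly, because the exact column layout of the read-depth files isn't described here, and the function name and return labels are my own assumptions.

```python
def call_genotype(ref_reads, alt_reads, min_minor_fraction=0.25):
    """Classify a site from its allele read counts.

    Returns 'het' when the minor allele makes up at least
    `min_minor_fraction` of the reads, otherwise 'hom_ref' or 'hom_alt'.
    Returns None when the site has no reads at all.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    minor = min(ref_reads, alt_reads)
    if minor / depth >= min_minor_fraction:
        return "het"
    return "hom_ref" if ref_reads > alt_reads else "hom_alt"

# At depth 5, a single minor-allele read is only 20% of the reads,
# so the site is not called heterozygous; two reads (40%) are enough.
print(call_genotype(4, 1))
print(call_genotype(3, 2))
```

A stricter filter could also require a minimum total depth before making any call, but that threshold isn't stated above, so it is left out.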

I also created files for somewhat lower coverage whole genomes.
I merged the Ganj Dareh GD13A 1.2x whole genome with the I1290 sample 1240k SNP array data from the same individual:

• GD13A-I1290 Early Neolithic Ganj Dareh Iran 8179-7613 calBCE (10189-9623 BP) mtDNA X2
1020355 biallelic SNPs in 23andMe format with read depths
Corresponding Plink files for GD13A-I1290
Gedmatch: M249214

• Bar31 from the NW Anatolian Neolithic, Barcin, Turkey, 6419-6238 calBCE (8429-8248 BP) Y-DNA G2a2b-L30* mtDNA X2m
1098033 biallelic SNPs in 23andMe format with read depths
Corresponding Plink files for Bar31
Gedmatch: M063398

• Rev5 from the Greek Early Neolithic Revenia 6438–6264 BCE (8448-8274 BP) mtDNA X2b
804577 biallelic SNPs in 23andMe format with read depths
Corresponding Plink files for Rev5

Davidski said...

I'm not really sure how to go about recreating Basal Eurasian. My attempts with ADMIXTURE suggest that it's so intertwined with other non-Basal stuff from the Near East that this might well be an impossible task.

The Basal-rich K7 cluster is about the best I can do, and that's only around half Basal. So I'd say that identifying Basal Eurasian IBD tracts would be exceedingly difficult.

But yeah, if anyone else wants to have a crack, then by all means. It'd be a big project though.

Grey said...

"I get the hunch that a yet unsampled pop from central Asia might be the explanation"

Just a guess: (map link, pointing to roughly 48.02°N, 66.92°E)

Open Genomes said...

David, what I may be able to do is extract all the SNPs from the regions of common coverage in BAM files of WC1 and Bar8. In that case, a base would either be a "variant" compared to the hg19 Reference Sequence or identical to hg19.

To try to find "Basal" segments we would need appropriate high-coverage outgroup samples. Would Ust'-Ishim and Mota work for that? Or would Ust'-Ishim need to be replaced by La Brana 1 and MA1? Then, whatever WC1 and Bar8 have in common that is missing in the others would be "Basal Eurasian".
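The "shared by WC1 and Bar8 but missing in the outgroups" idea boils down to a set operation. Here is a rough sketch, assuming each sample has already been reduced to a set of (chromosome, position, allele) variants relative to hg19; the function name and the toy variants are hypothetical.

```python
def candidate_variants(targets, outgroups):
    """Variants present in every target sample and in none of the outgroups.

    `targets` and `outgroups` are lists of sets of (chrom, pos, allele)
    tuples; `targets` must contain at least one sample.
    """
    shared = set.intersection(*targets)
    seen_in_outgroups = set().union(*outgroups)
    return shared - seen_in_outgroups

# Toy data, invented for illustration only.
wc1 = {("1", 1000, "A"), ("1", 2000, "T"), ("2", 500, "G")}
bar8 = {("1", 1000, "A"), ("2", 500, "G"), ("3", 42, "C")}
outgroup_a = {("2", 500, "G")}
outgroup_b = {("3", 42, "C")}

print(candidate_variants([wc1, bar8], [outgroup_a, outgroup_b]))
```

One caveat: a variant can look "missing" in an outgroup simply because that site has no coverage there, so in practice the comparison would have to be restricted to sites called in every sample.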

This of course could be somewhat verified using modern whole genome sequences, but the problem there is that today a pure ANE or WHG / Villabruna population no longer exists, and Near Eastern Neolithic ancestry is very widespread, in Europe, South Asia, and even North China and parts of Africa. I think we could probably assume that neither of these have any real East Asian ancestry, and any supposed "Oceanian" would in fact be Archaic (Neanderthal-derived) segments.

Once the IBD segments are identified, it shouldn't be too hard to find them among modern whole genome sequences, and even identify a new set of tag SNPs as proxies for them.

The problem with the Affymetrix Axiom Human Origins Array chip is that, while the SNPs were ascertained using whole genomes from diverse populations such as the Mbuti and Biaka, Sardinians, Karitiana, and Australians (but unfortunately, no South Asians were included!), the AIMs they selected were the ones that differ between these modern populations.
A SNP array for human population genetics studies (pdf, 650 KB)

So-called "Basal Eurasian" is a "ghost population" that, like many ancient groups, no longer exists today, but in particular this ancestry, like ANE, became very widespread among many populations. However, unlike with ANE, we don't have a set of ancient genomes that represents this population and, more importantly, we don't have many modern whole genomes from the Near East.

There were two recent studies that did Near Eastern whole genome sequences:

Do you know about this?
Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry

And I think you know about this study:
Genome at Juncture of Early Human Migration: A Systematic Analysis of Two Whole Genomes and Thirteen Exomes from Kuwaiti Population Subgroup of Inferred Saudi Arabian Tribe Ancestry

And maybe this one:
Whole genome sequencing of Turkish genomes reveals functional private alleles and impact of genetic interactions with Europe, Asia and Africa

"Here, we present whole genome sequences of 16 Turkish individuals resequenced at high coverage (32 × -48×)."

Open Genomes said...

However, here's a resource which I don't think you know about, published on Aug. 22nd, which could enable you to move beyond pseudo-haploid "calculators" and ADMIXTOOLS graphs and into the realm of real IBD analysis, when this data is applied to ancient samples:

A reference panel of 64,976 haplotypes for genotype imputation.

"We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently."

So I'll try to see what I can do with WC1 and Bar8, and if you have advice about which other high coverage ancient samples to use (i.e. Mota, La Brana 1, MA1, and possibly Ust'-Ishim) I'll see if I can add those, too.

Perhaps we can for the first time actually identify something that is truly "Basal Eurasian" and thereby also sort out IBD segments that are WHG ("Villabruna"), ANE, and maybe common to both, but missing in "Basal Eurasian"?

batman said...

@ OG

Impressive summary, at the cutting edge of this line of research.

The identification of the first, paleolithic Natufians as an early carrier of y-dna E and mt-dna X is indeed triggering. Especially since the early later Natufian appear to belong to the CT-group.

Here's some old comments on this specific subject.

Given the location of the Berbers - north of the Atlas mountains - one may ponder whether y-dna E and/or mt-dna X could be a result of an admixture between cro-magnons and neanderthals, alternatively koi-san and neanders...

Nirjhar007 said...


Check out this new paper :

European Neolithic societies showed early warning signals of population collapse
Sean S. Downey, W. Randall Haas, Jr., and Stephen J. Shennan
Ecosystems on the verge of major reorganization—regime shift—may exhibit declining resilience, which can be detected using a collection of generic statistical tests known as early warning signals (EWSs). This study explores whether EWSs anticipated human population collapse during the European Neolithic. It analyzes recent reconstructions of European Neolithic (8–4 kya) population trends that reveal regime shifts from a period of rapid growth following the introduction of agriculture to a period of instability and collapse. We find statistical support for EWSs in advance of population collapse. Seven of nine regional datasets exhibit increasing autocorrelation and variance leading up to collapse, suggesting that these societies began to recover from perturbation more slowly as resilience declined. We derive EWS statistics from a prehistoric population proxy based on summed archaeological radiocarbon date probability densities. We use simulation to validate our methods and show that sampling biases, atmospheric effects, radiocarbon calibration error, and taphonomic processes are unlikely to explain the observed EWS patterns. The implications of these results for understanding the dynamics of Neolithic ecosystems are discussed, and we present a general framework for analyzing societal regime shifts using EWS at large spatial and temporal scales. We suggest that our findings are consistent with an adaptive cycling model that highlights both the vulnerability and resilience of early European populations. We close by discussing the implications of the detection of EWS in human systems for archaeology and sustainability science.
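For anyone wondering what "increasing autocorrelation and variance" means in practice, here is a generic sketch of the two statistics computed over a sliding window. It is not the authors' method (the abstract doesn't spell out their detrending or windowing), and the function names, window width, and toy series are my own assumptions.

```python
from statistics import mean, pvariance

def lag1_autocorrelation(window):
    """Lag-1 autocorrelation of a sequence (0.0 if it has no variance)."""
    m = mean(window)
    denom = sum((x - m) ** 2 for x in window)
    if denom == 0:
        return 0.0
    num = sum((window[i] - m) * (window[i + 1] - m)
              for i in range(len(window) - 1))
    return num / denom

def rolling_ews(series, width):
    """(variance, lag-1 autocorrelation) for each sliding window of `width`."""
    out = []
    for start in range(len(series) - width + 1):
        w = series[start:start + width]
        out.append((pvariance(w), lag1_autocorrelation(w)))
    return out

# Toy population proxy, invented for illustration only.
proxy = [1.0, 1.1, 0.9, 1.0, 1.3, 0.8, 1.5, 0.6]
for variance, autocorr in rolling_ews(proxy, 4):
    print(round(variance, 4), round(autocorr, 4))
```

An upward trend in both columns ahead of a collapse is the kind of pattern the abstract describes; whether a trend is significant needs a proper test, which this sketch does not attempt.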

Rob said...


Thanks, I'll take a look

@ Others
About Basal: might be just worth waiting for the same teams which brought Iran & Natufian samples to hopefully sample some > 20 kya bones from the Middle East. IIRC a few do exist.

klevius said...

As a side note to some comments above about the concept 'hunter-gatherers', a long time ago (Demand for Resources - on the right to be poor, 1992, ISBN 9173288411) I tried to "re-classify" it, departing from C. Lévi-Strauss's idea of "warm" and "cold" societies.

So called civilized societies can be described as dynamic, hence contrasting against the more static appearance of the economic setting (lack of investment) of e.g. hunter-gatherers:

A. Without 'extended demands for resources' (EDFR: you want what you need but you don't necessarily need what you want).
B. Affected by EDFR but still retaining a simplistic, "primitive" way of life.
C. Civilized with EDFR

These categories are, of course, only conceptual. Applied to a conventional classification the following pattern appears:

1 The primitive stage when all were hunter/gatherers (A, according to EDFR classification).
2 Nomads (A, B, C).
3 Agrarians (B, C).
4 Civilized (C).

As a consequence EDFR is here used as a concept tied to civilization (and its preliminary stage). The above also suggests a critique against our conventional conception of a simplistic connection between intelligence and performance as exemplified by K. Popper's scenario of a World 1-3 transition of human cultural development.

To exemplify this I ended with a chapter called San, Khoe, and Bantu.

Point being a way of distinguishing between appearance and motivation. And of course, today we are all doomed to eternal civilization dynamics, not least via technology, but this emerged out of "pre-investment" times, no matter how diffusely.

Nirjhar007 said...


What's your view on the R1a found in Neolithic (5000 BC) Altai? Sounds like R1a came from Central Asia. Are you ready to accept your Asian roots?

Davidski said...

Why do you think the R1a in the Altai came from Central Asia and not Eastern Europe?

And have those PCR results in the thesis been confirmed in a formal publication using more modern methods?

How do you know it's not contamination?

FrankN said...

@Nirjhar: Thx for the "Population collapse" paper. Very interesting, even though I wish they had chosen a different geographic panel. What they are showing is an almost simultaneous population collapse in S. Germany, E. Switzerland, England & Wales, Ireland, and Scotland around 5.7 kya +/- 1-2 centuries. Unfortunately, those regions are not independent but follow the same pattern, namely neolithisation around 6 kya, and subsequent collapse due to a combination of (i) unsustainable resource management (slash & burn) and (ii) climate change (end of the Holocene Climate Optimum). Their case would have been more convincing if it had included, e.g., the LBK collapse in the Rhineland 7.8 kya, or the FB collapse in N. Europe some 5 kya.
Nevertheless, I like
(a) how population declines in the Paris Basin (Fig. S1a/b) correspond to (i) the initial spread of Michelsberg around 4,500 BC, (ii) the neolithisation wave around 4,000 BC, and (iii) the general, climate-induced Central European decline 5.7 kya (unfortunately no longer analysed for the Paris Basin);
(b) how Wessex/Sussex (Fig. 2) appear to have moved in sync with what we know about N. German and S. Scandinavian demographics, namely a population boost some 5.6 kya (when Central Europe depopulated), and collapse some 2-3 centuries later. In the case of Holstein, the Danish Isles and Scania, that collapse corresponds to colonisation of Mecklenburg, E. Central Sweden, SW Finland and possibly the SE Baltics, plus strong Northern FB influence in the Elbe-Saale Region (Bernburg Culture); I wonder where those Wessex/Sussex settlers left for...

@Peter Clavius: I think you are overlooking the specific case of maritime foragers. Their subsistence base would be fish, sea mammals, and waterfowl, plus plants preferring a maritime climate. For NE Europe, the latter included hazelnut (fats & carbohydrates), water lilies (carbohydrates), and, as sources of Vitamin C, cabbage, crab apple, sloe etc.
Ertebolle/Swifterbant, but also the coastal Levant "PPN" that wasn't "neolithic" at all, provide evidence that maritime foraging can sustain substantial population densities. There wasn't just a delay of one and a half millennia before agriculture took hold on the coasts of Brittany, the North and Baltic Seas; the same delay can also be observed for the Antakya basin in relation to SE Anatolia.

When looking at Ertebolle boats, sophisticated fish traps, and seal oil extraction as lamp fuel, there definitely wasn't a lack of innovation or investment. In addition to EDFR, the ecological sustainability of subsistence models needs to be considered. And, as has become obvious now, Near-Eastern type agriculture wasn't sustainable in NC Europe, at least not before the "secondary products revolution" (dairying, manure application etc.).

FrankN said...

As addendum to my last post: The Jomon culture is another case of maritime foragers that clearly can't be blamed for a lack of innovation/ investment. Indigenous Californians, with a fish & hazelnut subsistence similar to Ertebolle, might provide a worthwhile case study as well. Furthermore, a look at the Melanesian/ SEA Neolithic, with its banana/ taro/ yam/ coconut/ chicken "package" added onto fishery, could be instructive.

Matt said...

A few attempts to fit the final PCA over the projected PCA, to see what the similarities and differences are:

• trying to fit the HG cline together as much as possible; modern populations don't overlap between the different PCA if so

• matching so that the moderns overlap in position as closely as possible

• trying to match the positions of Levant_N and Iran_N

More decompression of distance of Euro HG samples towards the NW of the graph, away from all other samples.

Samuel Andrews said...


Y DNA R1a was expected to be found in pre-historic Siberia because we already knew from aDNA they had some mtDNA U5a, U4, U2e (typical of EHG). EHG has connections to Siberia via a huge amount of MA1 ancestry, so it isn't hard to believe R1a existed in pre-historic Siberia.

klevius said...

@Blogger FrankN: 'Maritime foragers' may belong to any of the three classifications of EDFR (expanded demands for resources). The very point of my "re-classification" was to avoid particular food resources for the purpose of being able to have a more analytical tool for assessing motivation/level of investment. In the book text I actually mention 'fishing' as an example of subsistence contribution that could involve all categories. In my "re-evaluation" of conventional categorization of societies (based on type of food/food gathering) I rested (except for personal reports by people who knew "Bushmen") quite heavily on Richard Lee's works on !Kung from the early 1960s as well as on Patricia Draper's reports about the role of women's subsistence contribution (in the Harvard !Kung Bushmen Study Project), which suggested that the !Kung society was the 'least sexist of any (society) we have experienced'. And as we all know, women's (and children's) role in the history of societies has been a rocky ride to say the least, and should always be taken into account when assessing past societies/cultures - as a known or unknown variable.

However, you're surely right in emphasizing the importance of specific food supplies available in specific cultural traditions (shaped by or shaping them) at specific times.