Friday, May 9, 2014

More info on two Thracian genomes from Iron Age Bulgaria + a complaint


PLoS Genetics has just published a new paper on the genetic affinities of Oetzi the Iceman (see here). As far as I can tell, it simply affirms what we've already learned about Oetzi from previous studies, but it does feature interesting new insights into a couple of genomes from Iron Age Bulgaria, a.k.a. Thrace:

The first individual (P192-1) was excavated from a pit sanctuary near Svilengrad, Bulgaria, dated to 800–500 BCE. The other individual (K8) was found in the Yakimova Mogila Tumulus in southeastern Bulgaria, dated to 450–400 BCE.

...

For the Thracian individuals from Bulgaria, no clear pattern emerges. While P192-1 still shows the highest proportion of Sardinian ancestry, K8 more resembles the HG individuals, with a high fraction of Russian ancestry.

...

Interestingly, this individual [K8] was excavated from an aristocratic inhumation burial containing rich grave goods, indicating a high social standing, as opposed to the other individual, who was found in a pit [15]. However, the DNA damage pattern of this individual does not appear to be typical of ancient samples (Table S4 in [15]), indicating a potentially higher level of modern DNA contamination.


K8 might well be contaminated with modern DNA to some degree, but I'd say there's a much better explanation for these signals of non-trivial genetic substructures within the Thracian population.

Archeology suggests that during the Bronze Age the Balkans were invaded from the east by nomads associated with the Yamnaya culture of the Pontic-Caspian Steppe. These invaders, possibly of early Indo-European stock, liked to build Tumuli for their important dead, which were essentially copies of the Kurgans built by the Yamnaya and related peoples.

Moreover, we now know that indigenous European hunter-gatherer (HG) ancestry survived best in Eastern Europe (see here), so it's very likely that the aforementioned invaders from the steppe were significantly HG-like in terms of genetic structure.

Therefore, the fact that K8 was buried in a richly furnished Tumulus (essentially a Kurgan), and genetically more similar to indigenous Europeans than P192-1, who was genetically more Near Eastern-like, and basically thrown into a ditch after he died, doesn't appear to be a coincidence.

In other words, perhaps K8 belonged to a ruling class of steppe origin, while P192-1 was largely of native Balkan stock, whose ancestors were conquered centuries earlier by the steppe nomads and forced to live as an underclass? If so, it wouldn't be the only time in history that this sort of thing has happened, especially within Indo-European societies.

By the way, unfortunately I have to add that the Principal Component Analyses (PCA) in this paper featuring the two HG genomes, ajv70 and La Brana-1, are simply woeful (PDF link). These genomes should be clearly outside the range of modern European genetic variation, but here they land among the Orcadian and French samples. Where was the peer review I wonder?

Citation...

Sikora M, Carpenter ML, Moreno-Estrada A, Henn BM, Underhill PA, et al. (2014) Population Genomic Analysis of Ancient and Modern Genomes Yields New Insights into the Genetic Ancestry of the Tyrolean Iceman and the Genetic Structure of Europe. PLoS Genet 10(5): e1004353. doi:10.1371/journal.pgen.1004353

See also...

Ancient DNA from prehistoric Bulgaria and Denmark

PCA projection bias in ancient DNA studies

65 comments:

Maju said...

"I have to add that the Principal Component Analyses (PCA) in this paper featuring the two HG genomes, ajv70 and La Brana-1, are simply woeful"

Why? Just because you're already used to a certain type of results (which are surely the product of a certain sampling strategy)?

I haven't got time to read the small type yet but I'm indeed puzzled by those so very different 6 PCAs, which should be identical, right? (The ancient samples are "projected", hence they should not alter the graph).

There are indeed many questions here but why should we dismiss this study's results, just because in this case Bra1 falls onto the Basque cluster instead of "Svalbard"? When you look at the ADMIXTURE results (figs. 1 & S1) it seems apparent that Bra1 actually clusters best with Basques too, for example (~60% affinity, most of the rest being "Russian"), even better than Gök4. And that's novel and interesting in itself.

Similarly, again in the ADMIXTURE graph, Ajv clusters with Orcadians, Gok4 with Italians and only Ötzi aligns best with Sardinians. All that is consistent with the PCAs.

Beyond this debate, fig. 3 is also interesting: it confirms that there is African-like admixture at the root of the SW European cluster (via EEF surely), that Tuscans (even if in this branch) are mixed with "other Europeans" and that Finns have East Asian admixture. These we can confirm in the f3 formal tests available in the supplemental materials.

So for me it's just another piece of info, a different point of view, based on a different sampling strategy, which complements the previous ones. Contrasting these different POVs is maybe confusing initially but should also provide a better understanding of the whole picture - because there is no single nor simple interpretation of autosomal genetic data.

Maju said...

PS- BTW: notice that the Mozabite cluster does not show up in any European sample, suggesting that the element of African-like admixture in Bra1 or the EEF line is not linked to this modern but isolated Algerian community. In the case of EEF we can think of distinct Nile area origins but what about Bra1? My thought is that it actually reflects ancient Aterian elements, which are nowadays also diluted to near-nothingness in NW Africa (except Southern Morocco, where it's still apparent).

Davidski said...

Maju,

The PCA are just wrong. The ADMIXTURE analysis is also shit, but the PCA takes the cake as the worst I've ever seen of ancient DNA from Europe.

Davidski said...

By the way, what sampling strategy? Can the HGDP actually be described as a sampling strategy in 2014?

Shaikorth said...

Yeah the PCA is just junk. Clear combination of small dataset and projection bias.

La Braña isn't a modern Basque, and in a properly done PCA he's outside European variation. In Chromopainter/FineSTRUCTURE he clusters with Finns in unlinked mode with obvious heavy affinity to Lithuanians and Scando/Finnish mixes too, but in linked mode he does cluster with Basques, which is supposedly a genealogical link. The latter could be verified with IBD.

http://fennoscandia.blogspot.no/2014/03/la-brana-1-closest-to-basque-sardinians.html

The Basque component in the admixture run basically appears to be formed by a mix of Sardinian and Northeast European (Russian).

What I find interesting is that in the treemix runs without admixture edges, La Braña has heavy implications of Sub-Saharan influence in the residuals. Lazaridis picked up Basal Eurasian in it, but couldn't verify it. Some admixture calculators show there's African in it (Dodecad globe4/13) and some don't (Eurogenes K13/K15). I wonder what's going on there.

Maju said...

Honestly, just dubbing some results "junk" without any argument doesn't seem serious at all.

The sampling strategy is a prior choice of the researchers. In this case they have overloaded SW Europe, while in the case of Skoglund & Malmström, they overloaded the Scandinavian sample, etc. These choices obviously affect the results and I understand that it is interesting in itself looking at the various results produced by various sampling strategies.

...

"La Braña isn't a modern Basque, and in a properly done PCA he's outside European variation. In Chromopainter/FineSTRUCTURE he clusters with Finns in unlinked mode with obvious heavy affinity to Lithuanians and Scando/Finnish mixes too, but in linked mode he does cluster with Basques which is supposedly a genealogical link. Latter could be verifiable with IBD."

Ditto. Just as Loschbour looks most directly related to modern French, it's probable that La Braña is most directly related to modern Iberians (Basques included), even if these have less WHG-like overall ancestry. That's almost certainly because of two reasons:

1. There was a clear diversity among ancient European HGs, which is being somewhat diluted in certain analyses but shows up when IBD is considered.

2. Much of what we perceive as "WHG" in modern Europeans is in fact derived from other populations with higher ANE ancestry/affinity from Eastern Europe (and probably also Scandinavia/North Sea region in some cases). In other words: there's no simple WHG element but a diverse array of them (at least four: Braña, Loschbour, Scandinavian, Eastern European - plus the Lazaridis "UHG" that comes with EEF and may be from the Balkans, plus other unknown elements in Italy, etc.)

Davidski said...

You were given two arguments...

1) HGDP samples

2) PCA projection bias

I'll give you a third one...

3) Slack peer review

Chad Rohlfsen said...

I don't think that we have any evidence to support the idea that there were many HG types. The previous paper we discussed had HG's more homogenous than Han Chinese, while covering an area 4-5x as large as the Han homelands. That is pretty remarkable.

The only HG's that deviate a little are mixed with ANE. Including Loschbour in future studies could paint a clearer picture. I still think that it is too early to say that the slight space between HG's is due to much more than drift.

I will not rule out a possible tiny incursion of something African into Iberia 7kya, but it is still a little early to jump on that bandwagon.

Chad Rohlfsen said...

Maybe yDNA C has something to do with the African affinities due to its early break off during OOA, similar to the situation with the Basal component. That could be an explanation that doesn't require any cross-over into Iberia.

Chad Rohlfsen said...

Sorry for the third post here. I think that more studies should be done on Loschbour. His skull is nothing like LaBrana, or the others. His vault and brow ridges are very archaic looking. I am curious about the amount of archaic admixture in this individual.

V Robazza said...

P192 is noted as U3b in mtdna and M in ydna

k8 has no mtdna reading but has F in ydna

Maju said...

@Chad:

"The previous paper we discussed had HG's more homogenous than Han Chinese, while covering an area 4-5x as large as the Han homelands".

Actually that low diversity only applies to the Gotland PWC samples (Ajv, etc.). Gotland has barely 3,200 km², while Beijing alone has 16,400 km². It is absolutely unsurprising that such a remote and isolated population had such low diversity.

"I don't think that we have any evidence to support the idea that there were many HG types".

There is. In the Lazaridis study, SI-19, it is apparent that, by IBD segment length, the French are much more closely related to Loschbour than any other population (except Zimbabwean whites for some odd reason), even if they don't score particularly high in the overall fragment heritage (more admixed but also more directly related to Loschbour, not via some other Euro-HG population, in their WHG side).

Similarly here it is confirmed (and also in the Paisen analysis mentioned by Shaikorth above) that La Braña has a more direct relation (longer IBD segments) with Iberians and Basques, even if in the overall "raw" ancestry it is less notable than with other populations, which again owe, no doubt, their pseudo-WHG ancestry to other populations such as Scandinavian HGs or Eastern European HGs, not WHGs as such.

The fact that there is a direct IBD connection Loschbour-French and Braña-Iberians, but not Loschbour-Iberians, Braña-French nor either with, say, the Danish or the Finns, clearly indicates localized specificities that have affected the modern mix. In other words: that EEFs mixed with Iberian HGs in Iberia, with Mid-Western HGs in France, etc. producing the baseline of modern local populations (later enriched with Eastern European IE blood).

"I will not rule out a possible tiny incursion of something African into Iberia 7kya"...

IMO before 7 Ka, surely in the Solutrean-Oranian interaction. The more I see about Braña 1 African (but not Mozabite) affinities, the more clear it is to me that it must refer to the Aterian residue (which must be perceived as "African" from an Eurasian perspective but it's actually equi-distant). NW Africans have a very small (~1%) highly divergent (huge Fst values to all other components) local element, more apparent in Southern Morocco (14%), which IMO is Aterian by origin (i.e. from the same age as the OoA).

Some of that very old Aterian element must have made it to Iberia as Oranian genesis reflux, affecting mostly the western half (which is totally consistent with the Solutrean patterns of diffusion in that area: Valencia → Andalusia → Portugal → Asturias).

Matt said...

It's interesting to me that the European ancestry in Mexicans is a better fit to the Iceman than the European references appear to be (TSI, CEU and “... we do not find a difference in relatedness between modern Iberians and either the TSI or CEU in the 1000G/Sardinia dataset..”), in terms of derived allele sharing.

It reminds me of the Population history of the Caribbean (Moreno-Estrada et al. 2013) paper ( http://dienekes.blogspot.co.uk/2013/06/population-history-of-caribbean-moreno.html) where when European origin Latino segments were plotted, they come out as “ultra Iberian”.

Although I still would think this is really an artefact of methods to separate out European ancestry in admixed populations (zombification), rather than any population structure or historical change. (The authors say “(one explanation is) the excess sharing results from comparing only double European segments in the Mexican individuals to those of the whole genome in CEU and TSI. Since our local ancestry inference masks segments that are not clearly of European ancestry, any non-European ancestry in the CEU or TSI would lead to excess sharing of the European tracts in the MXL and the Iceman.”).

Also interesting that the CEU and TSI samples appear roughly similar in terms of derived allele sharing with the Iceman.

Maju – I think each PC was produced using the modern samples only for those SNPs that overlap with the ancient sample, and then the ancient sample was projected on.

“We also performed principal component analysis (PCA), separately for each ancient genome together with the 263 contemporary individuals, using only non-missing SNPs of the respective ancient sample. The initial PCA was performed using the modern samples only, followed by projection of the ancient samples onto the inferred principal components.”

I am not sure if the other projected PCA in other papers have used this principle, or used the principle of using all overlap in the modern samples, then cutting down again to the overlap with the ancient samples.
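The procedure quoted above can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' actual pipeline: the function name `project_ancient`, the 0/1/2 genotype coding, the use of NaN for missing calls and the lack of any per-SNP scaling are all assumptions made for the example.

```python
import numpy as np

def project_ancient(modern, ancient, n_components=2):
    """Fit PCA on modern samples only, then project one ancient sample.

    modern  : (n_samples, n_snps) array of genotypes coded 0/1/2 (assumed coding)
    ancient : (n_snps,) array, with np.nan where the ancient genome has no call
    """
    # Keep only the SNPs that are non-missing in this particular ancient sample.
    keep = ~np.isnan(ancient)
    x = modern[:, keep].astype(float)

    # Centre on the modern samples and take their principal axes.
    mean = x.mean(axis=0)
    centered = x - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]

    # Modern scores come from the fit; the ancient sample is only projected.
    modern_scores = centered @ axes.T
    ancient_score = (ancient[keep] - mean) @ axes.T
    return modern_scores, ancient_score, int(keep.sum())
```

Because the marker mask depends on which calls each ancient genome has, two ancient samples with different missingness yield two different sets of axes for the very same modern individuals.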

Davidski said...

V Robazza,

Try to stay off the drugs when posting here. Thanks in advance.

Grey said...

"where when European origin Latino segments were plotted, they come out as “ultra Iberian”."

One of my thoughts - based on very little - is that the first stage in the sequence wouldn't be Out of Africa but Out of the Tropics.

So to me there'd be a population A in the tropics who moved out, became population B and spread out.

Then one segment somewhere develops into population C and expands in turn mostly replacing B throughout Eurasia including a back-migration into Africa (but not the tropical part).

Combined with the later Bantu expansion from population A this makes me wonder if this imagined population B mostly disappeared except autosomally.

Anyway I wonder about that in connection with NW Africa, Iberia, North Wales, Guanches, that Moreno thing etc.

Chad Rohlfsen said...

@Maju
Look at the plots in that study. LaBrana was plotted by the Swedish samples, not near modern Iberians.

Chad Rohlfsen said...

They actually show the Swedes closer to Iberians than La Brana!

Matt said...

"perhaps K8 belonged to a ruling class of steppe origin, while P192-1 was of native Balkan stock, whose ancestors were conquered centuries earlier by the steppe nomads and forced to live as an underclass? If so, it wouldn't be the only time in history that this sort of thing has happened, especially within Indo-European societies."

It's possible, but I think it's also possible that the mound builders could've been raiders rather than in charge as such - roving outlaws with lots of booty. Bandits rather than tyrants. Think more vikings than Turkic rulers in Anatolia, or Mongols in China.

We don't know about the social structure here and the military advantages of steppe societies tend to be mobility based, with them faring more poorly once they settle down (replaced by the people they have conquered in all meaningful ways).

It seems a little easier to me to imagine a society of "reavers" remaining separate from the people they prey on (there are analogies in the historical cultures we're familiar with). And if the Indo-Europeans stayed separate roving bandits, tied to mobility and with no incentive to invest in the societies they preyed on, it's easier to understand why a lot of the incipient settled social complexity seems like it was retarded by their apparent appearance in the archaeological record (not the case if they just stepped in as a new stationary ruling class).

Davidski said...

The people buried in these Kurgans and Tumuli were usually of royal descent. This is probably why K8 doesn't appear to be a person indigenous to southern Europe, but someone with a lot of admixture from the putative Indo-European homeland far to the northeast of the Balkans.

Indeed, I suspect there might be some very strong parallels between Thrace and the Indo-European societies of South Asia and South America, where the native populations were subdued by the Indo-Europeans.

But even if we assume that the sample from the Thracian Tumulus is heavily contaminated and gives bogus results, then that still leaves the other sample, the one from the pit, which is also genetically more northern than Oetzi, and therefore probably admixed. So although this person seems to be from a lower caste than K8, he's also likely to be in part of Indo-European origin, rather than just culturally Indo-Europeanized. This suggests there was a migration from the north into the Balkans at some point after the Neolithic but before the Iron Age.

By the way, it's actually possible that the high incidence of R1a among the Bronze and Iron Age South Siberian samples from Keyser et al. 2009 was due to biased sampling, because all of the mummies that were tested came from royal Kurgans, and might not have been representative of the populations they ruled.

Andrés said...

Let's go back to the maths behind common PCA plots.

The Eigenvectors don't have an intrinsic, straightforward "meaning". A PCA of modern European Populations has a certain shape, but when you add Africans and East Asians it all changes in a non-trivial way because the Eigenvectors that best explain intra-European variation are not necessarily the best to explain all human variation.

In some PCAs the principal components will be those in which ancient samples are most differentiated. But changing the populations (adding lots of middle easterners for example) can lead to very different vectors. Maybe under those two new vectors ancient and modern Europeans do look alike.
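A toy example may help show how much the leading axis depends on which groups are included. The sketch below uses made-up three-feature data for hypothetical groups A, B and C (none of this is real genotype data, and the group positions are arbitrary assumptions): with A and B alone the first component separates A from B, but once a distant group C is added the first component is spent on C versus the rest.

```python
import numpy as np

def pc1(samples):
    """Return the first principal axis (unit vector) of a samples-by-features matrix."""
    centered = samples - samples.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)

def group(center, n=50, spread=0.3):
    """Draw n toy individuals scattered around an assumed group centre."""
    return np.asarray(center, dtype=float) + spread * rng.standard_normal((n, 3))

pop_a = group([1.0, 0.0, 0.0])
pop_b = group([-1.0, 0.0, 0.0])
pop_c = group([0.0, 10.0, 0.0])  # a distant third group

axis_ab = pc1(np.vstack([pop_a, pop_b]))          # fitted on A and B only
axis_abc = pc1(np.vstack([pop_a, pop_b, pop_c]))  # fitted on A, B and C

# Absolute cosine between the two leading axes: near 0 means they are
# almost perpendicular, i.e. "PC1" describes a different contrast.
cosine = abs(float(axis_ab @ axis_abc))
```

In this construction the two leading axes are nearly perpendicular, so a coordinate on "PC1" is not comparable between the two runs.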

Davidski said...

Let's not speculate and confuse people with math. Here's where La Brana-1 really plots compared to both Europeans and Middle Easterners.

http://eurogenes.blogspot.com.au/2014/02/pca-of-five-ancient-genomes.html

He's clearly outside the range of modern European and West Eurasian genetic variation. The same phenomenon has already been shown in a number of PCA in various studies, but it was less extreme there because in most studies the ancient samples are projected onto PC eigenvectors which are computed with modern samples only. Example...

http://eurogenes.blogspot.com.au/2014/04/low-genomic-diversity-among-ancient.html

The problem with the PCA in this study is that the projection bias is so extreme, that the results are basically meaningless for ajv70 and La Brana-1.

barakobama said...

"PS- BTW: notice that the Mozabite cluster does not show up in any European sample, suggesting that the element of African-like admixture in Bra1 or the EEF line is not linked to this modern but isolated Algerian community. In the case of EEF we can think of distinct Nile area origins but what about Bra1? My thought is that it actually reflects ancient Aterian elements, which are nowadays also diluted to near-nothingness in NW Africa (except Southern Morocco, where it's still apparent)."

Have you ever thought that La Brana-1 might have some EEF ancestry?

barakobama said...

"There are indeed many questions here but why should we dismiss this study's results, just because in this case Bra1 falls onto the Basque cluster instead of "Svalbard"? When you look at the ADMIXTURE results (figs. 1 & S1) it seems apparent that Bra1 actually clusters best with Basques too, for example (~60% affinity, most of the rest being "Russian"), even better than Gök4. And that's novel and interesting in itself."

I don't understand why La Brana-1 and Ajv70 scored so high in Basque and Sardinian components in Underhill 2014's admixture. In Davidski's and Dienekes's admixtures, La Brana-1 scores high in west European-specific components, that are very high in Basque. West European components have more farmer admixture than east European, plus they are high in early European farmers when put into admixtures. That can be an explanation for La Brana-1 scoring a high amount in the Basque component, and the explanation may be similar for Ajv70.

"Similarly, again in the ADMIXTURE graph, Ajv clusters with Orcadians, Gok4 with Italians and only Ötzi aligns best with Sardinians. All that is consistent with the PCAs."

History won't be rewritten. We already have a good idea how stone age European hunter gatherers relate to modern Europeans. There are logical explanations for La Brana-1 and Ajv70's strange admixture results.

~80% Otzi + ~20% Loschbour = Gok4. In the D-statistics Gok4 is shown to be most related to Sardinians, not Tuscans (the only Italians tested in the D-statistics). He behaved exactly the same way as Otzi; the only differences were caused by his extra hunter-gatherer ancestry.

"Beyond this debate, fig. 3 is also interesting: it confirms that there is African-like admixture at the root of the SW European cluster (via EEF surely), that Tuscans (even if in this branch) are mixed with "other Europeans" and that Finns have East Asian admixture. These we can confirm in the f3 formal tests available in the supplemental materials."

I agree the Sub Saharan ancestry found in Tuscans and Iberians may have something to do with Basal Eurasians. I don't understand, though, why other Europeans didn't show evidence of Sub Saharan ancestry. Also, Tuscans can fit as being part Sardinian and part Sub Saharan African (LWK and YRI) even though Sardinians have more Basal Eurasian ancestry.

A good way to test if Basal Eurasian is just a mix of African and West Eurasian alleles is to do the same test for all Europeans (preferably early farmers) but with MA1 (pure West Eurasian) and Mbuti as population 1 and 2.


barakobama said...

My Underhill PA 2014 thread.

http://www.eupedia.com/forum/threads/30000-Two-Iron-age-Thracians-found-to-have-totally-different-genetic-origins?p=431520#post431520

Davidski said...

They didn't run the ancient samples in the same ADMIXTURE analysis as the modern samples, but instead used allele frequencies sourced from the modern samples to test the ancient samples.

This was a problem because it changed the conditions under which the modern and ancient samples were tested, and resulted in much less precise outcomes for the ancient samples. In effect, this was the ADMIXTURE version of PCA projection bias.

There are two ways around this: a) run the ancient samples together with the modern samples, or b) source the allele frequencies from a subset of modern samples, and then use them to test the ancient samples as well as the rest of the modern samples. Then you can actually compare the modern samples to the ancient samples.

It really pisses me off when these sorts of skewed results are released, and people just take it for granted that they're accurate because the authors come from Stanford or wherever.

barakobama said...

I have read multiple sources that state the Thracians were known for red hair. I have also seen multiple depictions of Thracians with red hair, that were created by Thracians themselves.

Like these

http://info-bg.narod.ru/heritage_files/vino.jpg

http://upload.wikimedia.org/wikipedia/commons/thumb/2/2d/Thrace-ostrusha.jpg/250px-Thrace-ostrusha.jpg

http://www.shopbulgaria.com/files/products/cache/w_1169161200_5000_4000_thracians_3.jpg

Today red hair only reaches 1% in western Europe (and other areas because of western admixture) and Volga Russia, and doesn't show a good correlation with northern (WHG-ANE) ancestry.

But still, maybe red hair in ancient Thracians is because some were very northern (WHG-ANE) in ancestry. Possibly the R1b L23 in the Balkans is connected with R1b L11 in western Europe, which has a strong correlation with red hair and Indo-Europeans. Also, Indo-Iranians and Tocharians were known for red hair, so Thracian/west European/Indo-Iranian/Tocharian red hair may all be connected with some early Indo-Europeans from the steppes.

It's just an idea that went through my head; I don't have much evidence for it.

About Time said...

@Matt, could the "ultra Iberian" ancestry in the Caribbean and Mexico be pre-Columbian? As in, Paleo-Iberian or Solutrean?

There should be a way to dissect this and distinguish "zombie ancestry" from "unknown paleo ancestry." Maybe looking at particular haplotypes and see their phylogeny?

Otherwise, it would be like getting a hint of ANE without any Mal'ta specimen.

And are there any plans to get better SNP haplotypes on the "R1*" lineages in Canadian First Nations and some US tribes? Presumed to be post-Columbian, but where is the up to date haplotype info to show it? Not hard to do, if the question is taken seriously and tested.

About Time said...

There are lots of Jewish redheads. I've seen no evidence they are any more northern genomically than dark haired Jewish people.

Ashkenazi genomes are remarkably homogeneous despite quite a bit of variety in personal physical appearance. Many look archetypally Nordic (more so than many other Europeans), but with Southern genomic ancestry.

Appearances mean bupkis.

Tocharians in China were also shown as red haired. Again, not clear that their western ancestry is particularly northern genomically.

barakobama said...

About Time, Jews being very red haired is, I think, a stereotype that is exaggerated.

http://familyguy.wikia.com/wiki/Mort_Goldman

Only around 1% of American Jews are red haired.

http://www.theapricity.com/forum/showthread.php?35882-New-Hair-and-Eye-color-statistics-(2011)

From what I have learned about Samaritans' pigmentation, they are very pale and red haired. I don't have any statistics to confirm this, just Google images and news articles.

Red hair is a west Eurasian phenomenon that only reaches 1% in western Europe and Volga Russia. I stated in my previous post that it doesn't correlate with northern ancestry. It is, though, more common in northwest Europeans than southwest Europeans, and outside western Europe it is most common in northern populations.

The Tocharians, like early Indo-Iranians, were probably very similar to northeast Europeans. It is safe to assume their northern European-like ancestry because of their overall light pigmentation, only seen in northern Europeans.

Red hair may have been selected for like blonde hair may have been after the Neolithic, and therefore would have first gotten over 1% in northern-like populations.

It has been a mystery why Thracians were known for red hair and why it is seen in their art, because red hair is so rare in Bulgaria today. K8's northern ancestors may be an answer. If red hair was more common in royal Thracians, that would be great evidence.

I doubt the red hair in Thracians has any connection; I just wanted to put the idea out there.

Maju said...

@AboutTime:

"could the "ultra Iberian" ancestry in the Caribbean and Mexico be pre-Columbian?"

Nope. It is European-like but not quite. I and some other people are convinced it is Guanche (Canarian Aboriginal, i.e. North African). There are many historical and now also genetic reasons to think that Castile used Canarians to settle the Caribbean. See: http://forwhattheywereweare.blogspot.com/2013/06/caribbean-autosomal-ancestry.html (particularly updates and discussion because at first I was oblivious to this Canarian trail and it was a Puerto Rican who opened my eyes).

Grey said...

Red hair has been mentioned over a wide area from Libya i.e. North Africa, to the Western Barbarians on the border of China.

Personally I don't think it was northern originally. I think it may have been common (for vitamin D reasons) over a wide area in a cone distribution starting somewhere NW of India and then it declined later due to the spread of the SLC genes, surviving best in the cloudiest regions i.e. NW Europe.

This would explain its continuing rare presence across a wide area and its concentrated presence in a few regions.

Davidski said...

Maju,

I just saw your write up on this study. If you want to know the real reason for the dodgy PCA results, then read this paper very carefully...

http://arxiv.org/abs/1211.2970

Davidski said...

I just spoke to someone at the Reich lab about this issue, because it shows up in PCA done with Eigenstrat as well.

They call this problem "shrinkage", because the PCA space is shrunk for the projected samples relative to the reference samples, and there's no automatic fix for it yet, but there might be soon.
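The effect is easy to reproduce on simulated data. The sketch below is only a toy illustration under stated assumptions, not anyone's published method: it uses Gaussian "markers" rather than real genotypes, two made-up populations that differ by a small shift at every marker, and a hypothetical function name `shrinkage_demo`. Reference samples are used to compute PC1, and a second batch drawn from exactly the same two populations is then projected onto it.

```python
import numpy as np

def shrinkage_demo(n_per_pop=20, n_markers=5000, shift=0.1, seed=0):
    """Ratio of mean |PC1| for projected samples to mean |PC1| for reference samples."""
    rng = np.random.default_rng(seed)
    mu = np.full(n_markers, shift)              # assumed per-marker difference
    labels = np.repeat([1.0, -1.0], n_per_pop)  # two toy populations

    def draw():
        # Each individual = population signal + independent noise at every marker.
        noise = rng.standard_normal((labels.size, n_markers))
        return labels[:, None] * mu + noise

    reference, held_out = draw(), draw()

    # PC1 is computed from the reference samples only.
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    pc1 = vt[0]

    ref_scores = (reference - mean) @ pc1
    proj_scores = (held_out - mean) @ pc1  # same populations, merely projected
    return np.abs(proj_scores).mean() / np.abs(ref_scores).mean()

ratio = shrinkage_demo()
```

With many more markers than reference individuals the ratio comes out well below 1: the projected samples sit closer to the origin than reference samples from the same populations, because PC1 has partly fitted the reference samples' own noise.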

Maju said...

Thanks again for the reference about the shrinkage. It should explain why the different PCA graphs look so different from each other in spite of including the same samples.

However it does not question the results re. the ancient samples, because these are pretty much the same as in the ADMIXTURE graph.

So we have two different statistical analyses producing about the same results, therefore they are most likely correct, given this dataset.

A key issue, that IMO is fundamental to comprehend, is that various different datasets will produce different results. The main difference between this dataset (in the PCA only) and the others (Lazaridis, Skoglund, Dasakali) is the absence of a West Asian sample. Logically the West Asia vs Europe difference affects the PCA in ways that a Europe only dataset (and more so a Europe minus Balkans, as is this one) does not experience. As normal PCAs have only two dimensions, this may be decisive: West Asia vs. Europe invariably takes up one of the axes (the other typically is SW Europe vs the rest).

In Europe-only datasets instead the SW Europe pole typically becomes dual, with one axis representing Sardinians vs Russians and the other Adigey vs Basques.

In brief: when West Asians are thrown in, Sardinians and Basques get closer and ancient foragers appear exotic (hyper-Nordic); when they are removed, Sardinians and Basques diverge much more clearly and ancient foragers appear as more standard Europeans.

Which is "more correct"? Maybe both in different ways. I really do not know but it's a sampling strategy effect no doubt.

Davidski said...

Maju,

The PCAs look so different from each other because they only used non-missing markers in the ancient samples to run the PCA, which means they used different sets of markers for each PCA.

And the reason the PCA match the ADMIXTURE analysis is because the ADMIXTURE analysis is also skewed. I already explained why in a comment above.

These analyses need to be run again and a correction put into PLoS Genetics.

"They didn't run the ancient samples in the same ADMIXTURE analysis as the modern samples, but instead used allele frequencies sourced from the modern samples to test the ancient samples.

This was a problem because it changed the conditions under which the modern and ancient samples were tested, and resulted in much less precise outcomes for the ancient samples. In effect, this was the ADMIXTURE version of PCA projection bias.

There are two ways around this: a) run the ancient samples together with the modern samples, or b) source the allele frequencies from a subset of modern samples, and then use them to test the ancient samples as well as the rest of the modern samples. Then you can actually compare the modern samples to the ancient samples."

Maju said...

As I just told you in my blog, David, the n=1 weight of each of the ancient samples makes it nearly impossible that they show up their specificity unless you run the algorithm for much higher K values (they are not Neanderthals, you know: not so different from moderns to show up as distinct so easily). So what you say is irrelevant.

You can try at home "alternative (a)" and tell me what happens. It's a hypothesis easy to test, right?

Alternative (b) is the potentially biased one here: Dienekes arbitrarily chose to exclude the Basque component from his zombie collection, yet it is precisely this approach (approx. K=5 in this study, with a Sardinian but no Basque component yet) that causes Basques to score clearly higher errors relative to all ancient samples (see fig. S7), so it is a clearly WRONG choice and one caused by Dienekes' ideology and bias only (Basques are not less diverse than Sardinians or Finns, so his pretext is spurious).

Davidski said...

It's basic high school science; you can't test one group of samples under different conditions than another group of samples, and then expect to be able to compare the two sets of results.

They're not directly comparable, unless you can somehow correct for the extra variable, which in this case is bias, very much like the PCA projection bias.

That's just the way it is. I have no idea what you're even arguing about? Or did you perhaps correct for the bias? Show us the new results then.

Maju said...

Do you believe in homeopathy or something? The ancient n=1 samples are way too small (and not distinct enough) to affect the outcome in any significant way, exactly the same that one molecule of component X in a liter of water is irrelevant (unless maybe if it is LSD).

The results of Admixture are a function of two factors: sample size and distinctiveness. Maybe a single Neanderthal will still have an effect, because it is so extremely distinctive that the algorithm will notice, but with these West Eurasian samples there will be no difference that we can discern, unless you clone each ancient sample many times or dramatically reduce the modern samples' size (better).

"very much like the PCA projection bias."

Exactly: it is the same as "projecting" or using arbitrary zombies. Just that, for all I know about Admixture, not "projecting" them should produce very similar or even identical results in this case, for the "weight" reasons outlined above.

"Or did you perhaps correct for the bias? Show us the new results then".

You should know that my technical capabilities with these algorithms are limited, as are my PC's processor and memory (it "hangs" most of the time when processing too much data, such as very large images). However I'm 100% willing to collaborate with anyone technically capable, such as yourself, in order to test your hypothesis and other related issues.

Just drop me a mail or, if you think you understand my qualms well enough, go ahead and do it on your own. Testing the (a) hypothesis is as easy as repeating the same Admixture run up to K=8 with the same samples plus the ancient ones. I personally don't feel capable even to add a single sample to the pre-packaged 1000 GP dataset but I know you and others can. So please, go ahead and test your own hypothesis.

Davidski said...

Holy shit, how difficult can this be?

If you test a sample with allele frequencies from an ADMIXTURE run you'll bias the results. For example, if you test a Basque individual in this way, you won't get typical Basque ancestry proportions, but maybe something like 50/50 Basque/French.

That's why the ancestry proportions of the ancient samples in this study are biased. It's exactly the same problem as on the PCA.

I call it the Calculator Effect, and avoid it like the plague when I design ancestry tests, because if I didn't then these tests would be worthless. People from the UK would be coming out German or even Hungarian.
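To make the mechanics of "testing a sample with allele frequencies" concrete, here is a minimal sketch of the general technique: ancestry proportions for a single individual are estimated by EM while the component allele frequencies are held fixed. This is not ADMIXTURE's code or any particular calculator's code; the function name `project_ancestry`, the simulated frequencies and the marker count are all assumptions made for illustration. In this toy example the fixed frequencies are the true ones, so it shows only how the projection step works, not how large any bias is when the frequencies were estimated from other samples.

```python
import numpy as np

def project_ancestry(genotypes, freqs, n_iter=500):
    """Estimate ancestry proportions q for ONE individual, keeping the
    component allele frequencies fixed (hypothetical helper).

    genotypes: shape (M,), allele counts 0/1/2 per marker
    freqs:     shape (K, M), allele frequency of each component per marker
    """
    g = np.asarray(genotypes, dtype=float)
    f = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    k, m = f.shape
    q = np.full(k, 1.0 / k)                 # start from equal proportions
    for _ in range(n_iter):
        p = q @ f                           # expected allele frequency per marker
        w_ref = g / p                       # weight of observed counted alleles
        w_alt = (2.0 - g) / (1.0 - p)       # weight of the other alleles
        # EM update; the proportions keep summing to 1 by construction
        q = q * (f @ w_ref + (1.0 - f) @ w_alt) / (2.0 * m)
    return q

# Simulated example: three made-up components, one mixed individual
rng = np.random.default_rng(1)
freqs = rng.uniform(0.05, 0.95, size=(3, 5000))
q_true = np.array([0.7, 0.25, 0.05])
genotypes = rng.binomial(2, q_true @ freqs)
q_hat = project_ancestry(genotypes, freqs)
print(np.round(q_hat, 3))
```

The point of dispute in this thread is what happens when `freqs` comes from a run that did not include the tested individual; that could be explored by estimating the frequencies from one set of simulated samples and projecting another.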

Maju said...

"For example, if you test a Basque individual in this way, you won't get typical Basque ancestry proportions, but maybe something like 50/50 Basque/French".

I seriously doubt it but, please, do it (I don't think I can) and I will be persuaded.

"That's why the ancestry proportions of the ancient samples in this study are biased. It's exactly the same problem as on the PCA".

Again, please run a test for your hypothesis. It either will confirm it or reject it (or maybe something in between). My opinion is that it will reject it because n=1 samples generally don't matter at all.

The PCA's "problem" (?) is anyhow generalized, because in all similar studies the ancient samples are all "projected". However in this case they perform differently, so it's not caused by projection but by something else (IMO the presence or absence of West Asians in the sample).

Yes: "how difficult can it be?", indeed.

Davidski said...

Here, I forgot I had this online.

Scroll down to the bottom. The first result for PL1 was achieved with ADMIXTURE, and the second with allele frequencies using a calculator. Check out the difference in the spread of the main components.

https://docs.google.com/spreadsheet/ccc?key=0Ato3EYTdM8lQdFh6SzZyOEdMT2kyUmY0cS1PaW1maXc#gid=0

Maju said...

At first sight I see no difference, and I presume it is because you first ran test 1 unsupervised and then the supervised test 2 with the same samples, using the "zombie" components produced in the first test. How does this prove anything?

Try using the same "zombie" components in a substantially different sample and then run the unsupervised algorithm. The results should not match.

Davidski said...

You still don't see a difference?

PL1 - Test 1

Orcadian 28%
East Asian 0.8%
Lithuanian 71%

PL1 - Test 2

Orcadian 42.6%
East Asian 1.7%
Lithuanian 55.8%

All of the other samples show basically the same results in both tests because they were tested with ADMIXTURE in both tests. Only PL1 was tested with the allele frequencies in Test 2.

I have no idea why anybody would not get this?

Matt said...

I wouldn't say the dimensions are inaccurate as such, just misleading.

They're picking up on the differentiation between Basques, Adygei, Sardinians and Russians accurately. But these are not dimensions that are relevant to ajv50 and La Brana.

ajv50 is being assessed against PC1 dimensions that measure Sardinian vs Russian, with Adygei and Basque equally at the middle and PC2 Basque vs Adygei, with Russians more to the Adygei side.

La Brana is being measured against dimensions that contrast Basque+Russians (PC1) to Adygei and Russians+Adygei to Sardinians (PC2).

No doubt these capture the present day population differences fairly well. But much of this is going to be recent drift and isolation by distance which is irrelevant to ancient samples like La Brana and ajv50, rather than the familiar dimensions of HG vs Neolithic (PC1) and more or less residual ENA / African affinity (PC2) that we find in the Eurogenes and Skoglund plots, which capture the general and ancient trends.

Davidski said...

Matt,

The PCA results are inaccurate because they suffer from a well documented problem called projection bias.

That's when the projected samples are biased towards 0 relative to the reference samples.

This is caused by the fact that there are many more markers in the dataset than reference samples. You can read about it here...

http://arxiv.org/abs/1211.2970

It's an error that can and should be fixed.

Maju said...

@David: as you're so cryptic in your statements I did not understand before what you meant about "PL-1" (what is PL-1? why does it matter?). I just looked at the general results, which in most cases are identical.

I still don't get your point. What did you exactly do with those tests?

Maju said...

@Matt: I believe you are onto something; although in most cases the projected ancient samples do not show up as perfectly neutral (0,0), they do have that tendency instead of being hyper-something (the foragers), as happens in the other PCA projections.

In both cases they show up as tending to Russians and Basques vs Sardinians and Adigey, which makes total sense for all I know.

Davidski said...

PL1 is me.

In the first test I tested myself along with all the other samples using ADMIXTURE and came out Polish.

In the second test I ran all of the other samples with ADMIXTURE and then used the allele frequencies from that run to test myself. I came out Dutch.

One of these methods obviously doesn't work. Which one Maju? Take a wild stab in the dark if you still can't figure it out.

Davidski said...

Actually, probably not Dutch, but certainly not Polish, so you get the point.

Or not.

Matt said...

"That's when the projected samples are biased towards 0 relative to the reference samples."

I'm not sure I understand the distinction between shrinkage and bias, and the dimensions just not being predictive for samples they weren't conditioned against, because they capture information relating to the samples they are conditioned against.

Europeans would, if projected, largely fall at 0 for a dimension distinguishing Chinese, Koreans and Japanese, because Europeans lack any meaningful relationship to the differences between them, which are largely recent drift.

You couldn't really "correct" for this because Europeans really would be at 0 on these dimensions.

What you could do is build a new set of dimensions, including the Europeans rather than projecting them. But these would just be a new set of dimensions. They wouldn't be a corrected error.

This is pretty much the scenario I'm describing.

This would apply to a lesser extent to an individual Japanese who was projected - his differences from the Chinese and Koreans wouldn't be captured as well as any of the original Japanese included, so he would fall more to 0 on the dimensions that distinguish the populations (as you've illustrated through your admixture analogy).

Is this different from "shrinkage"?

Davidski said...

Shrinkage and bias are the same things in this context.

Basically, if you run a PCA, and don't correct for projection bias, the projected samples will be pulled towards the middle of the plot because their PCA space will be smaller than that of the reference samples.

But what I'm finding is that the bias is less pronounced when more reference samples are used, and also possibly when the eigenvectors are computed with populations that show high Fst (genetic) distances between them.

So the bias usually isn't all that bad on global plots, but it's completely out of hand on plots of Europe or West Eurasia that only use a few samples, like those limited to the European or West Eurasian HGDP populations.
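A toy simulation can make this concrete. The sketch below is not taken from the paper or from Eigenstrat: it assumes two made-up populations with binomial genotypes, computes PC1 from the reference samples only (via SVD), and then projects fresh individuals drawn from the same two populations. The function name `shrinkage_ratio`, the `drift_sd` parameter and the marker count are all illustrative assumptions.

```python
import numpy as np

def shrinkage_ratio(n_ref_per_pop, n_markers=5000, n_test_per_pop=20,
                    drift_sd=0.05, seed=0):
    """Mean |PC1| of projected samples divided by mean |PC1| of reference
    samples, for two simulated populations (hypothetical toy model)."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.1, 0.9, n_markers)        # shared ancestral frequencies
    drift = rng.normal(0.0, drift_sd, n_markers)   # population-specific drift
    freq_a = np.clip(base + drift, 0.01, 0.99)
    freq_b = np.clip(base - drift, 0.01, 0.99)

    def draw(freq, n):
        # diploid genotypes: 0, 1 or 2 copies of the allele per marker
        return rng.binomial(2, freq, size=(n, n_markers)).astype(float)

    ref = np.vstack([draw(freq_a, n_ref_per_pop), draw(freq_b, n_ref_per_pop)])
    test = np.vstack([draw(freq_a, n_test_per_pop), draw(freq_b, n_test_per_pop)])

    mean = ref.mean(axis=0)                        # centre on the reference panel
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    pc1 = vt[0]                                    # axis computed from reference only

    ref_scores = (ref - mean) @ pc1
    test_scores = (test - mean) @ pc1              # projected, not used to build PC1
    return float(np.abs(test_scores).mean() / np.abs(ref_scores).mean())

small = shrinkage_ratio(n_ref_per_pop=10)
large = shrinkage_ratio(n_ref_per_pop=200)
print(f"ratio with 20 reference samples:  {small:.2f}")
print(f"ratio with 400 reference samples: {large:.2f}")
```

A ratio well below 1 means the projected individuals sit closer to the origin than reference individuals from the very same populations. In this toy setup, raising the number of reference samples or `drift_sd` (a stand-in for greater genetic distance) should push the ratio towards 1.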

Matt said...

@About Time: "could the "ultra Iberian" ancestry in the Caribbean and Mexico be pre-Columbian? As in, Paleo-Iberian or Solutrean?
There should be a way to dissect this and distinguish "zombie ancestry" from "unknown paleo ancestry." Maybe looking at particular haplotypes and see their phylogeny?"


I wouldn't think so, but I'm not aware of how you could test for that specifically.

I initially was thinking it might be because some of what defines the Iberian peninsula (particularly Basques / Sardinians) in Europe is low levels of ANE, which is high in Native Americans. So extracting Native American segments from Mexicans could accidentally also take ANE segments which were present in their parent Europeans, and so leave an "ultra Iberian" signature.

But I think this might not really pan out, so Maju's explanation with Guanches as the European founders for the Mexican population might be better.

About Time said...

@Matt, if the "ultra Iberian" is not a zombie effect, then Guanche could be a reasonable answer (still requiring testing).

Hand waving is never a substitute for good science, which requires: 1. Propose a hypothesis, 2. Test hypothesis, 3. Reject or fail to reject the idea based on that experiment.

Whether any hypothesis sounds perfectly reasonable, unlikely, or completely wacky to the (biased) human mind is moot. Repeated experimental results are all that matter. And even then we only have hypotheses that are not yet disproved or replaced by a better hypothesis.

The worst enemy of good science isn't wacky ideas; it's hand waving in the absence of experimentation.

Maju said...

@David: alright, now I understand what you say: that there is some distortion effect in "projection". Dutch and Polish are not such dramatically different categories, nor are Lithuanian and Orcadian (it's not such a big deal unless we are "splitting hairs", as is the case with intra-European genetics), but I understand now what you mean. Sorry for my confusion.

In any case, the projection problem should be the same in all recent studies on autosomal aDNA, because all of them projected the ancient samples on the PCA. However each gives a different result, so it's not just the projection: there's something else at play and I believe that it is the effect of different sampling strategies: one that includes West Asians and/or related peoples (such as Balkan peoples or Sicilians/Maltese) and another that does not.

@About Time: I totally agree that the Canarian hypothesis needs experimentation but that's something that I, as an amateur, leave to more technically capable people. If I knew how to run Admixture better, I could probably test that myself, but I really don't feel competent enough, so all I hope is that sooner rather than later some researcher (or even an amateur researcher with better technical competence than mine) picks up the idea and tests it, using some North African controls and less unnecessary European overloading of the samples.

So far it is only a coherent hypothesis, coherent with partial tests, such as that of the grandmother of Charles (the Puerto Rican guy who led the way in this issue) or the historical data about Castilian repeated recruitment of colonists and colonial troops in the Canary Islands, century after century.

It is much more reasonable in all aspects than the speculations you have mentioned.

Davidski said...

I've already explained in detail why...

1) The ADMIXTURE results of the modern and ancient samples in this study are not directly comparable

2) The PCA results are biased

3) The different PCA show slightly different results for the same samples

Anyone who is capable of logical and unbiased thought will agree that what I said makes sense.

Ryan said...

A two-dimensional PCA is just a fairly rough approximation, isn't it? I don't think the eigenvectors being uninteresting makes them biased or wrong. It's just a limitation of the methodology. A more accurate picture would have kept more than just the first two loading vectors, but unfortunately most people can't visualize things in N dimensions.

Just because these ancient samples show up within modern day human variation doesn't make this PCA wrong. It just means that the variation that places them outside modern Europe was represented in one of the N-2 loading vectors that were dropped. That's not bias; it's just how a PCA that only uses the first two components works.

It may also make these PCAs boring and useless of course. Just not wrong per se.

Ryan said...

I wonder if those residuals for La Brana re: Africa have to do with the same sequences that gave a false positive for Hadza admixture into Lithuanians from a while back.

Davidski said...

Ryan,

A projected sample will land in a different part of the PCA than where it really should unless a correction is made to account for projection bias.

That's because the projection causes a shrinkage of the PCA space for the projected sample compared to the non-projected samples.

As a result, the projected sample is biased towards 0, in other words, it's pulled towards the middle of the plot, and thus ajv70 and La Brana-1 have gone from being outside of West Eurasian genetic variation, where they really are, to firmly within it on the PCA in this paper. So the result is indeed technically wrong and due to an error.

That's the simple explanation. For more details, see this paper...

http://arxiv.org/abs/1211.2970

Ryan said...

Ah, I see what you're getting at now Davidski. The angle of each point is still valid, it's just the comparison of the distances from the origin that's completely invalid. Thanks for the link.

Arch Hades said...

"In other words, perhaps K8 belonged to a ruling class of steppe origin, while P192-1 was largely of native Balkan stock, whose ancestors were conquered centuries earlier by the steppe nomads and forced to live as an underclass?"

Well if that's the case the "ruling class of Steppe origin" is a lot more like modern Italians & French than modern Eastern Europeans (Russians) just by looking at the ADMIXTURE analysis chart.

Too bad Bulgarians weren't sampled.

Davidski said...

You can't directly compare the ADMIXTURE results of the ancient samples to those of the modern samples because they were tested under different conditions.

But you can compare the results of the ancient samples to each other, and K8 basically looks like a Mesolithic hunter-gatherer.

Maju said...

I decided not to make any analysis of K8 because, according to the paper, there is a very strong suspicion it is contaminated by modern DNA (its DNA damage patterns are not consistent with its age but rather with modern sequences).

Davidski said...

Yeah, I know, but the other, apparently non-contaminated Thracian, also appears to have non-trivial "Russian" ancestry, which Oetzi almost lacks completely.

Maju said...

I mostly meant to tell Arch Hades, who is insisting on K8.

Anyhow, all ancient sequences but Ötzi have some Russian affinity judging by this paper. The only way to know that is the ADMIXTURE graph that you so vehemently criticized. In that graph P 192.1 is about as "Russian" as Gok4 (Swedish farmer), i.e. low and similar to French or rather Italians.

The main difference between P 192.1 and Gok4 is that the former displays some "Druze" green component, that is significantly lower in the latter. Instead on the "Sardinian" orange component it is the other way around.

K8 shows more "Russian" component and, for whatever it's worth, resembles French or Orcadians but with "Palestinian" brown instead of "Adigey" magenta. This change does not suggest any particular "Russian" specific origin (it should be more "Adigey" and less "Palestinian" if anything). But anyhow the sequence is highly dubious so best to ignore it.

The most "Russian" sample is Ajv70 (Swedish Chalcolithic "forager"), which is 50% in the blue component. K8 and Bra1 are similar but neither one is extremely "Russian".

Seinundzeit said...

Genetiker isn't the sort of person I like to cite in any context. Regardless, he has been very helpful by posting DIY Calculator results for a bunch of ancient samples. I was interested in the (seemingly) uncontaminated Thracian's results, as well as the sample from the pit sanctuary. The sample from the Thracian tumulus has Pakistani/Northwest Indian/Afghan levels of "Gedrosia", and Afghan/Pakistani levels of "South Asian". Just looking at those two components, he could be a South Asian if he was alive today. The sample from the pit sanctuary doesn't have any affinity to South Asia. For what it's worth, the oracle results:

T2G2
[1,] "Makrani" "54.9139"
[2,] "Balochi" "58.9139"
[3,] "Pathan" "60.0106"
[4,] "Sindhi" "61.3392"
[5,] "Brahui" "61.3446"

[1,] "57.8% Balochi + 42.2% Yoruba" "29.9287"
[2,] "57.8% Balochi + 42.2% YRI30" "29.9287"
[3,] "56.5% Brahui + 43.5% Yoruba" "30.039"
[4,] "56.5% Brahui + 43.5% YRI30" "30.039"
[5,] "60.2% Makrani + 39.8% Yoruba" "30.0814"

That Sub-Saharan African noise is problematic, but the South Asian affinity is solid. It's in line with what we've already seen.

P192-1
[1,] "Greek_D" "13.9912"
[2,] "Ashkenazy_Jews" "17.4968"
[3,] "Ashkenazi_D" "18.5725"
[4,] "Sicilian_D" "19.1125"
[5,] "C_Italian_D" "19.4443"

[1,] "85% Greek_D + 15% Georgians" "11.7944"
[2,] "61.1% North_Italian + 38.9% Georgians" "11.8737"
[3,] "84.3% Greek_D + 15.7% Abhkasians_Y" "11.8818"
[4,] "59.4% North_Italian + 40.6% Abhkasians_Y" "12.079"
[5,] "71% Tuscan + 29% Georgians" "12.1484"

For comparison, MA1's results:
[1,] "40.4% Jatt_D + 59.6% Chuvashs" "13.2415"
[2,] "47.7% Finnish_D + 52.3% Burusho" "13.5837"
[3,] "47.8% Finnish_D + 52.2% Jatt_D" "13.6236"
[4,] "51.2% Burusho + 48.8% FIN30" "13.6423"
[5,] "51.1% Jatt_D + 48.9% FIN30" "13.6713"

As is obvious, he is consistently a mix between northern South Asians and northeastern Europeans/Chuvash.

His DV3 oracle results:
[1,] "61.4% Finnish_D + 38.6% Madiga" "23.7978"
[2,] "61.1% FIN + 38.9% Madiga" "23.9043"
[3,] "63.3% Finnish_D + 36.7% TN_Dalit" "24.0179"
[4,] "67.2% Finnish_D + 32.8% Irula" "24.0648"
[5,] "62.6% Finnish_D + 37.4% Mala" "24.1557"

In this case, consistently 60%-65% Finnish, 40%-35% "scheduled caste" South Indian.