
Friday, October 28, 2011

Dienekes attempts to strike back...and trips up again

I just read Dienekes' retort to my criticism of his work. Hilarious stuff...

Actually, according to the PCA plot of the Yunusbayev et al. (2011) paper, they are transitional, being situated toward both the Balkans and the Caucasus, relative to Belorussians/Lithuanians, i.e., the populations that generally show peaks of East European-related components. This is also supported by the ADMIXTURE analysis that reveals Ukrainians to possess a Caucasus-centered component largely lacking in other Eastern Slavs, but shared with Balkan/Caucasus populations.

First of all, the problem with his analysis was that Ukrainians showed a higher "West European" and a lower "East European" score than Poles. So Ukrainians were more "western" because they're more southern and eastern? What about the fact that Poles really are more western? Shouldn't that negate the pseudo-western character of the Ukrainians?

He's not making any sense at all. Simply, he's got a hybrid unsupervised/supervised spreadsheet up, and the results don't gel. In other words, they're not directly comparable between the two sets of samples, at least in some cases anyway. Why can't he put up a note that this is an issue and separate the two sets of samples in the spreadsheet?

To make matters worse, Eurogenes suggests that my euro7 analysis agrees with his K=10 which was presented two weeks later. So, apparently, I am posting correct information about Ukrainians 2 weeks before he does, and this means that I am turning around to his way of thinking rather than vice versa. Go figure.

It doesn't matter what came first. What matters is that some of the results he's posting, like those from the euro7 analysis, seem to be correct, and correlate with my own work, while some don't. The latter have to be taken down or corrected.

Eurogenes continues with his posting of supposed MDS/PCA plots supporting his thesis. Actually, what he has posted are plots based on metric distances in the space of admixture proportions; these are not genetic distances because e.g., a +/- 1% difference in a Sub-Saharan component results in the same Euclidean distance difference as a +/-1% in a European one, although the former affects genetic distance much more strongly than the latter. Metric distances are fine to quickly determine closeness of samples in the space of admixture proportions, but they are certainly no substitute for real genetic distances.

My thoughts exactly. That's why the MDS plots I posted were based on raw SNP data, and not on metric distances in the space of admixture proportions. The reason I also posted the PCA plots, which were indeed based on the admixture proportions, was because, as he says, "metric distances are fine to quickly determine closeness of samples in the space of admixture proportions".
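
To illustrate the point about metric distances, here's a minimal sketch using made-up admixture proportions (the component names and values are hypothetical, not from any spreadsheet). It shows that a 1% shift into a Sub-Saharan component and a 1% shift into a European component produce the same Euclidean distance, even though they would not mean the same thing in terms of genetic distance.

```python
import math

# HYPOTHETICAL admixture proportions (fractions summing to 1), for illustration only.
base = {"European": 0.90, "Sub_Saharan": 0.00, "East_Asian": 0.10}

def shift(props, gain, lose, amount=0.01):
    """Return a copy of props with `amount` moved from component `lose` to `gain`."""
    shifted = dict(props)
    shifted[gain] += amount
    shifted[lose] -= amount
    return shifted

def euclidean(a, b):
    """Plain Euclidean distance between two sets of admixture proportions."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

# Move 1% into the Sub-Saharan component, then 1% into the European one.
d_sub_saharan = euclidean(base, shift(base, "Sub_Saharan", "East_Asian"))
d_european = euclidean(base, shift(base, "European", "East_Asian"))

# Both distances are the same size, whichever component changed.
print(d_sub_saharan, d_european)
```

The metric treats every component as interchangeable, which is why it can only describe closeness in the space of admixture proportions.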

I am also, apparently, accused of neglecting to point out the deficiencies of Dodecad v3, and I am invited by Eurogenes to retract it completely! This proposal is equivalent to the idea that we should burn old topographic maps that were based on measurements with sticks, ropes, and trigonometers, because we can now measure distances with laser beams. And, it is funny indeed that I am supposedly neglecting the deficiencies of Dodecad v3 when, 3 weeks before the Eurogenes rant, I post exactly what its limitations are, and how it can be made better.

I haven't been able to find anything on his blog that explains the limitations of the hybrid unsupervised/supervised system. If not presented in their proper context, many of the results obtained via this system are simply erroneous.

It is unfortunate that Eurogenes has chosen to go down that path. Envy is not a good guide to behavior, and perhaps, instead of relishing at the prospect of putting others down, he could spend a little more time inventing something of his own.

It's unfortunate that Dienekes is so aggressive and arrogant when someone tries to alert him to problems or potential problems. Keep in mind, I first raised these issues via a few short comments at his blog, and never intended to write whole articles on the subject. However, his reaction to my posts changed my mind very quickly.

Wednesday, October 19, 2011

Erroneous results from Dodecad (aka. Dienekes)

A while back, Dienekes welcomed "peer review" of his work, which I thought was very commendable. I recently spotted a serious error in his analysis, and let him know about it over at his blog. I was hoping to see a correction, and also an admission that his methodology was faulty. Unfortunately, this hasn't happened to date, so I thought I'd describe the problem in detail here.

In the blog entry Yunusbayev et al. (2011) data assessed with Dodecad v3, Dienekes analyzed samples with ADMIXTURE in "supervised" mode using allele frequencies obtained from a run that didn't include these samples. He posted the results in a spreadsheet, which can be accessed here.
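
For anyone unfamiliar with what this kind of analysis involves, the sketch below shows the general idea under simplifying assumptions: the allele frequencies of each cluster are held fixed, and only the sample's ancestry fractions are estimated. It's a toy EM update of the sort used in simple admixture models, not the optimizer inside the ADMIXTURE program and not Dienekes' actual pipeline. The function name `estimate_q`, the two made-up clusters and the simulated genotypes are all hypothetical.

```python
import numpy as np

def estimate_q(genotypes, freqs, n_iter=200):
    """EM estimate of ancestry fractions for one sample, with cluster
    allele frequencies held fixed (they are never updated here).

    genotypes: shape (n_snps,), copies of the reference allele (0, 1 or 2)
    freqs:     shape (n_clusters, n_snps), reference allele frequency per cluster
    """
    k, n_snps = freqs.shape
    q = np.full(k, 1.0 / k)  # start from equal fractions
    for _ in range(n_iter):
        p_ref = q @ freqs            # chance that one allele copy is the reference allele
        p_alt = q @ (1.0 - freqs)    # chance that it is the alternative allele
        # Expected number of allele copies attributed to each cluster at each SNP.
        resp = (genotypes * (q[:, None] * freqs) / p_ref
                + (2 - genotypes) * (q[:, None] * (1.0 - freqs)) / p_alt)
        q = resp.sum(axis=1) / (2.0 * n_snps)
    return q

# Simulated toy data: two made-up clusters and one sample with known ancestry.
rng = np.random.default_rng(0)
n_snps = 5000
freqs = rng.uniform(0.05, 0.95, size=(2, n_snps))
true_q = np.array([0.7, 0.3])
genotypes = rng.binomial(2, true_q @ freqs)

q_hat = estimate_q(genotypes, freqs)
print(q_hat)  # estimated fractions, which should land near true_q
```

The estimate can only be as good as the fixed frequencies. If they came from a run that didn't include samples like the one being tested, the fractions are forced into clusters that may not describe it well.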

Obviously, my area of interest is the genetic ancestry of Poles, other Balto-Slavs, and nearby populations. So it only took me a matter of seconds to notice that something was off about the results for several of these groups. For instance, Poles are listed in the spreadsheet as 34.5% West European, and 44.3% East European. On the other hand, the more easterly Ukrainians show 38.5% West European, and only 31.5% East European. Also, the Mordvinian sample from near the Volga scores 38.1% West European, and only 32.5% East European.
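
To make the comparison explicit, here's a quick sketch that computes the West/East European ratio from the three sets of figures quoted above. Only those percentages come from the spreadsheet; the dictionary layout and the function name are mine, for illustration.

```python
# Percentages quoted above from the supervised Dodecad v3 spreadsheet.
results = {
    "Poles": {"west": 34.5, "east": 44.3},
    "Ukrainians": {"west": 38.5, "east": 31.5},
    "Mordvinians": {"west": 38.1, "east": 32.5},
}

def west_east_ratio(entry):
    """Ratio of the West European to the East European percentage."""
    return entry["west"] / entry["east"]

ratios = {pop: round(west_east_ratio(vals), 2) for pop, vals in results.items()}

# List the populations from the lowest to the highest West/East ratio.
for pop, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{pop}: {ratio}")
```

Poles come out below 1, while the more easterly Ukrainians and Mordvinians both come out above 1.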

The first port of call when checking the validity of such results is to see whether they gel with geography. Clearly these results don't. So either something isn't right, or there are factors that work against the general rule of genes = geography. When I alerted Dienekes to these seemingly implausible figures, he was in favor of the second scenario. His reply was as follows:

Ukrainians' higher West/East European ratio makes perfect sense as it is transitional to both the Caucasus (where there are even higher such ratios) and to the Balkans. Their ratio is exactly what one might expect from their geographical position vis-à-vis Russians, Belorussians, and Balts, i.e., populations with a high E/W ratio.

Mordvins are also in line with other Uralic populations (Finns, Selkups) in having an inverted European ratio relative to Balto-Slavs.

The results don't make perfect sense. They make no sense at all. There's no way these Ukrainians can be described as transitional to the Balkans and the Caucasus compared to Poles, even if the term is used very loosely. Below are two MDS plots. The first one shows that the same Ukrainians (UA) used by Dienekes do not cluster closer to the Balkans than Poles (PL) do, and only barely closer, on average, than the Belorussians. The second plot shows that Ukrainians (UA), Poles (PL) and Belorussians (BY) are all about the same distance from the Caucasus.

In theory, it's possible to argue that the plots above produced different results to Dienekes' analysis because they used only the two most significant dimensions of genetic variation, whereas ADMIXTURE works in a very different way, and so can reveal details past the first two dimensions. But that would be a stretch, because generally speaking, when a population appears to be transitional between two others in an ADMIXTURE run, such results are often very easily reproduced with MDS/PCA plots.

Moreover, I've actually analyzed the same and similar samples with ADMIXTURE and have been unable to reproduce Dienekes' results. In other words, as per geography, Ukrainians are less Western European than Poles, and more Eastern European. This shows up in my latest Eurasian K=10 run (see here), where, on the balance of all the components, the Ukrainians and Mordvinians are more Eastern than Poles.

Below are two PCAs. The first one shows the bizarre results produced using data from Dienekes' spreadsheet, with Mordvinians clustering with Ukrainians and Hungarians along Component 1. The result is more reliable along Component 2, because that seems to be picking up North Eurasian admixture in the Mordvinians and Russians, which is much lower in Hungarians, Poles and Belorussians. The second plot is based on my K=10, and shows a more expected result all round, with the Mordvinians lining up with both Russian samples (RU and North Russian) along Component 1, and also very close to the North Russians along Component 2. They also cluster with the same North Russians in Yunusbayev et al., rather than with the Ukrainians.

A whole range of PCA plots can be produced using the data from the supervised Dodecad V3 and my Eurasian K=10, in which the former results look at least a little out of whack with reality, while the latter appear as expected.
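
For those who want to try this at home, below is a minimal sketch of how a PCA can be run on a table of admixture proportions using a plain SVD. The population labels and numbers are hypothetical placeholders, not values from the Dodecad V3 spreadsheet or my K=10, so swap in the real columns before reading anything into the output.

```python
import numpy as np

# HYPOTHETICAL admixture proportions (each row sums to 1); not real results.
labels = ["pop_A", "pop_B", "pop_C", "pop_D"]
props = np.array([
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.30, 0.50, 0.20],
    [0.25, 0.45, 0.30],
])

# Center each component (column) on its mean, then decompose with SVD.
centered = props - props.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

scores = u * s                       # coordinates of each population on each component
explained = s ** 2 / np.sum(s ** 2)  # share of variance captured by each component

for label, (pc1, pc2) in zip(labels, scores[:, :2]):
    print(f"{label}: PC1={pc1:.3f} PC2={pc2:.3f}")
```

Keep in mind that a plot made this way sits in the space of admixture proportions, so it reflects whatever the underlying ADMIXTURE run got right or wrong.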

Interestingly, Dienekes' new euro7 analysis supports the results obtained by me. In this experiment, the same Ukrainians and Mordvinians were used in the initial run that set up the clusters, and came out amongst the most Northeastern European and least Northwestern + Southwestern European samples on the sheet. Now that makes perfect sense.

So what happened? Are these euro7 components different enough to make the results better match geography? Yes, they're a lot more in tune with reality due to a higher quality dataset, with more samples from key areas of Europe and the Caucasus. However, it's also clear that the supervised analysis produced erroneous results. It's obvious that it's not always possible to correctly analyze samples with allele frequencies from ADMIXTURE runs in which they were not included, especially when comparing them against samples that were.

Now that the sampling is better, Dienekes' euro7 shows the previously mentioned Uralic Selkups to have a higher level of membership in the cluster that peaks in Balto-Slavs than in those which peak in Northwestern and Southwestern Europeans. This is obviously a turn-around from his Dodecad V3 result. So which is correct? Strictly speaking, they're both correct, because the components that form in ADMIXTURE runs are dependent on the allele frequencies in the dataset used, and the number of clusters (K) set by the user. These clusters might peak in different groups depending on the dataset, but the results will usually make pretty good sense in relative terms. Indeed, on the balance of their overall results, across all the ancestral components in the V3 and euro7, the Selkups don't appear very different. They cluster in generally the same area relative to the other samples. See, for instance, their positions on two PCAs based on the V3 and euro7. So unlike the supervised results, it's not possible to outright declare the unsupervised Dodecad V3 results as erroneous.

However, I would say that the appearance of such a dominant Western European-based cluster as seen in the V3 is, at the very least, surprising. For instance, why would the Siberian Selkups carry allele frequencies that appear more Western European than Eastern European? The Uralic theory proposed by Dienekes really doesn't seem plausible. I don't know how many times Dienekes repeated his experiment to see if the results were stable, but scientists often run their experiments as many as 100 times each, and then publish the most consistent results.

If Dienekes obtained those results from multiple runs, and it was a stable effort, then that's fine. However, the Western European-based cluster still looks unusual enough to treat it with great caution. Suffice to say that it's not something that can be reliably used to theorize about the peopling of Europe, or the genetic ancestry of linguistic groups, like the Uralics. Dienekes did this, which I thought was very naive of him. But it was even more naive of many people to take his musings seriously. I don't believe that he'll ever be able to produce similar results with his updated dataset (like the higher West/East European ratio in the Ukrainians, Mordvinians and Selkups).

Obviously, there's nothing wrong with experimentation. That's what science and genome blogging are all about. We're not just here to provide a genetic ancestry service, but also to try and unravel mysteries that are taking scientists years to get around to via the convoluted peer review system in journals. Mistakes will happen, because boundaries are being pushed, but these mistakes have to be corrected.

Update: Dienekes attempts to strike back...and trips up again

Monday, October 17, 2011

Pigmentation genetics of Europeans

The maps below are based on three genome-wide SNPs showing high correlation with blue and green eyes and/or fair hair in Europeans. The results obviously suggest that there's an increase in hair and eye blondism from south to north in Europe, with clear peaks east of the Baltic Sea. The three SNPs are rs1667394 (HERC2 gene), rs12913832 (OCA2 gene), and rs12896399 (SLC24A4 gene).

Indeed, I'm a bit taken aback by the very high rate of suggested blondism among the Belorussians and Mordvinians (second highest dot in Russia). This might have something to do with sampling bias. Perhaps most of the 12 Belorussians and 16 Mordvinians used here came from fairer than average communities within their respective nations? I have no idea. In any case, I don't think the picture is too far from reality, because multiple sampling sites from the same general biogeographic zones, but several hundred kilometers apart, are showing very similar results. This can't be a coincidence.

It's also interesting to note that the East Baltic peak in blondism genotypes correlates closely with the North + East European genome-wide ancestral component in my latest ADMIXTURE experiment (see here). Perhaps this is where natural selection for these traits was most extreme due to very specific environmental pressures, like lack of sunlight? Maybe this is also where these traits spread from, either gradually or during one or several major migrations? Someone should look into that. In the meantime, I'll try and update this post with new maps as more samples come in.

Friday, October 7, 2011

European admixture among ancient East Asians (two-rooted canines carried by early Indo-Europeans to China)

Two-rooted lower canines are rare in humans, but they are most commonly found among Europeans, at levels of up to 9%. A new study reveals that this trait reached unusually high frequencies in ancient groups from East Central and East Asia, particularly those of Afanasevo, Scythian, Uighur and Ordos origin (2.8% to 4%). This is a strong indication that such groups carried significant European ancestry, and were possibly the descendants of the same European migrants who took R1a1a and Indo-European culture deep into Asia after the Neolithic (see here).

In Table 1, the population variation of two-rooted lower canines is shown for major populations of the world. To emphasize the point that this is a European trait, of the 12,128 individuals included in the table, only 306 express two-rooted lower canines (2.5%) but of these 83% (254) were Europeans. If you include related Asiatic Indian, Middle Eastern, and North African populations, this number increases to 89% (272/306).


The presence of the two-rooted canines in East Asia may provide some clue as to the eastward migration of new populations into China and Mongolia. The largest numbers of individuals with this trait are concentrated along the western and northern frontiers of China and Mongolia. Archaeological excavations support the large scale movement of people into this area during the Bronze age (ca. 2200 BCE–400 BCE). Burial artifacts and settlement patterns suggest cultural and technological ties to the Afanasevo culture in Siberia, which in turn is linked archaeologically, linguistically, and genetically with the Indo-European Tocharian populations that appear to have migrated to the Tarim Basin ca. 4,000 years ago (Ma and Sun, 1992; Ma and Wang, 1992; Mallory and Mair, 2000; Romgard, 2008; Keyser et al., 2009; Li et al., 2010).

Christine Lee and G. Richard Scott, Brief Communication: Two-Rooted Lower Canines - A European Trait and Sensitive Indicator of Admixture Across Eurasia, American Journal of Physical Anthropology (2011), DOI: 10.1002/ajpa.21585
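
The percentages in the quote about Table 1 can be rechecked from the counts it gives. The sketch below uses only those four counts; the variable names are mine.

```python
# Counts quoted above from Table 1 of Lee and Scott (2011).
total_individuals = 12128
with_trait = 306
european_with_trait = 254
european_and_related_with_trait = 272

# Overall rate of two-rooted lower canines, as a percentage.
overall_rate = round(100 * with_trait / total_individuals, 1)

# Share of the individuals with the trait who were European, and the share
# once the related Indian, Middle Eastern and North African groups are added.
european_share = round(100 * european_with_trait / with_trait)
related_share = round(100 * european_and_related_with_trait / with_trait)

print(overall_rate, european_share, related_share)
```

The rounded figures match the 2.5%, 83% and 89% given in the paper.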