Thursday, November 25, 2010
Experiment: digging out previously unreported Sub-Saharan African admixture in Sardinia
The Sardinian samples from the HGDP are, as far as I know, always classified as entirely West Eurasian in origin by clustering algorithms like ADMIXTURE and STRUCTURE. In other words, these Sardinians fit completely into clusters that peak north of the Sahara and west of Central Asia. So it would appear that gene flow to Sardinia from neighboring Africa has been minimal, or even non-existent.
But that's not what I found when I took a closer look at their genomes, along with those of over 250 other Europeans who show no apparent extra-European ancestry in my own ADMIXTURE analyses. For the job I picked one of my favorite "local admixture" programs, RHHcounter (see here for more details), and set the rare genotype detection level at 0.01%.
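For readers who want a feel for what a rare genotype threshold does, here's a minimal Python sketch. It is not RHHcounter's code and makes no claim about how the program works internally; the function name, the 0/1/2 genotype coding and the reference panel are all assumptions made for illustration.

```python
import numpy as np

def flag_rare_genotypes(sample, reference, threshold=0.0001):
    """Return indices of SNPs where the sample's genotype is rare in a reference panel.

    sample    : 1-D sequence of genotypes coded 0/1/2, with -1 for missing (assumed coding)
    reference : 2-D array (individuals x SNPs) with the same coding
    threshold : genotype frequency below which a genotype counts as rare (0.0001 = 0.01%)
    """
    sample = np.asarray(sample)
    reference = np.asarray(reference)
    flagged = []
    for j, genotype in enumerate(sample):
        if genotype < 0:
            continue  # skip missing calls in the sample
        column = reference[:, j]
        called = column[column >= 0]  # ignore missing calls in the reference
        if called.size == 0:
            continue
        frequency = np.mean(called == genotype)
        if frequency < threshold:
            flagged.append(j)
    return flagged
```

Note that with a threshold of 0.01% and a reference panel of fewer than 10,000 individuals, this sketch only flags genotypes that are completely absent from the panel, since a genotype seen even once has a frequency of at least 1/n.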
Quite a few of the individuals showed tiny clusters of 3-4 genotypes that are common only outside of Europe, usually in Africa, East Asia or the Americas. These were often too small to investigate further. However, I spotted two segments that were large and clear enough to warrant more detailed analyses. Surprisingly, these belonged to two of the HGDP Sardinians - HGDP00672 and HGDP00673. Below are their Chromosome Mosaics, courtesy of RHHmapper, along with MDS plots based on all the SNPs from the aforementioned segments (marked by arrows). The MDS plots include samples from Europe, North Africa and Sub-Saharan Africa.
As noted above, the MDS plots were produced using all of the genotypes contained within the relevant segments (over 300 and 2000 SNPs respectively), and not just those that were detected by RHHcounter in the analysis. What this shows is that only a fraction of the extra-European genotypes in each segment were flagged, while the rest nearby remained undetected at this threshold.
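The general idea of an MDS plot restricted to a single segment can be sketched in Python with NumPy. This is only an illustration and makes no claim to match the software or distance measure behind the plots above; the allele-sharing distance, the 0/1/2 genotype coding and the function names are assumptions.

```python
import numpy as np

def allele_sharing_distance(genotypes):
    """Pairwise mean absolute genotype difference, scaled to the range [0, 1].

    genotypes : 2-D array (individuals x SNPs), coded 0/1/2, no missing data (assumed).
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        # Compare individual i against everyone, one row at a time to save memory.
        dist[i] = np.abs(g - g[i]).mean(axis=1) / 2.0
    return dist

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get an inner-product matrix.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def segment_mds(genotypes, positions, start, end, n_components=2):
    """Run MDS using only the SNPs whose positions fall within [start, end]."""
    positions = np.asarray(positions)
    in_segment = (positions >= start) & (positions <= end)
    segment = np.asarray(genotypes)[:, in_segment]
    return classical_mds(allele_sharing_distance(segment), n_components)
```

Here `segment_mds` takes every SNP inside the segment boundaries, not just the flagged ones, which mirrors the approach described above. Each row of the returned array gives one individual's coordinates for plotting.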
I can't see any explanation for these results other than relatively recent gene flow from Sub-Saharan Africa to Sardinia. That raises the question of why model-based algorithms fail to pick up such admixture in certain samples. As suggested by the authors of RHHcounter, perhaps the segments are too small and/or contain too few SNPs to have an impact on overall ancestry estimation? However, I also suspect that because Sardinia is something of a Southern European genetic isolate, the Sardinians are too easily classified as Europeans by ADMIXTURE, STRUCTURE etc., which might mask at least some of their minority admixtures.