Thursday, November 25, 2010

Experiment: digging out previously unreported Sub-Saharan African admixture in Sardinia


The Sardinian samples from the HGDP are always, as far as I know, classified as entirely of West Eurasian origin via clustering algorithms like ADMIXTURE and STRUCTURE. In other words, these Sardinians completely fit into clusters that peak north of the Sahara and west of Central Asia. So it would appear that gene flow to Sardinia from neighboring Africa has been minimal, or even non-existent.

But that's not what I found when I took a closer look at their genomes, as well as those of over 250 other Europeans with apparently no extra-European ancestry, as shown by my own ADMIXTURE analyses. I picked one of my favorite "local admixture" programs for the job, called RHHcounter (see here for more details), setting the rare genotype detection level at 0.01%.
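To make the idea of a rare genotype detection level more concrete, here is a minimal Python sketch. It is not RHHcounter's code or its actual algorithm. It assumes that "rare" means a genotype's frequency in a reference panel is at or below the threshold (0.01% written as 0.0001), that genotypes are two-letter strings, and that "00" marks missing data. The function names, the `max_gap` parameter and the missing-data code are all hypothetical.

```python
from collections import Counter

MISSING = "00"  # assumed missing-genotype code (hypothetical)


def normalise(genotype):
    """Treat 'AG' and 'GA' as the same unphased genotype."""
    return "".join(sorted(genotype))


def flag_rare_genotypes(sample, reference, threshold=0.0001):
    """Return indices of SNPs where the sample's genotype is rare in the reference.

    sample    -- list of genotype strings, one per SNP, in chromosome order
    reference -- list of lists; reference[i] holds the reference panel's
                 genotypes at SNP i
    threshold -- maximum reference frequency that still counts as rare
                 (0.0001 corresponds to 0.01%)
    """
    flagged = []
    for i, (genotype, ref_genotypes) in enumerate(zip(sample, reference)):
        called = [normalise(g) for g in ref_genotypes if g != MISSING]
        if genotype == MISSING or not called:
            continue  # nothing to compare at this SNP
        freq = Counter(called)[normalise(genotype)] / len(called)
        if freq <= threshold:
            flagged.append(i)
    return flagged


def cluster_flags(flagged, max_gap=5):
    """Group flagged SNP indices separated by at most max_gap positions."""
    clusters = []
    for idx in flagged:
        if clusters and idx - clusters[-1][-1] <= max_gap:
            clusters[-1].append(idx)
        else:
            clusters.append([idx])
    return clusters
```

With a small reference panel, a threshold this low effectively flags only genotypes that are absent from the panel, so the size of the panel matters as much as the threshold itself. Grouping nearby hits is one simple way to separate isolated flags from runs that might mark an introgressed segment.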

Quite a few of the individuals showed tiny clusters of 3-4 genotypes that were only common outside of Europe, usually in Africa, East Asia or the Americas. These were often too small to investigate further. However, I spotted two segments that were large and clear enough to warrant more detailed analyses. Surprisingly, these belonged to two of the HGDP Sardinians - HGDP00672 and HGDP00673. Below are their Chromosome Mosaics, courtesy of RHHmapper, along with MDS plots based on all the SNPs from the aforementioned segments (marked by arrows). The MDS plots include samples from Europe, North Africa and Sub-Saharan Africa.







As per above, the MDS plots were produced using all the genotypes contained within the relevant segments (over 300 and 2000 SNPs respectively), and not just those that were detected by RHHcounter in the analysis. Obviously, what this shows is that only a fraction of the extra-European genotypes were flagged, while the rest nearby remained undetected at this threshold.
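For anyone who wants to try a similar check on a segment, here is a rough Python sketch of the kind of pairwise distance that an MDS plot takes as input. It is not the procedure used to make the plots above. It assumes genotypes are coded as 0, 1 or 2 copies of one allele, with `None` for missing calls, and it uses a simple allele-sharing distance. The function names and the population labels in the usage example are hypothetical.

```python
def allele_sharing_distance(a, b):
    """Mean allele-sharing distance between two individuals over a segment.

    a, b -- lists of genotypes coded 0/1/2 (None for missing), covering
            every SNP in the segment, not only the flagged ones.
    Returns a value from 0.0 (identical) to 1.0 (opposite homozygotes
    at every compared SNP).
    """
    diffs = [abs(x - y) / 2 for x, y in zip(a, b)
             if x is not None and y is not None]
    if not diffs:
        raise ValueError("no SNPs with calls in both individuals")
    return sum(diffs) / len(diffs)


def mean_distance_by_population(sample, populations):
    """Average distance from one sample to each reference population.

    populations -- dict mapping a population label to a list of
                   individuals, each coded like `sample`.
    """
    return {
        label: sum(allele_sharing_distance(sample, ind) for ind in inds) / len(inds)
        for label, inds in populations.items()
    }
```

A toy usage example with made-up genotypes:

```python
segment = [2, 2, 1, 2]
panels = {
    "Europe": [[0, 0, 0, 0], [0, 1, 0, 0]],
    "Sub-Saharan Africa": [[2, 2, 1, 2], [2, 1, 1, 2]],
}
distances = mean_distance_by_population(segment, panels)
closest = min(distances, key=distances.get)
```

Running MDS on the full matrix of such distances gives a picture like the plots above. The per-population averages here are only a quick summary of which reference group the segment sits nearest to.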

I can't see any explanation for these results other than relatively recent gene flow from Sub-Saharan Africa to Sardinia. What this means, of course, is that there must be a reason why model-based algorithms can't pick up such admixtures in certain samples. As suggested by the authors of RHHcounter, perhaps the segments are too small and/or contain too few SNPs to have an impact on overall ancestry estimation? However, I also suspect that because Sardinia is something of a Southern European genetic isolate, the Sardinians are too easily classified as Europeans by ADMIXTURE, STRUCTURE etc., which might mask at least some of their minority admixtures.


3 comments:

Chris Davies said...

This is very interesting. I think you have hit the nail on the head. I have noticed for a while that certain high frequency HLA (Human Leukocyte Antigen) alleles and haplotypes in Sardinians are shared with sub-Saharan Africans and North Africans, but not with Europeans or Near Eastern/Middle Eastern populations. One such allele is HLA-A A*30:02. It is found at 19.0% in Sardinians, whereas in the rest of Europe it is generally <2.0% or 0.0%. This allele is at peak world frequency and diversity in West Africans and Bantus, and at a lower frequency and diversity in Morocco [not sure about frequencies in the other North African populations yet]. I would be interested to discuss this with you further if you happen to read my message. Best Regards

Davidski said...

Yeah, that HLA allele might have been introduced fairly recently to Sardinia from Africa. But its frequency in Sardinia doesn't say much about the overall level of African admixture there, because it might be the result of selection. Indeed, an advantageous allele can spread very quickly on a fairly isolated island like Sardinia.

Based on everything I've seen, the level of genome-wide Sub-Saharan admixture in Sardinia is around 1%, while North African admixture might be a couple per cent higher.

Chris Davies said...

Thanks. The A*30:02 allele is part of the A*30:02-B*18:01-C*05:01-DRB1*03:01-DQA1*05:01-DQB1*02:01 haplotype.
http://en.wikipedia.org/wiki/A30-Cw5-B18-DR3-DQ2_(HLA_Haplotype)
Given the equilibration of the haplotype, it must have been present in Sardinia for a considerable period of time [up to 8000 years or greater?]. Unlike HLA B*35 which is known to be malaria protective, there is no such known role played by A*30:02. I am more inclined to believe that there is a founder effect at play. Combined with other African haplotypes in Sardinia [eg A2-B58-Cw7-DR16-DQ5.2] they account for 30% of Sardinian haplotypes by gene frequency. So it looks like there was a very early and significant African settlement on the island.