As mentioned in my last blog entry, I've been playing around with Structure lately, using sets of markers from various scientific studies. What I've noticed is that southern Europeans show influence from the Middle East/North Africa, while northern Europeans show more from South-Central Asia. Indeed, that has been one of the few clearly noticeable differences between them. It makes perfect sense though, because Europe was mainly populated from the south and east in various waves. Anyway, it's good to see my own experiments backed up by the latest peer-reviewed work. Check out the frappe analysis below from López Herráez et al. 2009, showing Central Asian influence in the Orcadians and North Russians, and Middle Eastern influence in North/Central Italians. They all seem to show South-Central Asian admixture too.

López Herráez D, Bauchet M, Tang K, Theunert C, Pugach I, et al. (2009) Genetic Variation and Recent Positive Selection in Worldwide Human Populations: Evidence from Nearly 1 Million SNPs. PLoS ONE 4(11): e7888. doi:10.1371/journal.pone.0007888
Here's what I got with a quick Structure run using just 3516 SNPs from Chao Tian et al. 2009. There's certainly a bit of noise here compared to the above plot, because it wasn't exactly the most detailed run I've ever done, but it still looks pretty solid. The Polish sample on the second graph is me.
Key: Europeans 1-7, 20; Middle Easterners 8-10; North Africans 11; South-Central Asians 12-19.

Chao Tian et al., European Population Genetic Substructure: Further Definition of Ancestry Informative Markers for Distinguishing Among Diverse European Ethnic Groups, Mol Med. Published online 2009 August 24. doi: 10.2119/molmed.2009.00094
If you bought a genome-wide scan at 23andme and/or deCODEme, then you have access to your raw data, which gives you the option of going beyond the bio-geographic analyses offered by these companies. For example, you can use various programs to compare yourself to publicly available samples from around the world. Structure is one of the more popular tools for this sort of thing, so here's a guide on how to set up a quick analysis using Structure and a data sheet from Kosoy et al. 2009:
- Download the 2.3.2 Beta version of Structure, with the graphical front end, which makes things a lot easier.
- Extract the following 125 SNPs from your raw data. Actually, 128 are listed on that sheet, but only 125 are currently available at 23andme. That's not a problem.
- Convert the genotypes to integers, as per the instruction sheet above. For example, if you're AG for rs731257, then convert that to 12 (i.e. A=1, G=2). The three missing SNPs, as well as any no-calls, should be listed as 55.
- Download the sample data sheet, and add yourself to it. Make sure your row is formatted exactly like all the other samples on there, so you'll need to add the various tags that precede the genotypes. For example, instead of "EURA CEU CEPH1334.10 1", try something like "EURA POL Myself 1", or if you're African American then maybe "AFR AME Myself 2".
- Start Structure and load up the data sheet by going to "File" and then "New Project". Fill in the necessary fields in the Project Wizard, such as: Number of individuals 639; Ploidy of data 1; Number of loci 128; Missing data value 55. Then tick the following boxes: "Row of marker names", "Data file stores data for individuals in a single line", "Individual ID for each individual", "Putative population origin for each individual", "USEPOPINFO selection flag", and finally "Sampling location information".
- Define the parameter set (i.e. go to "Parameter Set" and then "New"). The length of the burn-in period should be at least 10,000 and the number of MCMC reps about 50,000. Of course, to save time you can reduce both, especially if you're not too worried about a bit of noise. On the other hand, if you want to minimize noise as much as possible, then go up to something like 100,000 burn-ins and 500,000 reps. But be warned, runs like this can take days.
- Press the "!" button, specify the number of clusters (K) you'd like to divide the samples into, and click "OK". Alternatively, you can let the program work its way up from K2 over a range of K values: Project > Start a Job > pick the parameter set > specify the K range (for example, from K2 to K6) > press the "Start" button.
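If you'd rather not do the genotype conversion by hand, the extract/convert/add-yourself steps above can be roughly sketched in Python. This is only a sketch under a few assumptions: the raw data is a tab-separated text file with rsid, chromosome, position and genotype columns and "--" for no-calls; the per-SNP allele coding (`allele_codes_by_snp`) has been copied by hand from the instruction sheet; and heterozygotes are written with the smaller digit first, which matches the AG = 12 example but should be checked against the sheet. The function names, the second and third rsIDs, and the chromosome/position values in the demo are placeholders made up for illustration.

```python
import io

MISSING = "55"  # missing-data code from the guide above


def read_raw_genotypes(handle):
    """Parse raw data lines: rsid, chromosome, position, genotype (tab-separated, assumed layout)."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip lines that don't have all four columns
        genotypes[fields[0]] = fields[3]
    return genotypes


def encode_genotype(genotype, allele_codes):
    """Turn a two-letter genotype such as 'AG' into a two-digit code such as '12'."""
    if genotype is None or len(genotype) != 2:
        return MISSING
    try:
        # Assumption: smaller digit first, so 'AG' and 'GA' both become '12'.
        digits = sorted(str(allele_codes[allele]) for allele in genotype)
    except KeyError:
        return MISSING  # no-call ('--') or an allele not in the coding table
    return "".join(digits)


def build_row(tags, marker_order, allele_codes_by_snp, genotypes):
    """Build one data-sheet row: the leading tags, then one code per marker in sheet order."""
    codes = [
        encode_genotype(genotypes.get(rsid), allele_codes_by_snp.get(rsid, {}))
        for rsid in marker_order
    ]
    return " ".join(list(tags) + codes)


# Tiny demo with placeholder data (only rs731257 and its A=1, G=2 coding come from the guide).
raw = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs731257\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\t--\n"
)
allele_codes_by_snp = {
    "rs731257": {"A": 1, "G": 2},
    "rs0000002": {"C": 1, "T": 2},
}
marker_order = ["rs731257", "rs0000002", "rs0000003"]  # third marker is absent from the raw data
row = build_row(["EURA", "POL", "Myself", "1"], marker_order,
                allele_codes_by_snp, read_raw_genotypes(raw))
print(row)
```

The no-call and the SNP that isn't in the raw data both come out as 55, so the row always has one value per marker and the "Number of loci" setting stays at whatever the sheet lists. Paste the printed row at the end of the data sheet and bump the number of individuals by one.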
Here are my results at K4. Obviously, if you're of overwhelmingly European origin, it's unlikely you'll get anything below 99% European/West Eurasian with these 125 markers. Much larger sets of SNPs are needed to get more detailed admixture estimates, and to break down the intra-West Eurasian and intra-European components.
Indeed, if you're good with Excel and Access then it's even possible to go up to something like 500,000 SNPs. HapMap and HGDP samples are available online, although the latter are presented in a somewhat different way than the 23andme raw data, which is a real pain because it takes a lot of work to overcome. Also, there are other settings you can try to see how they affect the results, like turning on LOCPRIOR, which tells Structure the putative origins of the samples. You can use different data formats too, examples of which are shown on the Structure home page.
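For anyone who'd rather script this than wrestle with Excel and Access, the first step with a bigger marker set is usually just working out which of the reference SNPs you actually have calls for. Here's a minimal Python sketch of that check. It assumes you've already loaded the reference rsIDs into a list and your own genotypes into a dictionary keyed by rsID, with "--" marking no-calls; the function names and the rsIDs in the demo are made up for illustration, and it says nothing about how the HGDP files themselves need to be reshaped.

```python
def shared_markers(reference_rsids, my_genotypes):
    """Return the reference markers with a usable call in the personal data, in reference order."""
    return [
        rsid for rsid in reference_rsids
        if my_genotypes.get(rsid, "--") not in ("--", "")
    ]


def coverage_report(reference_rsids, my_genotypes):
    """Summarise how much of the reference marker set is covered by the personal data."""
    shared = shared_markers(reference_rsids, my_genotypes)
    total = len(reference_rsids)
    return {
        "reference": total,
        "shared": len(shared),
        "missing": total - len(shared),
        "fraction": len(shared) / total if total else 0.0,
    }


# Demo with placeholder rsIDs and genotypes.
reference = ["rs1", "rs2", "rs3", "rs4"]
mine = {"rs1": "AA", "rs2": "--", "rs4": "CT", "rs9": "GG"}
print(shared_markers(reference, mine))
print(coverage_report(reference, mine))
```

Keeping the reference order matters, because the marker columns in the final data sheet have to line up across all samples.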
Roman Kosoy et al., Ancestry informative marker sets for determining continental origin and admixture proportions in common populations in America, Human Mutation 2009, Volume 30 Issue 1, Pages 69-78, doi: 10.1002/humu.20822
Hubisz M. J., Falush D., Stephens M., Pritchard J. K., Inferring weak population structure with the assistance of sample group information, Molecular Ecology Resources 2009. DOI: 10.1111/j.1755-0998.2009.02591.x
We've known for a while that variation within the EDAR gene causes straight, coarse hair in East Asians. Now, thanks to a new study, we know that the Trichohyalin gene (TCHH) has a similar effect in Europeans.
Hair morphology is highly differentiated between populations and among people of European ancestry. Whereas hair morphology in East Asian populations has been studied extensively, relatively little is known about the genetics of this trait in Europeans. We performed a genome-wide association scan for hair morphology (straight, wavy, curly) in three Australian samples of European descent. All three samples showed evidence of association implicating the Trichohyalin gene (TCHH), which is expressed in the developing inner root sheath of the hair follicle, and explaining ∼6% of variance (p = 1.5 × 10^−31). These variants are at their highest frequency in Northern Europeans, paralleling the distribution of the straight-hair EDAR variant in Asian populations.
Sarah E. Medland et al., Common Variants in the Trichohyalin Gene Are Associated with Straight Hair in Europeans, The American Journal of Human Genetics, 05 November 2009, doi:10.1016/j.ajhg.2009.10.009
I've just posted an entry on my other blog about an article on the freshly discovered R1a1a7, defined by the M458 marker. No doubt, this is major news for R1a enthusiasts. Unfortunately, the authors of the paper seem to have grossly overestimated the age of this lineage, but I'm hopeful this will be corrected in the near future. See here.
These are only based on 4965 single-nucleotide polymorphisms (SNPs), so they're not as detailed as some of the others I've posted. However, they come from an interesting study about links between intra-European biogeographic origins and certain systemic lupus erythematosus (SLE) manifestations. EEUR here stands for Eastern European, and probably includes samples of Polish origin (I'll have to check that).

Compared with Northern European ancestry, Southern European ancestry was associated with autoantibody production (odds ratio (OR)=1.40, 95% confidence interval (CI) 1.07–1.83) and renal involvement (OR 1.41, 95% CI 1.06–1.87), and was protective for discoid rash (OR=0.51, 95% CI 0.32–0.82) and photosensitivity (OR=0.74, 95% CI 0.56–0.97). Both serositis (OR=1.46, 95% CI 1.12–1.89) and autoantibody production (OR=1.38, 95% CI 1.06–1.80) were associated with Western compared to Eastern European ancestry. Ashkenazi Jewish ancestry was protective against neurologic manifestations of SLE (OR=0.62, 95% CI 0.40–0.94). Homogeneous clusters of cases defined by multiple PCs demonstrated stronger phenotypic associations.
I B Richman et al., European population substructure correlates with systemic lupus erythematosus endophenotypes in North Americans of European descent, Genes and Immunity advance online publication 22 October 2009; doi: 10.1038/gene.2009.80
I'm yet to see a really detailed study of North and East Eurasian influence in Eastern Europe. A paper published this week in BMC Genetics focuses on Uralic admixture in Russia and Belarus, but uses the DRD2 locus for the job, instead of the hundreds of thousands of SNPs I'd prefer. However, it seems like a pretty decent effort, and probably a fairly accurate guide to what's to come in the future when more comprehensive studies are done on the topic. It certainly fits with what I've noticed myself when sharing data with clients from 23andme and deCODEme. Indeed, I'm hoping to put together a blog entry about Eurasian genetic substructure based on some Structure runs I'm doing at the moment with samples from the HGDP and HapMap.
Populations in the northwestern (Byelorussians 2 from Mjadel’), northern (Russians from Mezen’ and 6 from Oshevensk; Komi 3), and eastern parts (Russians 4 from Puchezh and Chuvash) of the East European Plain have relatively high frequencies of haplotype B2-D2-A2, which may reflect admixture with Uralic-speaking populations. Uralic genetic substratum in these regions, which were inhabited by Uralic-speaking tribes as late as the Early Middle Ages, was also shown by studies in which other genetic markers were used (mtDNA, Y-chromosome, and autosomal).


Olga V Flegontova et al., Haplotype frequencies at the DRD2 locus in populations of the East European Plain, BMC Genetics 2009, 10:62, doi:10.1186/1471-2156-10-62
Apparently, modern Swedes don't have as much in common with their Scandinavian hunter-gatherer predecessors as the Lithuanians and Latvians do from across the pond, according to a new study on ancient mtDNA from Gotland anyway....
The driving force behind the transition from a foraging to a farming lifestyle in prehistoric Europe (Neolithization) has been debated for more than a century [1,2,3]. Of particular interest is whether population replacement or cultural exchange was responsible [3,4,5]. Scandinavia holds a unique place in this debate, for it maintained one of the last major hunter-gatherer complexes in Neolithic Europe, the Pitted Ware culture [6]. Intriguingly, these late hunter-gatherers existed in parallel to early farmers for more than a millennium before they vanished some 4,000 years ago [7,8]. The prolonged coexistence of the two cultures in Scandinavia has been cited as an argument against population replacement between the Mesolithic and the present [7,8]. Through analysis of DNA extracted from ancient Scandinavian human remains, we show that people of the Pitted Ware culture were not the direct ancestors of modern Scandinavians (including the Saami people of northern Scandinavia) but are more closely related to contemporary populations of the eastern Baltic region. Our findings support hypotheses arising from archaeological analyses that propose a Neolithic or post-Neolithic population replacement in Scandinavia [7]. Furthermore, our data are consistent with the view that the eastern Baltic represents a genetic refugia for some of the European hunter-gatherer populations.
Helena Malmström et al., Ancient DNA Reveals Lack of Continuity between Neolithic Hunter-Gatherers and Contemporary Scandinavians, Current Biology, 24 September 2009, doi:10.1016/j.cub.2009.09.017
FYI, the "Pitted Ware" hunter-gatherer samples mostly carried U4, U5 and U5a, while the "Funnel Beaker" or "TRB" farmers sported H, J and T (only three samples). For a related story on mtDNA discontinuity in Central and Eastern Europe see here. By the way, the comments above about the eastern Baltic as a refugium for hunter-gatherers are interesting. But the argument isn't particularly strong yet, considering the small samples and the lack of paternal lineages and autosomal DNA in the analysis.