
Saturday, November 12, 2011

Origins of R1a1a in or near Europe (aka. R1a1a out of India theory looks like a dud)

Ten years ago, Passarino et al. released a paper focusing on the origins and spread of R1a1a (back then known as Eu19). They did this by studying the frequency and diversity of the 49a,f/TaqI haplotype 11, which appeared to be linked to R1a1a. The conclusion was that R1a1a most likely originated in present-day Ukraine, and expanded from there into Europe and Asia.

However, a couple of years later, STR diversity became the method of choice for studying Y-DNA haplogroup origins and expansions, and the information provided by 49a,f/TaqI Ht11 was basically ignored.

Despite lots of quirky results since then, like placing the ancestors of some modern populations in far Northern Europe at a time when it was still covered by massive ice sheets, no one in academia attempted to challenge the new methodology until this year (see here). In the meantime, however, it was "discovered" that India harbored the greatest diversity in R1a1a STRs, and it was thus hailed as the place of origin of this widespread paternal marker.

It seems we've now come full circle, because the latest work on the SNP structure within R1a1a shows that India has very low R1a1a diversity. For instance, all Indians tested to date for newly discovered R1a1a SNPs, mostly as part of various private Y-DNA projects, have come back positive for the Z93 mutation. This marker is not upstream of any European R1a1a subclades. In fact, most Eastern Europeans tested to date have come back ancestral for Z93. This information gels very well with ancient DNA results, which show a movement of light-pigmented European-like groups deep into Asia during the early metal ages from somewhere in West Eurasia (see here).

The news just in, courtesy of the R1a and Subclades Y-DNA Project, is that the Z283 SNP ties together the three major European R1a1a subclades. These are R1a1a1-Z284, largely found in Scandinavia, R1a1a1-M458, characteristic of Western Slavic and Eastern German populations, and R1a1a1-Z280, of Central and Eastern Europe. The primary distribution of Z283 shows an uncanny resemblance to that of the former Corded Ware cultural horizon of Northern Europe. Below is a map of the Corded Ware zone from Haak et al. 2008, which describes the discovery of R1a1a in the ancient remains from a Corded Ware burial in what is now Eastern Germany.
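The subclade relationships described in this post can be sketched as a small tree, which makes the "upstream" argument easy to check. This is only an illustration: the tree holds just the markers named here, placing Z93 and Z283 directly under R1a1a is a simplifying assumption (intermediate branches are left out), and the names `PARENT`, `lineage` and `is_upstream` are made up for the example.

```python
# Parent of each marker, limited to what this post mentions.
# ASSUMPTION: Z93 and Z283 hang directly off R1a1a; any intermediate
# branches are omitted for simplicity.
PARENT = {
    "Z93": "R1a1a",
    "Z283": "R1a1a",
    "Z284": "Z283",
    "M458": "Z283",
    "Z280": "Z283",
}


def lineage(marker):
    """Return the chain of markers from `marker` up to the root."""
    chain = [marker]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_upstream(ancestor, descendant):
    """True if `ancestor` sits above `descendant` on the same branch."""
    return ancestor != descendant and ancestor in lineage(descendant)


EUROPEAN_SUBCLADES = ["Z284", "M458", "Z280"]

for sub in EUROPEAN_SUBCLADES:
    print(sub,
          "| under Z283:", is_upstream("Z283", sub),
          "| under Z93:", is_upstream("Z93", sub))
```

In this toy tree every European subclade is under Z283 and none is under Z93, which is the pattern the testing results point to: a man who is derived for Z283 would be expected to come back ancestral for Z93.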


Passarino et al., The 49a,f haplotype 11 is a new marker of the EU19 lineage that traces migrations from northern regions of the Black Sea, Human Immunology, Volume 62, Issue 11, November 2001, Pages 1313-1314

FTDNA R1a and Subclades Y-DNA Project

Haak et al., Ancient DNA, Strontium isotopes, and osteological analyses shed light on social and kinship organization of the Later Stone Age, PNAS November 25, 2008 vol. 105 no. 47 18226-18231

Monday, October 17, 2011

Pigmentation genetics of Europeans

The maps below are based on three SNPs that show high correlation with blue and green eyes and/or fair hair in Europeans. The results obviously suggest that there's an increase in hair and eye blondism from south to north in Europe, with clear peaks east of the Baltic Sea. The three SNPs are rs1667394 and rs12913832 (both located in the HERC2 gene, next to OCA2, whose expression rs12913832 is thought to regulate), and rs12896399 (SLC24A4 gene).

Indeed, I'm a bit taken aback by the very high rate of suggested blondism among the Belorussians and Mordvinians (second highest dot in Russia). This might have something to do with sampling bias. Perhaps most of the 12 Belorussians and 16 Mordvinians used here came from fairer than average communities within their respective nations? I have no idea. In any case, I don't think the picture is too far from reality, because multiple sampling sites from the same general biogeographic zones, but several hundred kilometers apart, are showing very similar results. This can't be a coincidence.

It's also interesting to note that the East Baltic peak in blondism genotypes correlates closely with the North + East European genome-wide ancestral component in my latest ADMIXTURE experiment (see here). Perhaps this is where natural selection for these traits was most extreme due to very specific environmental pressures, like lack of sunlight? Maybe this is also where these traits spread from, either gradually or during one or several major migrations? Someone should look into that. Meantime, I'll try and update this post with new maps as more samples come in.

Friday, October 7, 2011

European admixture among ancient East Asians (two-rooted canines carried by early Indo-Europeans to China)

Two-rooted lower canines are rare in humans, but they are most commonly found among Europeans, at levels of up to 9%. A new study reveals that this trait reached unusually high frequencies in ancient groups from East Central and East Asia, particularly those of Afanasevo, Scythian, Uighur and Ordos origin (2.8% to 4%). This is a strong indication that such groups carried significant European ancestry, and were possibly the descendants of the same European migrants who took R1a1a and Indo-European culture deep into Asia after the Neolithic (see here).

In Table 1, the population variation of two-rooted lower canines is shown for major populations of the world. To emphasize the point that this is a European trait, of the 12,128 individuals included in the table, only 306 express two-rooted lower canines (2.5%) but of these 83% (254) were Europeans. If you include related Asiatic Indian, Middle Eastern, and North African populations, this number increases to 89% (272/306).
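The percentages quoted above follow directly from the counts given in the passage. A quick check, using only those four numbers (the variable names are mine):

```python
# Counts as quoted from Table 1 of Lee and Scott (2011).
total_individuals = 12128
two_rooted = 306        # individuals with two-rooted lower canines
european = 254          # of those, Europeans
with_related = 272      # Europeans plus related Asiatic Indian,
                        # Middle Eastern and North African populations

overall_pct = round(100 * two_rooted / total_individuals, 1)
european_pct = round(100 * european / two_rooted)
related_pct = round(100 * with_related / two_rooted)

print(overall_pct)    # 2.5
print(european_pct)   # 83
print(related_pct)    # 89
```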


The presence of the two-rooted canines in East Asia may provide some clue as to the eastward migration of new populations into China and Mongolia. The largest numbers of individuals with this trait are concentrated along the western and northern frontiers of China and Mongolia. Archaeological excavations support the large scale movement of people into this area during the Bronze age (ca. 2200 BCE–400 BCE). Burial artifacts and settlement patterns suggest cultural and technological ties to the Afanasevo culture in Siberia, which in turn is linked archaeologically, linguistically, and genetically with the Indo-European Tocharian populations that appear to have migrated to the Tarim Basin ca. 4,000 years ago (Ma and Sun, 1992; Ma and Wang, 1992; Mallory and Mair, 2000; Romgard, 2008; Keyser et al., 2009; Li et al., 2010).

Christine Lee and G. Richard Scott, Brief Communication: Two-Rooted Lower Canines - A European Trait and Sensitive Indicator of Admixture Across Eurasia, American Journal of Physical Anthropology (2011), DOI: 10.1002/ajpa.21585

Tuesday, March 22, 2011

Reconstructing the Ancestral North Indian (ANI) genome

Back in 2009, Reich et al. theorized that the current South Asian gene pool was basically made up of two founding genetic components: Ancestral North Indian (ANI) and Ancestral South Indian (ASI). The distilled ANI, they noted, was more similar to the genomes of modern Northwest Europeans than to those of the Adygei from the Caucasus. This is obviously out of whack with geography, but it does make sense based on what I've seen in my experiments on the Pakistani samples from the HGDP. Many of them, especially the Pathans, carry numerous segments, or haploblocks, that basically look North European. This gave me the idea to try and reconstruct the ANI genome based on such fragments. The first chromosome of my composite sample, which I call the "ANI composite", is available for download here. It's a PLINK PED file in Illumina AB format with 19,261 SNPs.

Below are several PCA plots featuring the "ANI composite". Obviously, they don't include the HGDP samples used to make it (see below). Overall, it seems to resemble most closely my reference samples from Eastern Europe. I have to admit that I was very pleased to see it behaving like a set of genotypes from a real human subject across many dimensions of genetic variation. PCA is very sensitive to anomalies, such as unusually long runs of homozygosity, so the fact that my composite can pass for a normal sample on these plots is fantastic.

So how did I do this? Well, it wasn't very difficult, but a bit tedious, so I need a break before continuing. I used information from my earlier experiments with ADMIXMAP, HAPMIX and RHH Counter to locate and delineate North European-like segments in phased Pakistani HGDP samples. I phased the data myself with BEAGLE, in a pool of South Asian and Middle Eastern samples, so as not to bias the results of phasing and imputation towards Northern Europe. In order to keep the alleles in phase when loaded into PLINK, I duplicated the haplotypes, producing completely homozygous individuals out of each one. Then I created an ANI composite dummy with 100% no calls, and loaded the haplotypes into this sample with a Python script. The first to load were the Pathan haplotypes, followed by the Burusho. I chose individuals from these two groups to make up the backbone of the putative ANI genome because they always seem to come out most "North European" in my ADMIXTURE and PCA/MDS runs compared to other South Asians. The empty spaces were filled with haplotypes from the Brahui and Balochi. Below is a list of all the samples used:

Pathan HGDP00213
Pathan HGDP00214
Pathan HGDP00218
Pathan HGDP00224
Pathan HGDP00241
Pathan HGDP00243
Pathan HGDP00254
Pathan HGDP00258
Pathan HGDP00259
Pathan HGDP00262
Pathan HGDP00264

Burusho HGDP00338
Burusho HGDP00356
Burusho HGDP00364
Burusho HGDP00382
Burusho HGDP00392
Burusho HGDP00412
Burusho HGDP00417
Burusho HGDP00423
Burusho HGDP00428
Burusho HGDP00433

Brahui HGDP00007
Brahui HGDP00009
Brahui HGDP00017
Brahui HGDP00041
Brahui HGDP00047

Balochi HGDP00054
Balochi HGDP00058
Balochi HGDP00062
Balochi HGDP00072
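For anyone who wants to try something similar, here is a minimal sketch of the two mechanical steps described above: duplicating a haplotype into homozygous genotype calls, and loading segments into an all-no-call composite in priority order so that later groups only fill positions still empty. It is not the script I actually used. The function names, the toy segment coordinates and the toy A/B alleles are all invented for the illustration, and the real segment boundaries came from the ADMIXMAP, HAPMIX and RHH Counter output.

```python
NO_CALL = "0"  # PLINK's usual missing-allele code


def homozygous_genotypes(haplotype):
    """Duplicate each allele ('A' -> 'A A') so a single phased
    haplotype can be stored as a fully homozygous individual."""
    return [f"{allele} {allele}" for allele in haplotype]


def empty_composite(n_snps):
    """Composite dummy with 100% no calls."""
    return [NO_CALL] * n_snps


def fill_composite(composite, segments):
    """Load segments in the order given (highest priority first).

    Each segment is (label, start_index, alleles). A position is only
    written if it is still a no call, so earlier segments are never
    overwritten by later ones.
    """
    for _label, start, alleles in segments:
        for offset, allele in enumerate(alleles):
            pos = start + offset
            if composite[pos] == NO_CALL:
                composite[pos] = allele
    return composite


# Toy data: 10 SNPs, hypothetical "North European-like" segments.
segments = [
    ("Pathan", 2, "ABBA"),    # loaded first
    ("Burusho", 4, "BBBB"),   # only fills positions 6 and 7
    ("Brahui", 0, "AAA"),     # gap filler
    ("Balochi", 8, "BA"),     # gap filler
]

composite = fill_composite(empty_composite(10), segments)
print("".join(composite))                # AAABBABBBA
print(homozygous_genotypes("ABBA"))      # ['A A', 'B B', 'B B', 'A A']
```

Filling only the no-call positions is what makes the load order matter: the Pathan and Burusho haplotypes form the backbone, and the Brahui and Balochi haplotypes can never replace them.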

The phased data and the "ANI" haplotypes used in this experiment are available on request from eurogenesblog [at] hotmail [dot] com. I welcome feedback and suggestions on how to improve my methodology. Admittedly, this was a test run, so it's unlikely to be perfect.