Eurogenes Blog

Tuesday, May 13, 2014

PCA projection bias in ancient DNA studies

Many Principal Component Analyses (PCA) in papers on ancient genomes clearly suffer from projection bias. However, most people don't seem to understand this problem and the impact it can have on the interpretation of the data.

Here's a demonstration of this effect using two PCA. In the first PCA, La Brana-1, a Mesolithic genome from Iberia, was projected onto the PC eigenvectors computed with modern individuals from the HGDP. However, in the second PCA the ancient genome was run together with these samples. Note the clear difference between the two outcomes.

The second outcome does look a bit strange, but it's actually the correct one, because it's now an established fact that Mesolithic hunter-gatherers, like La Brana-1, were clearly outside the range of modern European, and indeed West Eurasian, genetic variation.
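For anyone who wants to see projection bias with their own eyes, here's a minimal simulation sketch. It isn't the La Brana-1 experiment: the two populations, the sample sizes and the allele frequencies are all made up, and it uses plain NumPy rather than PLINK. It computes PC1 from a set of reference samples, then projects extra samples from one of the same populations onto that axis. Because there are far more markers than samples, the projected individuals land closer to the middle of the plot than the reference individuals from their own population.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 2000  # far more markers than samples

# Two hypothetical reference populations with slightly different allele frequencies
base = rng.uniform(0.2, 0.8, n_snps)
shift = rng.normal(0.0, 0.05, n_snps)
freq_a = np.clip(base + shift, 0.05, 0.95)
freq_b = np.clip(base - shift, 0.05, 0.95)


def draw(freq, n):
    """Draw n diploid genotypes (0/1/2 allele counts) for every marker."""
    return rng.binomial(2, freq, size=(n, freq.size)).astype(float)


ref = np.vstack([draw(freq_a, n_per_pop), draw(freq_b, n_per_pop)])
held_out = draw(freq_a, 200)  # population A again, but not used to compute the PCs

# PCA on the reference samples only
mean = ref.mean(axis=0)
_, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
pc1 = vt[0]

ref_a_scores = (ref[:n_per_pop] - mean) @ pc1  # population A samples that shaped PC1
projected_scores = (held_out - mean) @ pc1     # population A samples projected afterwards

# A ratio below 1 means the projected samples are pulled towards the origin
shrinkage_ratio = projected_scores.mean() / ref_a_scores.mean()
print(f"projected / reference PC1 position: {shrinkage_ratio:.2f}")
```

The held-out individuals come from exactly the same population as half of the reference set, so any gap between the two groups on PC1 is an artifact of projection, not a real genetic difference.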

For a technical discussion of this problem, which is also sometimes known as "shrinkage", refer to Lee et al. 2012. To get an idea of the confusion that it can cause, see the discussion in the comments section under my last blog post:

More info on two Thracian genomes from Iron Age Bulgaria + a complaint

The above experiment with La Brana-1 was run with PLINK 2, which is freely available here, using just over 16K SNPs. Only markers with a read depth of 4x or higher were considered, and the marker set was further pruned to account for no-calls (--geno 0.005), LD (--indep-pairwise 200 25 0.4), and minor allele frequency (--maf 0.05).
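The no-call and minor allele frequency filters are easy to picture on a small genotype matrix. The sketch below is only an illustration of what those two thresholds do, not PLINK's own code: the function name `filter_markers` and the toy matrix are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with NaN for no-calls, and the LD pruning step is left out entirely.

```python
import numpy as np


def filter_markers(genotypes, max_missing=0.005, min_maf=0.05):
    """Return a boolean mask of markers (columns) that pass both filters.

    genotypes: samples x markers array of 0/1/2 allele counts, NaN = no-call.
    """
    called = np.sum(~np.isnan(genotypes), axis=0)
    missing_rate = 1.0 - called / genotypes.shape[0]

    # Allele frequency among called genotypes only (two alleles per genotype)
    allele_sum = np.nansum(genotypes, axis=0)
    freq = np.divide(allele_sum, 2.0 * called,
                     out=np.zeros_like(allele_sum, dtype=float),
                     where=called > 0)
    maf = np.minimum(freq, 1.0 - freq)

    return (missing_rate <= max_missing) & (maf >= min_maf)


# Toy data: 4 samples x 4 markers
geno = np.array([
    [0.0, 1.0, np.nan, 0.0],
    [1.0, 2.0, 1.0,    0.0],
    [2.0, 1.0, 0.0,    0.0],
    [1.0, 0.0, 1.0,    0.0],
])
mask = filter_markers(geno)
print(mask)  # third marker has a no-call, fourth is monomorphic
```

With a missingness threshold as strict as 0.005, a single no-call is enough to drop a marker unless the dataset has at least 200 samples, which is why the marker count falls so sharply once an ancient genome is added.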

Friday, May 9, 2014

More info on two Thracian genomes from Iron Age Bulgaria + a complaint

PLoS Genetics has just published a new paper on the genetic affinities of Oetzi the Iceman (see here). As far as I can tell, it simply affirms what we've already learned about Oetzi from previous studies, but it does feature interesting new insights into a couple of genomes from Iron Age Bulgaria, aka. Thrace:

The first individual (P192-1) was excavated from a pit sanctuary near Svilengrad, Bulgaria, dated to 800–500 BCE. The other individual (K8) was found in the Yakimova Mogila Tumulus in southeastern Bulgaria, dated to 450–400 BCE.

...

For the Thracian individuals from Bulgaria, no clear pattern emerges. While P192-1 still shows the highest proportion of Sardinian ancestry, K8 more resembles the HG individuals, with a high fraction of Russian ancestry.

...

Interestingly, this individual [K8] was excavated from an aristocratic inhumation burial containing rich grave goods, indicating a high social standing, as opposed to the other individual, who was found in a pit [15]. However, the DNA damage pattern of this individual does not appear to be typical of ancient samples (Table S4 in [15]), indicating a potentially higher level of modern DNA contamination.

K8 might well be contaminated with modern DNA to some degree, but I'd say there's a much better explanation for these signals of non-trivial genetic substructures within the Thracian population.

Archeology suggests that during the Bronze Age the Balkans were invaded from the east by nomads associated with the Yamnaya culture of the Pontic-Caspian Steppe. These invaders, possibly of early Indo-European stock, liked to build Tumuli mounds for their important dead, which were essentially copies of the Kurgan mounds built by the Yamnaya and related peoples.

Moreover, we now know that indigenous European hunter-gatherer (HG) ancestry survived best in Eastern Europe (see here), so it's very likely that the aforementioned invaders from the steppe were significantly HG-like in terms of genetic structure.

Therefore, the fact that K8 was buried in a richly furnished Tumulus (essentially a Kurgan), and genetically more similar to indigenous Europeans than P192-1, who was genetically more Near Eastern-like, and basically thrown into a ditch after he died, doesn't appear to be a coincidence.

In other words, perhaps K8 belonged to a ruling class of steppe origin, while P192-1 was largely of native Balkan stock, whose ancestors were conquered centuries earlier by the steppe nomads and forced to live as an underclass? If so, it wouldn't be the only time in history that this sort of thing has happened, especially within Indo-European societies.

By the way, unfortunately I have to add that the Principal Component Analyses (PCA) in this paper featuring the two HG genomes, ajv70 and La Brana-1, are simply woeful (PDF link). These genomes should be clearly outside the range of modern European genetic variation, but here they land among the Orcadian and French samples. Where was the peer review, I wonder?

Citation...

Sikora M, Carpenter ML, Moreno-Estrada A, Henn BM, Underhill PA, et al. (2014) Population Genomic Analysis of Ancient and Modern Genomes Yields New Insights into the Genetic Ancestry of the Tyrolean Iceman and the Genetic Structure of Europe. PLoS Genet 10(5): e1004353. doi:10.1371/journal.pgen.1004353

See also...

Ancient DNA from prehistoric Bulgaria and Denmark

PCA projection bias in ancient DNA studies

Thursday, April 3, 2014

The really old Europe is mostly in Eastern Europe

A new version of the Lazaridis et al. ancient genomes preprint has just appeared at arXiv (see here). It includes several new Principal Component Analyses (PCA), TreeMix graphs, a ChromoPainter/fineSTRUCTURE co-ancestry matrix, and an updated ADMIXTURE analysis. The revised text underlines the relatively close genetic relationship between indigenous European hunter-gatherers and present-day Eastern Europeans:

The co-ancestry matrix (Fig. S19.3) confirms the ability of this method to meaningfully cluster individuals. We highlight two clusters: Stuttgart joins all Sardinian individuals in cluster A and Loschbour joins a cluster B that encompasses all Belarusian, Ukrainian, Mordovian, Russian, Estonian, Finnish, and Lithuanian individuals. These results confirm Sardinia as a refuge area where ancestry related to Early European Farmers has been best preserved, and also the greater persistence of WHG-related ancestry in present-day Eastern European populations. The latter finding suggests that West European Hunter-Gatherers (so-named because of the prevalence of Loschbour and La Braña) or populations related to them have contributed to the ancestry of present-day Eastern European groups. Additional research is needed to determine the distribution of WHG-related populations in ancient Europe.

Fig. S10.5 suggests that the main axis of differentiation in Europe when the subcontinent is considered as a whole may tend to Northeastern Europe rather than SSE/NNW (8). This is consistent with our analysis of ancestry proportions in European populations (Fig. 2B, Extended Data Table 3) which indicate a cline of reduced EEF (and increasing WHG) ancestry along that direction.

Citation...

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, arXiv, April 2, 2014, arXiv:1312.6639v2

Monday, March 10, 2014

Extreme positive selection for light skin, hair and eyes on the Pontic-Caspian steppe...or not

Unusually strong positive selection over the past 5,000 years, rather than population replacement or even admixture, is responsible for the high frequencies of light skin, hair and eyes among present-day Eastern Europeans, according to a new paper by Wilde et al. at PNAS.

The authors were able to infer pigmentation traits from ancient DNA for 63 Eneolithic and Bronze Age samples, mostly from Kurgan mounds from the Pontic-Caspian steppe of Ukraine and surrounds. The results suggest that the ancient individuals were overall much darker than present-day Ukrainians, who, nevertheless, appear to be their direct descendants based on mitochondrial DNA (mtDNA) sequences. Quoting the paper:

To this end we compared the 60 mtDNA HVR1 sequences obtained from our ancient sample to 246 homologous modern sequences (29–31) from the same geographic region and found low genetic differentiation (FST = 0.00551; P = 0.0663) (32). Coalescent simulations based on the mtDNA data, accommodating uncertainty in the ancient sample age, failed to reject population continuity under a wide range of assumed ancestral population size combinations (Fig. 1).

Conversely, continuity between early central European farmers and modern Europeans has been rejected in a previous study (33). However, the Eneolithic and Bronze Age sequences presented here are ∼500–2,000 y younger than the early Neolithic and belong to lineages identified both in early farmers and late hunter–gatherers from central Europe (33).

...

In sum, a combination of selective pressures associated with living in northern latitudes, the adoption of an agriculturalist diet, and assortative mating may sufficiently explain the observed change from a darker phenotype during the Eneolithic/Early Bronze age to a generally lighter one in modern Eastern Europeans, although other selective factors cannot be discounted. The selection coefficients inferred directly from serially sampled data at these pigmentation loci range from 2 to 10% and are among the strongest signals of recent selection in humans.
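To get a feel for what selection coefficients of 2 to 10% would do over 5,000 years, here's a back-of-the-envelope sketch. It is not the method used by Wilde et al. I'm assuming the simplest deterministic model (genic selection with no drift), a 25-year generation time, which gives 200 generations, and a made-up starting frequency of 10% for the favored allele.

```python
def frequency_after(p0, s, generations):
    """Frequency of a favored allele under simple genic selection, no drift.

    Each generation: p' = p * (1 + s) / (1 + s * p)
    """
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
    return p


GENERATIONS = 5000 // 25  # assumed 25 years per generation
START_FREQ = 0.10         # hypothetical starting frequency

freq_weak = frequency_after(START_FREQ, 0.02, GENERATIONS)
freq_strong = frequency_after(START_FREQ, 0.10, GENERATIONS)

print(f"s = 2%:  {freq_weak:.3f}")
print(f"s = 10%: {freq_strong:.3f}")
```

Under these assumptions even the low end of the range is enough to take an allele from rare to common within the timeframe, so the scenario works on paper. Whether it's the right explanation is another matter.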

Well, either this is indeed a remarkable finding, or something's not quite right. I think it's the latter.

The argument for genetic continuity from the Eneolithic/Bronze Age to the present on the Pontic-Caspian steppe based on mtDNA sequences is actually very weak. The results could simply mean that the ancient samples shared deep maternal ancestry with modern Ukrainians and most other Europeans.

Indeed, we know for a fact that much of the Pontic-Caspian steppe was occupied by Turkic groups of Asian origin from the early Middle Ages until only a couple of hundred years ago. They were eventually cleared out by Tsarist Russia, and mainly replaced by East Slavic settlers from just northwest of the steppe. This process might not be easy to see by comparing low resolution mtDNA data, even between European populations separated by 5,000 years, but it's likely to be obvious when looking at full mtDNA genomes, high-density genome-wide data, and/or Y-chromosome haplogroups.

Surprisingly, the article doesn't mention Keyser et al. 2009, a very important study which showed that a sample of Kurgan nomads from Bronze and Iron Age South Siberia had frequencies of light hair and eyes comparable to those of present-day Northern and Eastern Europeans (see here). Also worth noting is that the most common Y-chromosome haplogroup among these individuals was R1a, which is today the most frequent haplogroup in Eastern Europe, including Ukraine.

What this suggests to me is that the Kurgan cultural horizon was not genetically homogeneous. I suspect that Kurgan groups closer to the Balkans carried significantly higher levels of Near Eastern Neolithic farmer ancestry, and were thus much darker than those in the more temperate northerly regions. However, it seems that at some point, the Neolithic farmer DNA was diluted enough by continuous movements of light pigmented groups from the north and east, possibly made up mostly of males, that there was a major shift in pigmentation traits from Near Eastern-like to North European-like across most of Eastern Europe. This scenario actually fits very nicely with the latest on the genetic origins of Europeans (see here).

We won't know what really happened until we see at least a few complete ancient genomes from Eastern Europe. But for now, I'd have to suspend my disbelief to accept that present-day Eastern Europeans are, by and large, descendants of these exceedingly brunet prehistoric people of the Pontic-Caspian steppe.

Citation...

Wilde et al., Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y, PNAS, Published online before print on March 10, 2014, doi:10.1073/pnas.1316513111

See also...

PCA of ancient European mtDNA

Thursday, February 27, 2014

Khazar shmazar

Human Biology recently posted several open access manuscripts dealing with the topic of Jewish origins (see submissions from 2013 here). One of these preprints is essentially a rebuttal to an Eran Elhaik paper from a couple of years ago, which argued that a substantial part of Ashkenazi Jewish ancestry was derived from within the Khazar Empire. The leading author of the new preprint is Doron M. Behar, but thirty people in all, many of them well known scientists, have put their names on it. Here's the abstract:

The origin and history of the Ashkenazi Jewish population have long been of great interest, and advances in high-throughput genetic analysis have recently provided a new approach for investigating these topics. We and others have argued on the basis of genome-wide data that the Ashkenazi Jewish population derives its ancestry from a combination of sources tracing to both Europe and the Middle East. It has been claimed, however, through a reanalysis of some of our data, that a large part of the ancestry of the Ashkenazi population originates with the Khazars, a Turkic-speaking group that lived to the north of the Caucasus region ~1,000 years ago. Because the Khazar population has left no obvious modern descendants that could enable a clear test for a contribution to Ashkenazi Jewish ancestry, the Khazar hypothesis has been difficult to examine using genetics. Furthermore, because only limited genetic data have been available from the Caucasus region, and because these data have been concentrated in populations that are genetically close to populations from the Middle East, the attribution of any signal of Ashkenazi-Caucasus genetic similarity to Khazar ancestry rather than shared ancestral Middle Eastern ancestry has been problematic. Here, through integration of genotypes on newly collected samples with data from several of our past studies, we have assembled the largest data set available to date for assessment of Ashkenazi Jewish genetic origins. This data set contains genome-wide single-nucleotide polymorphisms in 1,774 samples from 106 Jewish and non-Jewish populations that span the possible regions of potential Ashkenazi ancestry: Europe, the Middle East, and the region historically associated with the Khazar Khaganate. The data set includes 261 samples from 15 populations from the Caucasus region and the region directly to its north, samples that have not previously been included alongside Ashkenazi Jewish samples in genomic studies. Employing a variety of standard techniques for the analysis of population-genetic structure, we find that Ashkenazi Jews share the greatest genetic ancestry with other Jewish populations, and among non-Jewish populations, with groups from Europe and the Middle East. No particular similarity of Ashkenazi Jews with populations from the Caucasus is evident, particularly with the populations that most closely represent the Khazar region. Thus, analysis of Ashkenazi Jews together with a large sample from the region of the Khazar Khaganate corroborates the earlier results that Ashkenazi Jews derive their ancestry primarily from populations of the Middle East and Europe, that they possess considerable shared ancestry with other Jewish populations, and that there is no indication of a significant genetic contribution either from within or from north of the Caucasus region.

I'm really not sure what to make of all the attention that the Khazar hypothesis is still getting. It's been obvious for a while now that in terms of genetic structure Ashkenazi Jews are basically a group of East Mediterranean origin. But Elhaik's paper did get a fair bit of media coverage, so I suppose after that a rebuttal was to be expected.

In any case, I'm not complaining. This paper includes a very interesting genotype dataset of many previously unpublished samples, which I tested last week with PCA (see here).

Citations...

Behar, Doron M.; Metspalu, Mait; Baran, Yael; Kopelman, Naama M.; Yunusbayev, Bayazit; Gladstein, Ariella; Tzur, Shay; Sahakyan, Hovhannes; Bahmanimehr, Ardeshir; Yepiskoposyan, Levon; Tambets, Kristiina; Khusnutdinova, Elza K.; Kusniarevich, Aljona; Balanovsky, Oleg; Balanovsky, Elena; Kovacevic, Lejla; Marjanovic, Damir; Mihailov, Evelin; Kouvatsi, Anastasia; Triantaphyllidis, Costas; King, Roy J.; Semino, Ornella; Torroni, Antonio; Hammer, Michael F.; Metspalu, Ene; Skorecki, Karl; Rosset, Saharon; Halperin, Eran; Villems, Richard; and Rosenberg, Noah A., No Evidence from Genome-Wide Data of a Khazar Origin for the Ashkenazi Jews (2013). Human Biology Open Access Pre-Prints. Paper 41.

Elhaik E. The missing link of Jewish European Ancestry: contrasting the Rhineland and Khazarian hypotheses. Genome Biol Evol. 2012. doi:10.1093/gbe/evs119, Advance Access publication December 14, 2012.

See also...

Near Eastern origin of Ashkenazi Levite R1a

Monday, January 27, 2014

A Mesolithic genome from Spain

Nature today published a paper on the complete genome of La Brana 1, a Mesolithic hunter-gatherer from Iberia: Olalde et al. 2014. Based on genetic variants associated with pigmentation traits, it's likely that this individual had blue eyes, dark hair and deep brown skin.

Moreover, he was probably lactose intolerant (in other words, unlike most Europeans today, he couldn't drink milk as an adult), and his Y-chromosome belonged to the European-specific, but today extremely rare, haplogroup C6 (aka. C-V20), and mtDNA to haplogroup U5b2c1, which again is a European-specific marker. Below is an artist's impression of his mug (courtesy of CSIC), and below that the paper abstract.

Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe1, 2, 3. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet4. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer.

Indeed, the pigmentation traits are basically the same as those of Loschbour, a Mesolithic genome from Luxembourg, featured recently in the groundbreaking Lazaridis et al. preprint (see here). So we can already speculate with some confidence that this was a common, and perhaps dominant, trait combination among European hunter-gatherers.

However, early European farmers, whose ancestors almost certainly migrated to Europe from the Near East during the Neolithic, probably had somewhat different pigmentation traits. We know this because a 7,500 year-old Linearbandkeramik (LBK) farmer genome from Stuttgart, Germany, also featured in Lazaridis et al., showed markers for brown eyes, dark hair, and relatively light skin.

So as things stand, it appears that Europeans only acquired their present coloring, including pale skin and a high incidence of light eyes, relatively recently, well after the hunter-gatherers and farmers began mixing, and their hybrid DNA had time to go through some really powerful selective sweeps. These sweeps were possibly in part a reaction to the Neolithic diet, rich in carbohydrates but poor in vitamin D, amongst other things. Vitamin D doesn't have to be acquired from food because the body can synthesize it in skin exposed to sunlight, but this is done more effectively by people with fair skin, giving them an advantage, especially in places like Europe, which has fairly long winters and lots of cloud cover.

But perhaps this isn't the full story, and present-day European pigmentation traits are also sourced from a late migration into Europe of a prevailingly blond people from somewhere in what is now Russia?

This might sound far-fetched, but during the middle Bronze Age the Eurasian steppe was home to the Andronovo culture, with archeological links to earlier cultures in what is now southern Russia. Based on the DNA of Andronovo nomads from Kurgans in South Siberia, it seems they had fair skin and a lot of blue eyes and blond hair (see here). They also overwhelmingly belonged to Y-chromosome haplogroup R1a1a, which is very common today in Central and Eastern Europe and also parts of Scandinavia. So it'll be interesting to see the pigmentation markers of Mesolithic Eastern Europeans and Central Asians when their genomes become available, probably in the not too distant future, and whether they contributed any ancestry to present-day Europeans. Early indications are that they did, and I discussed that in my previous blog entry here.

La Brana 1 and Loschbour were both classified as part of the West European Hunter-Gatherer (WHG) meta-population by Lazaridis et al., even though only a partial sequence from La Brana 1 was available at the time. As far as I can see, the results in Olalde et al. based on the complete genome don't contradict this classification, because they show that La Brana 1 is most similar to present-day Europeans from around the Baltic Sea, just like Loschbour. Note, for instance, the position of Swedes (SE) and Poles (PL) on the far right of these graphs, indicating inflated allele sharing between them and La Brana 1 relative to other Europeans.

Unfortunately, I have to say that the main Principal Component Analysis (PCA) from the paper isn't as informative as it could have been, due to the large number of Finnish individuals included in the analysis. It's mostly a reflection of the recent population growth, founder effect and genetic drift among Finns, particularly those from eastern Finland.

Nevertheless, note that all of the non-Finnish Europeans more or less fall along the cline that runs from La Brana 1 to present-day Cypriots. This suggests that Europeans today are mostly the product of mixture, in varying degrees, between indigenous European hunter-gatherers, like La Brana 1 and Loschbour, and immigrant Neolithic farmers from the East Mediterranean. So it's a result that basically agrees with the findings of Lazaridis et al.

Interestingly, Loschbour and four other Mesolithic samples from Lazaridis et al. belonged to Y-chromosome haplogroup I, which is not at all closely related to C6. This hints at the presence of a diverse Y-chromosome gene pool in pre-Neolithic Europe, and indeed I'm still confident of seeing R1 and/or R1a among Mesolithic remains from Eastern Europe.

Even though the vast majority of haplogroup C clades are today specific to Eastern Asia, Oceania and the Americas, C6 has only been found among a handful of individuals from across Southern, Western and Central Europe, many of whom are listed at the FTDNA haplogroup C project (look for the V20+ results here). It's difficult to say when this marker or its ancestral lineage migrated to Europe, but C is one of the most basal human Y-chromosome clades, so it could represent the very first Anatomically Modern Human (AMH) wave into Europe, which actually isn't a new concept (see Scozzari et al. 2012).

The Olalde et al. paper includes a lot more information than I'm willing to cover in this blog entry. If you don't have access to the main report, please note that the extended and supplementary data are very detailed and open access.

Citation...

Olalde et al., Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European, Nature (2014), doi:10.1038/nature12960

Wednesday, January 8, 2014

Another look at the Lazaridis et al. ancient genomes preprint

I've now had a chance to look over the Lazaridis et al. preprint a few times, and also take part in several online discussions about the results, at these blogs and elsewhere. So I thought it might be useful to put together another post on the paper to report what I've learned and reiterate a few points. First of all, to understand the results, it's really important to know what the four main ancestral components in this study represent:

- West European Hunter-Gatherer (WHG), based on an 8,000 year-old genome from Loschbour, Luxembourg

- Ancient North Eurasian (ANE), based on a 24,000 year-old genome from South Siberia (dubbed Mal'ta boy or MA-1)

- Early European Farmer (EEF), based on a 7,500 year-old genome from Stuttgart, Germany, belonging to the Neolithic Linearbandkeramik (LBK) culture

- Eastern non-African (ENA), this basically means East Eurasian, and is based on samples of present-day Onge, Han Chinese and Atayal from Taiwan

Now, from what I've seen online, many people seem to think that ANE is more East Asian than European, and can be considered a signal of pretty much any population expansion from the east into Europe. This is not true. ANE is Amerindian-like, but actually also very similar to WHG. In fact, they're equidistant from ENA:

The results of Table S12.1 provide suggestive evidence that Onge share more common ancestry with hunter-gatherers than with Stuttgart. All statistics involving two hunter-gatherer populations have |Z|<0.9, so ancient Eurasian hunter-gatherers are approximately symmetrically related to Onge, and they are all more closely related to them than is Stuttgart.

We next consider the relationship of ancient samples to East Asia using the set (Ami, Atayal, Han, Naxi, She). East Asians are more closely related to all hunter-gatherers than to Stuttgart, but there are no significant differences between hunter-gatherers (all such statistics have |Z|<1.1) (Table S12.2).

...

We have conveniently labeled MA1-related ancestry “Ancient North Eurasian” because of the provenance of MA1 in Siberia, but at present we cannot be sure whether this type of ancestry originated there or was a recent migrant from some western region.

The various Uralic, Turkic and Mongolian groups expanding into Europe, usually after the Bronze Age, no doubt carried significant ENA, so these groups can't be the source of the fairly high levels of ANE across Europe today, because most Europeans lack ENA. Below is a graph based on two f4 tests, comparing ANE and ENA ancestry among Europeans, this time with the Han Chinese as ENA proxies. Note that most of the samples fall within a cline that runs from the Stuttgart sample to Estonians. The only outliers in the direction of the Han are groups from current or former Uralic and Turkic speaking areas of Europe.
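For readers who haven't come across f4 tests before, the statistic itself is very simple: it's the average, across markers, of the product of two allele frequency differences. Here's a toy sketch on simulated frequencies. The populations, the drift model (normally distributed noise added to the frequencies) and all of the numbers are assumptions for illustration only, and the standard errors behind the Z-scores quoted in the paper are left out.

```python
import numpy as np


def f4(a, b, c, d):
    """Mean over markers of (a - b) * (c - d), given allele frequency arrays."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))


rng = np.random.default_rng(1)
n_markers = 20000


def drift():
    """Independent toy drift on one branch of the tree."""
    return rng.normal(0.0, 0.05, n_markers)


# Toy tree ((X, Y), (W, Z)): X and Y share one internal branch, W and Z the other
root = rng.uniform(0.3, 0.7, n_markers)
left, right = root + drift(), root + drift()
x, y = left + drift(), left + drift()
w, z = right + drift(), right + drift()

f4_tree = f4(x, y, w, z)   # pairs match the tree: no shared drift, so close to zero
f4_cross = f4(x, w, y, z)  # pairs cut across the tree: shared drift, clearly positive

print(f"f4(X, Y; W, Z) = {f4_tree:.5f}")
print(f"f4(X, W; Y, Z) = {f4_cross:.5f}")
```

A value close to zero means the two pairs of populations don't share any drift, which is the tree-like outcome. A clear departure from zero means that the tree as written is wrong, or that there's been gene flow between the pairs.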

ANE was actually present in Scandinavia during the Mesolithic, because Motala12, the 8,000 year-old hunter-gatherer genome from Sweden, has an ANE ratio of 19%. But this isn't enough to explain the ANE levels carried by most present-day Europeans, so it's very likely there were at least two expansions of ANE into Europe.

Considering that Loschbour and Stuttgart totally lack ANE, it's plausible that a major wave of ANE moved across much of Europe sometime after the early Neolithic, but obviously before the Uralic and Turkic expansions, which, as per above, were rich in ENA. Based on recently published ancient mtDNA evidence from Central Europe (see here), Lazaridis et al. propose that this timeframe was the Copper and/or Bronze Age.

This of course is the generally accepted Proto-Indo-European timeframe. Indeed, the theory I put forward in the previous blog entry (see here) that most of the ANE in Europe today was the result of the Proto-Indo-European expansion, probably from Eastern Europe, looks even better on closer inspection.

Note the elongated cline formed by the European samples running from WHG to EEF on Fig 2B, shown below. It correlates well with latitude, and very likely reflects northward migrations of Neolithic farmers into Europe from the Mediterranean Basin, followed by isolation-by-distance. In other words, this cline probably took thousands of years to form.

On the other hand, there is no cline running from WHG/EEF to ANE, but all of the Indo-European and/or Eastern European samples are fairly evenly lifted up towards ANE relative to a few outliers. These outliers are all southwestern Europeans: Basques, Pais Vasco (Basque Country) Spaniards, southern French and Sardinians.

Of course, southwestern Europe is the most distant part of the continent from the generally accepted Indo-European homeland near the middle Volga. Moreover, Basques don't speak an Indo-European language, while Sardinians were only Indo-Europeanized during historic times.

Indeed, even though a couple of tables in the study report considerable ANE ancestry among Basques and Pais Vasco Spaniards, the authors admit that this need not be the case. For instance:

We next attempted to fit individual West Eurasian populations as a mixture of Loschbour and Stuttgart, as representatives of Early European farmers and West European Hunter Gatherers.

Fig. 1B suggests that this is not possible, as most Europeans form a cline that cannot be reconciled with such a mixture [Davidski's note: I think they actually mean Fig. 2B]. Nonetheless, for Sardinians (Extended Data Table 1), the most negative f3-statistic is of the form f3(Test; Loschbour, Stuttgart), which suggests that at least some Europeans may be consistent with having been formed by such a mixture. We thus fit each European population into the topology of Fig. S12.6. Only Basques, Pais_Vasco, and Sardinians, can be fit successfully with this model. Fig. S12.8 shows a successful fit.

Most European populations cannot be fit as this type of 2-way mixture and, intuitively, this is due to their tendency (Fig. 1B) towards Ancient North Eurasians that is not modeled by such a mixture.
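The f3-statistic mentioned in the quote works along similar lines: f3(Test; A, B) is the average of (t - a)(t - b) across markers, and a negative value is a sign that Test is a mixture of populations related to A and B. Below is a toy sketch with simulated allele frequencies. The 50/50 mixture, the drift values and the population names are all made up, and real estimators also correct for the finite number of chromosomes sampled, which this one doesn't.

```python
import numpy as np


def f3(test, a, b):
    """Mean over markers of (test - a) * (test - b), given allele frequency arrays."""
    test, a, b = (np.asarray(x, dtype=float) for x in (test, a, b))
    return float(np.mean((test - a) * (test - b)))


rng = np.random.default_rng(2)
n_markers = 20000

root = rng.uniform(0.3, 0.7, n_markers)
source_a = root + rng.normal(0.0, 0.05, n_markers)  # hypothetical source population A
source_b = root + rng.normal(0.0, 0.05, n_markers)  # hypothetical source population B

# A 50/50 mixture of the two sources, plus a little drift after the mixture
mixed = 0.5 * source_a + 0.5 * source_b + rng.normal(0.0, 0.01, n_markers)
# A third population that split from the root on its own, with no mixture
unmixed = root + rng.normal(0.0, 0.05, n_markers)

f3_mixed = f3(mixed, source_a, source_b)
f3_unmixed = f3(unmixed, source_a, source_b)

print(f"f3(mixed; A, B)   = {f3_mixed:.5f}")
print(f"f3(unmixed; A, B) = {f3_unmixed:.5f}")
```

Note that heavy drift after the mixture event can push the statistic back above zero, so a positive f3 doesn't rule out admixture. Only a significantly negative one is informative.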

Another intriguing thing about the results shown in Fig 2B is that the expansions of ANE across Europe appear not to have disturbed the presumably Neolithic WHG/EEF cline to any great extent. What this suggests is that ANE was spread largely independently of EEF and even WHG. In other words, the groups that pushed ANE deep into Europe probably had very high ratios of this component. This also seems to be true for the groups that brought ANE to the Near East:

A geographically parsimonious hypothesis would be that a major component of present-day European ancestry was formed in eastern Europe or western Siberia where western and eastern hunter-gatherer groups could plausibly have intermixed. Motala12 has an estimated WHG/(WHG+ANE) ratio of 81% (S12.7), higher than that estimated for the population contributing to modern Europeans (Fig. S12.14). Motala and Mal’ta are separated by 5,000km in space and about 17 thousand years in time, leaving ample room for a genetically intermediate population. The lack of WHG ancestry in the Near East (Extended Data Fig. 6, Fig. 1B) together with the presence of ANE ancestry there (Table S12.12) suggests that the population who contributed ANE ancestry there may have lacked substantial amounts of WHG ancestry, and thus have a much lower (or even zero) WHG/(WHG+ANE) ratio.

So perhaps the 17,000 year-old Afontova Gora 2 (AG2) genome from Central Siberia, classified as part of the ANE meta-population by Lazaridis et al., is genetically the closest sample we have to the Proto-Indo-Europeans? Based on a couple of the PCA from Lazaridis et al. (below) and Raghavan et al. (see here), this genome doesn't appear to be 100% ANE. My very rough estimate is 85/15 ANE/WHG.

If my assumptions are correct here, then it's no wonder that this Bronze Age Danish sample (M4) from the recent Carpenter et al. paper (see here) shows a clear shift towards the Americans on the global PCA. M4 is better known as "the old man" from the giant Borum Eshøj barrow (see here), presumably built by some of the earliest Indo-Europeans in Scandinavia. We can probably expect such Afontova Gora 2-like results from many European samples archeologically linked to the early Indo-Europeans.

As for the first major expansion of ANE into Europe, here's an interesting map that I spotted in one of the online discussions on the paper, which shows the spread of microblade technology in almost all directions from around Lake Baikal just after the LGM (source). Among other things, it offers a very attractive explanation for the presence of ANE in Mesolithic Sweden, as well as the current distributions of Y-chromosome haplogroups R and Q (note that MA-1 belonged to R, which is the brother clade of Q).

But the problem with this scenario is the tight phylogenetic relationship between ANE and WHG. If the former expanded after the LGM from a refugium in South Siberia, then why is it so closely related to the latter, which presumably recolonized Europe from a Southern European LGM refugium, basically at the opposite end of Eurasia?

There also have been a lot of comments online about the potential correlations between ANE and certain clusters generated from modern samples with the ADMIXTURE software. I think it's obvious from just looking at the ADMIXTURE bar graph from Lazaridis et al. that ANE is linked in one way or another to the clusters that peak in Northeastern Europe, the North Caucasus, and South Central Asia (especially among the Indo-Iranian Kalash).

Below is the bar graph from the optimal ADMIXTURE run, the K=16. Note that ANE proxy MA-1 mostly shows membership in the cream and light blue clusters, which peak among the Kalash and Lithuanians, respectively. Click on the image to enlarge.

The Kalash-centered cluster, which actually first appears at K=14, and is more or less repeated in four runs, is particularly interesting, because it shows fairly similar distribution patterns to ANE. Note, for instance, that after South Central Asia it reaches its highest levels in the North Caucasus, which is where ANE also shows a major peak today (see here). Moreover, in Europe it's most pronounced in the east and north, but appears at comparatively trivial levels among the Basques, southern French and Pais Vasco Spaniards, and doesn't show up at all among Sardinians or the ancient European genomes.

However, it's often very difficult to make inferences about ancient population movements from ADMIXTURE results, and I think this is one of those cases. Just because this cluster peaks among the Kalash doesn't mean that it has its origins within this group, or even in Asia. I'd say the most plausible explanation for its existence is that it represents ANE that expanded rapidly across Eurasia, probably during the early Indo-European dispersals, and today reaches its highest frequencies among some of the most isolated and genetically drifted recipients of this ANE gene flow (i.e. those in the Caucasus and Hindu Kush).

By the way, the difference in ANE levels between southwestern Europeans and most other West Eurasians clearly shows on my own PCA and MDS maps. Below is the latest Eurogenes PCA of West Eurasia from a few months ago. Note the pronounced eastern shift among almost all the samples relative to the Basques, Pais Vasco Spaniards, and Sardinians. As per the f4 graph above, only in some instances is this shift also the result of significant ENA ancestry.

It's incredible what a few ancient genomes can add to the context of these sorts of analyses using modern DNA. I didn't really know what was causing this eastern shift when I posted the PCA, and guessed that it might simply be a lack of Mediterranean ancestry across Northern and Eastern Europe (see here).

I also just noticed that Razib posted two articles on the pigmentation traits of the ancient individuals (see here and here). The sample is tiny, but looking back, the fact that the Loschbour hunter-gatherer probably had blue eyes and dark skin, while, on the other hand, the Stuttgart farmer had relatively light skin, is actually quite remarkable.

We'll have a major story on our hands if several other hunter-gatherer genomes come back with similar results. It's just not something anyone would've predicted from modern DNA. Apart from that, there's also the slight shock factor of learning that our not too distant indigenous European ancestors were probably of a deep shade of brown. Imagine that, Europe might have only really lightened up and become white after Near Eastern migrants made their way over. Well, let's wait and see.

Citation...

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, bioRxiv, Posted December 23, 2013, doi: 10.1101/001552

Raghavan et al., Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans, Nature, (2013), Published online 20 November 2013, doi:10.1038/nature12736

Carpenter et al., Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries, The American Journal of Human Genetics (2013), doi: 10.1016/j.ajhg.2013.10.002

See also...

Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans

The really old Europe is mostly in Eastern Europe

EEF-WHG-ANE test for Europeans

Mesolithic genome from Spain reveals markers for blue eyes, dark skin and Y-haplogroup C6

Monday, December 23, 2013

Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans

This new preprint at bioRxiv is quite the Christmas present for those of us with a passion for European genetics and prehistory. It's the first paper to report on full genomes from Mesolithic and Neolithic Europe.

All of the successfully tested Mesolithic Y-chromosomes, one from Luxembourg and four from Motala, Sweden, belonged to haplogroup I. This probably won't come as a surprise to many people, as this marker was always the main candidate for Europe's indigenous Y-haplogroup. However, three of the results fell into haplogroup I2a1b, and none into I1, which is today the most common Y-haplogroup in most of Scandinavia.

What this suggests is that I1 expanded after the Mesolithic and replaced most of the I2a1b across Northwestern Europe. I'd say these were mostly expansions from North-Central Europe, although recent chatter on the web suggests that two distinct I1 lineages might have arrived in North-Central Europe from Eastern Europe at different times.

All of the Mesolithic mtDNA sequences belonged to haplogroups U2 and U5, which is in line with past results. The single Neolithic sample, from a 7500 year-old Linearbandkeramik (LBK) site in Stuttgart, Germany, belonged to mtDNA haplogroup T2. Again, not very surprising considering what we've seen to date.

The genome-wide results, on the other hand, are not as straightforward. The basic upshot is that Northern Europeans are mostly of indigenous European hunter-gatherer origin, while Southern Europeans are largely derived from Neolithic farmers of mixed European and Near Eastern origin. But the authors identify a minimum of three ancestral populations from their stats (WHG, EEF and ANE), and four meta-populations from the available ancient data (WHG, EEF, ANE and SHG). Here are brief summaries of each of these groups:

West European Hunter-Gatherer (WHG): this ancestral component is based on an 8,000 year-old forager from the Loschbour rock shelter in Luxembourg (one of the individuals mentioned above belonging to I2a1b). The WHG meta-population includes the Loschbour sample and two Mesolithic individuals from the La Brana Cave in Spain. However, today the WHG component peaks among Estonians and Lithuanians, in the East Baltic region, at almost 50%.

Early European Farmer (EEF): apparently this is a hybrid component, the result of mixture between "Basal Eurasians" and a WHG-like population possibly from the Balkans. It's based on the aforementioned LBK farmer from Stuttgart, but today peaks at just over 80% among Sardinians. Apart from the Stuttgart sample, the EEF meta-population includes Oetzi the Iceman and a Neolithic Funnelbeaker farmer from Sweden.

Ancient North Eurasian (ANE): this is the twist in the tale, a component based on a previously reported genome of a 24,000 year-old Upper Paleolithic forager from South Central Siberia, belonging to Y-hg R*, and known as Mal'ta boy or MA-1 (see here). This component was very likely present in Southern Scandinavia since at least the Mesolithic (see the summary of SHG below), but only seems to have reached Western Europe after the Neolithic. At some point it also spread into the Americas. In Europe today it peaks among Estonians at just over 18%, and, intriguingly, reaches a similar level among Scots. However, numbers weren't given for Finns, Russians and Mordovians, who, according to one of the maps, also carry very high ANE, but their results are confounded by more recent Siberian admixture (see the discussion on the European outliers below). The ANE meta-population includes Mal'ta boy as well as a late Upper Paleolithic sample from Central Siberia, dubbed Afontova Gora-2 (AG2).

Scandinavian Hunter-Gatherer (SHG): this is a meta-population made up of Swedish Mesolithic and Neolithic forager samples from Motala and Gotland, respectively. It's a more easterly variant of WHG, with probable ANE admixture.

Below are the two most important figures from the paper: a) the three-way mixture model that is a statistical fit to the data, and b) a plot of the proportions of ancestry from each of the three inferred ancestral populations. As per above, East Baltic populations are the most WHG, which is somewhat curious, because they mostly carry Y-DNA R1a and N1c1.

So if not for the ANE, we'd simply have a two-way mixture model between indigenous European foragers and migrant Near Eastern farmers, at least for most Europeans anyway. Moreover, the seemingly late and sudden arrival of ANE in much of Europe is important, because it's a smoking gun for a major population upheaval across the continent during the Late Neolithic/Early Bronze Age.

Interestingly, archeological data suggest that this was also the period which saw the introduction of new social organization and perhaps Indo-European languages across most of Europe. None of this was lost on the authors of the paper, but it appears they'd rather be cautious pending more ancient genomic data, because they chose not to explicitly mention the Indo-Europeans.

This study raises two questions that are important to address in future research. A first is where the EEF picked up their WHG ancestry. Southeastern Europe is a candidate as it lies along the geographic path from Anatolia into central Europe, and hence it should be a priority to study ancient samples from this region. A second question is when and where ANE ancestors admixed with the ancestors of most present-day Europeans. Based on discontinuity in mtDNA haplogroup frequencies in Central Europe, this may have occurred during the Late Neolithic or early Bronze Age ~5,500-4,000 years ago35. A central aim for future work should be to collect transects of ancient Europeans through time and space to illuminate the history of these transformations.

...

The absence of Y-haplogroup R1b in our two sample locations is striking given that it is, at present, the major west European lineage. Importantly, however, it has not yet been found in ancient European contexts prior to a Bell Beaker burial from Germany (2,800-2,000BC)12, while the related R1a lineage has a first known occurrence in a Corded Ware burial also from Germany (2,600BC)13. This casts doubt on early suggestions associating these haplogroups with Paleolithic Europeans14, and is more consistent with their Neolithic entry into Europe at least in the case of R1b15, 16. More research is needed to document the time and place of their earliest occurrence in Europe. Interestingly, the Mal’ta boy belonged to haplogroup R* and we tentatively suggest that some haplogroup R bearers may be responsible for the wider dissemination of Ancient North Eurasian ancestry into Europe, as their haplogroup Q relatives may have plausibly done into the Americas17.

No doubt, a lot of people will now be wondering about the main source of the ANE that apparently rushed into Europe at the onset of the metal ages. The Siberian steppe will probably be the favored option for many, since this is where Mal'ta boy and Afontova Gora-2 were dug up. However, I'm pretty sure the source was Eastern Europe.

First of all, as already mentioned, it seems that ANE was present in Sweden during the Mesolithic (Figure S12.7 shows around 19% ANE in the Motala12 sample). Secondly, despite the ANE and WHG being classified as separate ancestral and meta-populations, the differences between them appear to be clinal rather than discrete, which I think can be seen in the PCA and ADMIXTURE results from the study (see here and here). Thus, I'd expect a lot more ANE in Eastern Europe during the Mesolithic than in Scandinavia. Thirdly, it's likely that the ancestors of modern Uralic speakers were in Siberia very early, possibly during the Mesolithic, and they were probably East Eurasians aka. Eastern non-Africans (ENA), which ANE is not.

Indeed, the latest linguistic research suggests that the pre-proto-Uralics migrated at some point from Siberia into the southern Urals, in far eastern Europe. The Uralics proper then expanded from the southern Urals, probably during the Bronze Age, both to the east and west, as far as the Baltic. This Uralic expansion is certainly reflected in the Lazaridis et al. data, and it's not the only relatively late migration into Europe that shows up in their stats.

While our three-way mixture model fits the data for most European populations, two sets of populations are poor fits. First, Sicilians, Maltese, and Ashkenazi Jews have EEF estimates beyond the 0-100% interval (SI13) and they cannot be jointly fit with other Europeans (SI12). These populations may have more Near Eastern ancestry than can be explained via EEF admixture (SI13), an inference that is also suggested by the fact that they fall in the gap between European and Near Eastern populations in the PCA of Fig. 1B. Second, we observe that Finns, Mordovians, Russians, Chuvash, and Saami from northeastern Europe do not fit our model (SI12; Extended Data Table 3). To better understand this, for each West Eurasian population in turn we plotted f4(X, Bedouin2; Han, Mbuti) against f4(X, Bedouin2; MA1, Mbuti), using statistics that measure the degree of a European population’s allele sharing with Han Chinese or MA1 (Extended Data Fig. 7). Europeans fall along a line of slope >1 in the plot of these two statistics. However, northeastern Europeans fall away from this line in the direction of Han. This is consistent with Siberian gene flow into some northeastern Europeans after the initial ANE admixture, and may be related to the fact that Y-chromosome haplogroup N 30, 31 is shared between Siberian and northeastern Europeans32, 33 but not with western Europeans. There may in fact be multiple layers of Siberian gene flow into northeastern Europe after the initial ANE gene flow, as our analyses reported in SI 12 show that some Mordovians, Russians and Chuvash have Siberian-related admixture that is significantly more recent than that in Finns (SI12).

The authors are actually referring to the Kargopol Russians from the HGDP in that quote. But from my own analyses with a wide variety of samples from Russia, I know that other Russians show similar levels of Siberian admixture to Belorussians, Ukrainians and Estonians.
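To make the f4 comparison in that quote a bit more concrete, here's a minimal sketch in Python. It assumes the standard definition f4(A, B; C, D) = the average over SNPs of (a - b)(c - d), with allele frequencies as inputs. All of the frequencies are invented for illustration, the population labels are just placeholders borrowed from the quote, and there's no standard error estimation, so this isn't a reconstruction of the authors' actual pipeline.

```python
from statistics import mean

def f4(a, b, c, d):
    """Naive f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return mean((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d))

# Invented allele frequencies at four SNPs (not real data).
bedouin2 = [0.5, 0.5, 0.5, 0.5]
mbuti    = [0.2, 0.8, 0.3, 0.7]
han      = [0.8, 0.2, 0.3, 0.7]   # differs from Mbuti at SNPs 1-2 only
ma1      = [0.2, 0.8, 0.9, 0.1]   # differs from Mbuti at SNPs 3-4 only

# Toy "European": shifted towards MA1 relative to Bedouin2.
european = [0.5, 0.5, 0.6, 0.4]
# Toy "northeastern European": same MA1 shift plus an extra Han-like shift.
northeastern = [0.6, 0.4, 0.6, 0.4]

for name, pop in [("european", european), ("northeastern", northeastern)]:
    x = f4(pop, bedouin2, han, mbuti)   # allele sharing with Han
    y = f4(pop, bedouin2, ma1, mbuti)   # allele sharing with MA1
    print(f"{name}: f4(X,Bedouin2;Han,Mbuti)={x:.3f}  f4(X,Bedouin2;MA1,Mbuti)={y:.3f}")
```

In this toy setup both populations share the same amount of drift with MA1, but only the second one is pulled in the direction of Han, which is the kind of departure from the main European line that the quote describes for northeastern Europeans.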

In any case, this of course means that there are more than three ancestral populations for present-day Europeans, although not all of them contributed to all Europeans. Also, it's very clear that to learn all the details about the peopling of Europe, these sorts of studies really need to start focusing on the large swath of land that stretches from present-day Poland to the Urals. In other words, Eastern Europe.

I was also going to discuss the genetically inferred pigmentation of the ancient individuals, but, because of the small sample size, there's not much to discuss at this stage. The Loschbour forager possibly had blue eyes (50% chance), but dark hair and skin. On the other hand, the Stuttgart farmer definitely had dark eyes and hair, but relatively light skin. I wonder if this swarthy hunter-gatherer skin complexion has anything to do with the fact that today lots of people from around the Baltic tan really well?

Citation...

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, bioRxiv, Posted December 23, 2013, doi: 10.1101/001552

See also...

Another look at the Lazaridis et al. ancient genomes preprint

The really old Europe is mostly in Eastern Europe

EEF-WHG-ANE test for Europeans

First genome of an Upper Paleolithic human

ADMIXTURE analysis of Allentoft et al. and Haak et al. ancient genomes

Monday, December 16, 2013

West Eurasian cluster analysis: 13 clusters from 18 dimensions

I ran a quick Mclust analysis to get a better idea of the substructures in my recently updated dataset of West Eurasian samples. Mclust found that the optimal outcome was produced with 18 dimensions of genetic variation and 13 clusters, the latter of which are superimposed on a two dimensional MDS plot below. I chose the labels for the clusters myself and flipped the canvas to fit geography.

Here you can see the 13 clusters superimposed on all possible combinations of the 18 dimensions. Clicking on the image will take you to a 10.3MB PDF file.

It's interesting to note the presence of the very tight Jewish cluster, which includes Ashkenazi, Sephardic and Moroccan Jews. The Basques and Sardinians also cluster together, despite being clearly distinct from each other in the first two dimensions. This is fascinating because these two groups have been mentioned a few times now in various studies and presentations as being the best modern proxies for Europe's Neolithic farmers.

The widespread Central and Eastern European cluster mostly includes individuals from populations that aren't easily characterized in these sorts of tests, and that's basically because they're of mixed origin. Indeed, I suspect things would look somewhat different in that part of the plot if I had more sizable numbers from Germany, Scandinavia, Poland and nearby areas.

Mclust can produce many more clusters than just 13 from the same data, but as per above, I wanted to see what would happen if it was asked to come up with the optimal solution. For more on this type of analysis check out the articles here, here and here.

Update 17/12/2013: On a related note, here's an Mclust analysis of West, Central and South Asia. The optimal result was obtained with 10 dimensions and 14 clusters. Please note that although some of the clusters have the same names as in the analysis above, they aren't the same clusters.

Thursday, November 21, 2013

First genome of an Upper Paleolithic human

A new paper at Nature reports on the genome of a 24,000 year-old Siberian known as Mal'ta boy or MA-1. Here's the abstract:

The origins of the First Americans remain contentious. Although Native Americans seem to be genetically most closely related to east Asians1, 2, 3, there is no consensus with regard to which specific Old World populations they are closest to 4, 5, 6, 7, 8. Here we sequence the draft genome of an approximately 24,000-year-old individual (MA-1), from Mal’ta in south-central Siberia9, to an average depth of 1×. To our knowledge this is the oldest anatomically modern human genome reported to date. The MA-1 mitochondrial genome belongs to haplogroup U, which has also been found at high frequency among Upper Palaeolithic and Mesolithic European hunter-gatherers10, 11, 12, and the Y chromosome of MA-1 is basal to modern-day western Eurasians and near the root of most Native American lineages5. Similarly, we find autosomal evidence that MA-1 is basal to modern-day western Eurasians and genetically closely related to modern-day Native Americans, with no close affinity to east Asians. This suggests that populations related to contemporary western Eurasians had a more north-easterly distribution 24,000 years ago than commonly thought. Furthermore, we estimate that 14 to 38% of Native American ancestry may originate through gene flow from this ancient population. This is likely to have occurred after the divergence of Native American ancestors from east Asian ancestors, but before the diversification of Native American populations in the New World. Gene flow from the MA-1 lineage into Native American ancestors could explain why several crania from the First Americans have been reported as bearing morphological characteristics that do not resemble those of east Asians2, 13. Sequencing of another south-central Siberian, Afontova Gora-2 dating to approximately 17,000 years ago14, revealed similar autosomal genetic signatures as MA-1, suggesting that the region was continuously occupied by humans throughout the Last Glacial Maximum. 
Our findings reveal that western Eurasian genetic signatures in modern-day Native Americans derive not only from post-Columbian admixture, as commonly thought, but also from a mixed ancestry of the First Americans.

Indeed, MA-1 looks like he could be an early ancestor of present-day West Eurasians, including and especially Europeans. Mitochondrial haplogroup U was almost fixed in Upper Paleolithic and Mesolithic Europe, while R1a and R1b are, after all, the most common and widespread Y-chromosome haplogroups in Europe today.

Below is the bar graph from the K=9 ADMIXTURE analysis, which turned out to be the optimal run. Note that the Mal'ta sample appears mostly South Asian (37%), European (34%), and Amerindian (26%), but also with minor Oceanian ancestry (4%). Interestingly, among the Europeans, it's the groups from Northern and Eastern Europe that carry the highest levels of these components. This is probably a reflection, at least in large part, of their elevated indigenous European hunter-gatherer ancestry.

At K = 9, MA-1 is composed of five genetic components of which the two major ones make up ca. 70% of the total. The most prominent component is shown in green and is otherwise prevalent in South Asia but does also appear in the Caucasus, Near East or even Europe. The other major genetic component (dark blue) in MA-1 is the one dominant in contemporary European populations, especially among northern and northeastern Europeans. The co-presence of the European-blue and South Asian green in MA-1 can be interpreted as admixture of the two in MA-1 or, alternatively, MA-1 could represent a proto-western Eurasian prior to the split of Europeans and South Asians. This analysis cannot differentiate between these two scenarios. Most of the remaining nearly one third of the MA-1 genome is comprised of the two genetic components that make up the Native American gene pool (orange and light pink). Importantly, MA-1 completely lacks the genetic components prevalent in extant East Asians and Siberians (shown in dark and light yellow, respectively). Based on this result, it is likely that the current Siberian genetic landscape, dominated by the genetic components depicted in light and dark yellow (Figure SI 6), was formed by secondary wave(s) of immigrants from East Asia.

Here's a figure showing the levels of shared genetic drift between MA-1 and 147 present-day non-African populations. Among the Europeans it's the Lithuanians, Northwestern Russians and Baltic and Volga Finns who are most similar to the ancient sample. It's also interesting to note the relatively high position on the list of the Kalash from South Central Asia and Lezgins from the North Caucasus. At the bottom are Bedouins and Palestinians, mainly because of their non-trivial Sub-Saharan admixture, followed by Oceanians, East Asians, and South Indians, probably due to deep differentiation between their main ancestral clades and that of MA-1.

I've heard that the same team of scientists is now trying to sequence genomes from Upper Paleolithic sites west of Mal'ta. I wonder how far west? I see that the authors mention the Sungir site from near Moscow a couple of times in the paper, in relation to its similarity to the Mal'ta site. Perhaps they're working on a Sungir genome right now? If so, what's the bet that the Y-DNA turns out to be another basal R?

Citation...

Raghavan et al., Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans, Nature, (2013), Published online 20 November 2013, doi:10.1038/nature12736
