Thursday, April 30, 2015
The enigma of the Kalash
Last year Garrett Hellenthal et al. claimed that the Kalash people of the Hindu Kush received a large pulse of admixture from somewhere in the west, possibly Europe, as late as 327–326 BCE. They even suggested that Alexander's soldiers may have been the culprits. But this was naive and wrong.
Now, Qasim Ayub et al. are claiming that the Kalash are an Ancient North Eurasian (ANE) population that has remained genetically isolated for the past 11,800 years. This is also naive and wrong.
One day, perhaps in the not too distant future, someone will study the population history of the Hindu Kush using ancient DNA and methods that actually work. What I think they will find is that the Kalash, just like most of their neighbors, are largely the result of an admixture event during the Bronze Age between Indo-Iranian migrants from the steppe and Central Asian agriculturists. They will confirm that the Kalash are an extreme isolate, but only since the Bronze Age, not the early Neolithic.
These results will correlate very nicely with mainstream linguistics and archeology, latest expansion dates for uniparental markers, and even common sense.
Citation...
Ayub et al., The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection, The American Journal of Human Genetics (2015), http://dx.doi.org/10.1016/j.ajhg.2015.03.012
See also...
The teal people: did they actually exist, and if so, who were they?
Friday, April 3, 2015
The teal people: did they actually exist, and if so, who were they?
The ADMIXTURE analysis in Haak et al. 2015 includes a series of intriguing teal colored components from K=16 to K=20 (see image here). The main reason I'm so intrigued by these components is that they generally make up over 40% of the genetic structure of the potentially Proto-Indo-European-speaking Yamnaya people.
But there's only so much one can learn by staring at a bar graph, so I thought I'd have a go at isolating the same signal with ADMIXTURE to study it in more detail. You can view the results of my experiment in the spreadsheet here.
I wasn't able to completely nail any one of the teal components from Haak et al., because I don't have access to all of the samples used in the paper (I'd have to sign a waiver to get them). Nevertheless, the signal looks basically the same.
Below is a bar graph based on the output featuring selected populations and ancient genomes from Europe and Asia. The Fst genetic distances between the nine components are available here.
Note that the teal component peaks in the Caucasus and the Hindu Kush, and generally shows a strong correlation with regions of relatively high MA1-related or Ancient North Eurasian (ANE) admixture. On the other hand, the orange component peaks among Early European Farmers (EEF), who basically lack ANE.
To learn about the structure of the three main West Eurasian components - blue, orange and teal - I made synthetic individuals from the P output to represent each of the components, and tested them with my K8 model. As expected, the teal component harbors a high level of ANE, while the orange component lacks it altogether. Refer to the spreadsheet here.
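For readers unfamiliar with the idea, here's a minimal sketch of how a synthetic individual can be drawn from one column of an ADMIXTURE P file (one row per SNP, one allele frequency per component). It's an illustration only, and not necessarily the procedure behind the spreadsheet: the toy frequencies, the column index and the function names are all made up, and which allele each frequency refers to depends on how the input data were coded.

```python
import random

def read_p_column(lines, component):
    """Return one component's allele frequencies from .P-style rows."""
    return [float(line.split()[component]) for line in lines if line.strip()]

def synthetic_genotypes(freqs, rng):
    """Draw one diploid genotype (0, 1 or 2 allele copies) per SNP."""
    return [sum(rng.random() < p for _ in range(2)) for p in freqs]

# Toy stand-in for a K=3 .P file: three SNPs, three components (made-up values).
toy_p = [
    "0.10 0.90 0.50",
    "0.00 1.00 0.25",
    "0.80 0.20 0.99",
]

rng = random.Random(1)                       # fixed seed so the draw is repeatable
component = read_p_column(toy_p, 1)          # column index 1 is an arbitrary choice
individual = synthetic_genotypes(component, rng)
print(individual)
```

Several such individuals drawn from the same column can then be run through another test (here, the K8 model) as if they were real samples.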
It's very likely that the teal and orange components from Haak et al. share these traits. I think this is more than obvious by looking at their frequencies across space and time in Eurasia.
I also analyzed the synthetic individuals with PCA based on their K8 ancestry proportions. The samples representing the orange component fall just south of the Stuttgart genome from Neolithic Germany, and this is basically where I expect Neolithic genomes from the Near East to cluster when they become available.
Interestingly, the samples representing the blue component are dead ringers for Scandinavian hunter-gatherers (SHG). However, I suspect this is something of a coincidence caused by the small number of Western European hunter-gatherer (WHG) and Eastern hunter-gatherer (EHG) genomes in the dataset. The algorithm probably doesn't have enough variation to latch onto to create both WHG and EHG components, and in the end settles for something in between, which just happens to resemble SHG.
But the fact that the orange and blue samples more or less pass for ancient populations leaves open the possibility that the same might be said for the teal samples.
So did the teal people actually exist, and if so, who were they?
My view at the moment is that a population very similar to the teal samples formed in Central Asia or the North Caucasus during the Neolithic as a result of admixture between MA1-like and Near Eastern groups. This population, I believe, then expanded into the Pontic-Caspian steppe by the onset of the Eneolithic.
Were they perhaps the Proto-Indo-Europeans? Probably not. I'd say they were Neolithic farmers who eventually played a role in the formation of the Proto-Indo-Europeans. In any case, someone had to bring the Caucasian or Central Asian admixture to the steppe, and I have it on good authority that it was already present among the Khvalynsk population of the Eneolithic, albeit at a lower level than among the Yamnaya of the early Bronze Age.
Citation...
Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317
Update 16/11/2015: 'Fourth strand' of European ancestry originated with (Caucasus) hunter-gatherers isolated by Ice Age
Saturday, March 28, 2015
Population genetics of Copper and Bronze Age inhabitants of the Eastern European steppe
I'm hoping like hell that the samples from this thesis eventually get the same treatment as those from Haak et al. 2015.
Summary: This dissertation presents the first genetic study of prehistoric populations in the Pontic-Caspian steppe from the Upper Thracian Plain to the Volga. Hypervariable region I (HVR I) and 30 short sections of the coding region containing 32 clade-determining polymorphisms on the mitochondrial DNA, as well as 20 putatively naturally selected autosomal SNPs and a sex-determining locus were analysed using a combination of multiplex PCR and 454 sequencing. Data analysis was performed on the HVR I of 65 of the 180 Eneolithic and Bronze Age samples. (Partial) genotypes were generated from 61 individuals. Published ancient DNA data from Central and Eastern Europe and Central Asia, as well as modern DNA sequences were consulted for comparison.
The genetic data support the inference that early Neolithic farmers from Southeast Europe were involved in establishing pastoralism in the steppes by demic diffusion. The consistently low values of the FST-statistic (the range includes zero) between the Yamnaya Culture of the steppe and a succession of Neolithic cultures in Central Europe indicate continuous or recurrent contacts between the two regions. Between the Yamnaya Culture and its successor, the Catacomb Culture, the incidence of haplogroup U4, which is at high frequency in hunter-gatherer populations of Neolithic Scandinavia and Mesolithic Northwest Russia, rises from approximately 5 % to above 30 %. It is possible that immigrants from Eastern Baltic hunter-gatherer refugia were involved in the genesis of the Catacomb Culture.
The low FST values between the prehistoric steppe populations and the modern populations of Central and Eastern Europe indicate genetic continuity. This is supported by the nuclear genotype frequencies. According to current knowledge the modern European gene pool can be explained by three roots: indigenous Mesolithic hunter-gatherers, early farmers from the Near East, and an ancient North Eurasian component with an Upper Palaeolithic origin. Maybe the third ancestry component was introduced into the late Neolithic European genome by the North Pontic population.
Source: Wilde, Sandra, Populationsgenetik kupfer- und bronzezeitlicher Bevölkerungen der osteuropäischen Steppe, 2014, Dissertation
Saturday, March 21, 2015
Mitogenomes reveal post-Neolithic gene flow from the Near East to Tuscany
Europeans probably received their Ancient North Eurasian (ANE) admixture from at least a couple of different sources. Most of it no doubt came from the Eurasian steppe during the late Neolithic/Early Bronze Age, very likely with the early Indo-Europeans. But I'd say that a significant amount of the ANE in southern Europe arrived there from the Near East during and after the Late Bronze Age with a wide variety of groups, possibly including the Etruscans. Here's a new paper from PLoS One focusing on Tuscan mitogenomes that adds weight to my argument.
Background: Genetic analyses have recently been carried out on present-day Tuscans (Central Italy) in order to investigate their presumable recent Near East ancestry in connection with the longstanding debate on the origins of the Etruscan civilization. We retrieved mitogenomes and genome-wide SNP data from 110 Tuscans analyzed within the context of The 1000 Genome Project. For phylogeographic and evolutionary analysis we made use of a large worldwide database of entire mitogenomes (>26,000) and partial control region sequences (>180,000).
Results: Different analyses reveal the presence of typical Near East haplotypes in Tuscans representing isolated members of various mtDNA phylogenetic branches. As a whole, the Near East component in Tuscan mitogenomes can be estimated at about 8%; a proportion that is comparable to previous estimates but significantly lower than admixture estimates obtained from autosomal SNP data (21%). Phylogeographic and evolutionary inter-population comparisons indicate that the main signal of Near Eastern Tuscan mitogenomes comes from Iran.
Conclusions: Mitogenomes of recent Near East origin in present-day Tuscans do not show local or regional variation. This points to a demographic scenario that is compatible with a recent arrival of Near Easterners to this region in Italy with no founder events or bottlenecks.
Citation...
Gómez-Carballa A, Pardo-Seco J, Amigo J, Martinón-Torres F, Salas A (2015) Mitogenomes from The 1000 Genome Project Reveal New Near Eastern Features in Present-Day Tuscans. PLoS ONE 10(3): e0119242. doi:10.1371/journal.pone.0119242
Sunday, March 15, 2015
Modeling the ancestry of Yamnaya with qpAdm
I've been playing around with the new qpAdm program and the Haak et al. dataset over the past few days and managed to come up with what I think are some very promising results. For instance, the Yamnaya genomes from the Samara Valley and surrounds fit rather well as 0.514 Samara hunter-gatherer + 0.486 Georgian (std. errors 0.032, chisq 3.890).
This is an interesting outcome, mainly because Georgian is a Kartvelian language, and linguistics data suggest that the early Indo-Europeans - presumably the Yamnaya nomads or their ancestors - were in close contact with Proto-Kartvelian speakers. Moreover, even though the Yamnaya males tested to date all belong to Y-chromosome haplogroup R1b, which they probably inherited from their hunter-gatherer ancestors, because the Samara forager also belonged to this haplogroup, some of their mtDNA lineages appear to be derived from the Caucasus and/or nearby areas of the Near East.
However, the main problem with this analysis is that it's attempting to model an ancient population as a mixture of a modern one. Indeed, my estimate is that present-day Georgians harbor around 20% of the so called Ancient North Eurasian (ANE) component, which probably arrived in the Caucasus from the Eurasian steppe (see here). If so, then the qpAdm run might be overestimating the non-steppe admixture in the Yamnaya genomes by at least 10%. Nevertheless, I'm quite happy with this result as I await ancient DNA from the Caucasus and Near East.
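One back-of-the-envelope way to see where a figure of roughly 10% could come from is to multiply the two estimates quoted above. This is only a sketch: it assumes that all of the ANE in present-day Georgians is steppe-derived and that it carries over proportionally into the Georgian share of the model, neither of which is established here.

```python
georgian_share = 0.486      # Georgian coefficient in the qpAdm model quoted above
ane_in_georgians = 0.20     # rough ANE estimate for present-day Georgians quoted above

# Part of the "Georgian" share that would itself be steppe-like under these assumptions.
steppe_like_share = georgian_share * ane_in_georgians
print(f"{steppe_like_share:.3f}")
```

Under those assumptions, a little under a tenth of the total ancestry would be counted as non-steppe when it's really steppe-like.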
By the way, I also pretty much nailed the Corded Ware sample: 0.73 Yamnaya + 0.27 Esperstedt_MN (std. errors 0.060, chisq 2.621). Admittedly, an identical result for the same genomes was reported months ago at the ASHG 2014 conference (see here), but that's OK, because it means I'm on the right track.
qpAdm is easy to run, but the quality of its output is heavily reliant on the outgroup or "right set" of populations picked by the user. As far as I can see, the following ten populations (a subset of the "magic set" of 15 from Haak et al.) produce the most robust outcomes when analyses are limited to West Eurasian groups.
Biaka
Bougainville
Chukchi
Eskimo
Han
Ju_hoan_North
Karitiana
Mbuti
Ulchi
Yoruba
Why do they work so well? I really have no idea, but through simple trial and error I found that some of the others from the "magic set", in particular the Ami, produced much poorer results.
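For anyone wanting to set up a run like the Yamnaya model above with these ten outgroups, here's a minimal sketch that generates the three text files qpAdm reads: a parameter file, a "left" list (target first, then the sources) and a "right" list (the outgroups). The file prefix haak2015, the output file names and the labels Yamnaya, Samara_HG and Georgian are assumptions for illustration, and the population labels have to match whatever appears in the .ind file being used.

```python
RIGHT_POPS = [
    "Biaka", "Bougainville", "Chukchi", "Eskimo", "Han",
    "Ju_hoan_North", "Karitiana", "Mbuti", "Ulchi", "Yoruba",
]

def qpadm_inputs(prefix, target, sources, right_pops):
    """Build the text of a qpAdm parameter file plus its left/right lists."""
    left = [target] + list(sources)          # target goes first in the left list
    par = "\n".join([
        f"genotypename: {prefix}.geno",      # hypothetical dataset prefix
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        "popleft: left.txt",
        "popright: right.txt",
        "details: YES",
    ]) + "\n"
    return {
        "parfile.txt": par,
        "left.txt": "\n".join(left) + "\n",
        "right.txt": "\n".join(right_pops) + "\n",
    }

files = qpadm_inputs("haak2015", "Yamnaya", ["Samara_HG", "Georgian"], RIGHT_POPS)
for name, text in files.items():
    print(f"--- {name} ---")
    print(text, end="")
```

Once the files are written to disk, the run itself would be something like qpAdm -p parfile.txt, with the admixture coefficients, standard errors and chisq reported in the output.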
I'll probably end up posting a whole catalog of qpAdm output in the comments section below over the next couple of weeks. I'm open to suggestions about the models to test and how to improve my runs.
Citation...
Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317
See also...
Yamnaya's exotic ancestry: The Kartvelian connection
Eight thousand years of natural selection in Europe
Update 11/10/2015: Eight thousand years of natural selection in Europe - take 2
...
A new preprint at bioRxiv reports on the first genome-wide scan for selection using ancient DNA, with a couple of unexpected outcomes:
The SNP (rs4988235) responsible for lactase persistence in Europe gives the strongest signal in our analysis. We estimated the selection coefficient on the derived allele to be 0.015 (95% confidence interval; CI=0.010-0.034) using a method that fits a hidden Markov model to the population allele frequencies as they change over time. Our data strengthens previous reports of the late appearance of lactase persistence in Europe, with the earliest appearance of the allele in a central European Bell Beaker sample (individual I0112) who lived approximately 4,300 years ago. We detect no evidence of lactase persistence in Early Neolithic farming populations like the Linearbandkeramik (LBK), or in the steppe pastoralist Yamnaya, despite their use of domesticated cattle (Figure 2).
...
We find a surprise in seven Scandinavian hunter-gatherers from the Motala site in southern Sweden who lived around 7,700 years before present. While the western hunter-gatherers of central and southern Europe largely have the ancestral allele at the two major European skin pigmentation loci, the closely related Scandinavian hunter-gatherers have both the derived alleles contributing to light skin pigmentation at high frequency (Figure 2B). Thus, the derived allele of SLC24A5 was common in both the Scandinavian hunter-gatherers and Early European farmers, but not in the geographically intermediate western hunter-gatherers. Further, in four out of seven Motala samples, we observe the derived allele of rs3827760 in the EDAR gene, which has effects on tooth morphology and hair thickness. This allele has been the subject of a selective sweep in East Asia, and today it is at high frequency in East Asians and Native Americans.
...
The derived allele in the Motala samples lies on the same haplotype as in modern East Asians (Extended Data Figure 4) implying a shared origin. The statistic f4(Yoruba, Scandinavian hunter-gatherers, Han, Onge Andaman Islanders) is significantly negative (Z=-3.9) implying gene flow between the ancestors of Scandinavian hunter-gatherers and Han so this shared haplotype is likely the result of ancient gene flow between groups ancestral to these two populations.
The high frequency of the East Asian-specific EDAR allele among the Motala foragers is even more surprising for me than their inferred light skin. But it does at least gel with the earlier finding that Scandinavian hunter-gatherers did not contribute significant ancestry to modern Europeans (see here).
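To make the sign of the f4 statistic quoted above a bit more concrete, here's a minimal sketch of how f4(A, B; C, D) is computed from allele frequencies: the average over SNPs of (a - b) * (c - d). The frequencies below are invented toy values, not data from the preprint, and the Z-score (which in practice comes from a block jackknife across the genome) is left out.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """Mean over SNPs of (a - b) * (c - d), using frequencies at the same SNPs."""
    terms = [
        (a - b) * (c - d)
        for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)
    ]
    return sum(terms) / len(terms)

# Toy allele frequencies at three SNPs (made up for illustration).
pop_a = [0.50, 0.50, 0.50]   # plays the role of the outgroup
pop_b = [0.70, 0.30, 0.50]   # drifts in the same direction as pop_c
pop_c = [0.70, 0.30, 0.50]
pop_d = [0.50, 0.50, 0.50]

value = f4(pop_a, pop_b, pop_c, pop_d)
print(value < 0)
```

In this toy case the second and third populations deviate from the other two in the same direction at each SNP, so the products are negative. That's the same pattern as a negative f4(Yoruba, Scandinavian hunter-gatherers, Han, Onge), which points to shared drift between the hunter-gatherers and Han.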
Citation...
Mathieson et al., Eight thousand years of natural selection in Europe, bioRxiv preprint first posted online March 14, 2015; doi: https://dx.doi.org/10.1101/016477
Friday, March 13, 2015
Yamnaya-related ancestry proportions in Europe and west Asia
Here's a quick and dirty attempt to flush out a Yamnaya-specific ancestral component with the ADMIXTURE software and a few Yamnaya genomes from the recent Haak et al. paper: K6 spreadsheet.
Obviously, we'll need many more ancient samples from the vast Yamnaya horizon to be able to estimate direct Yamnaya ancestry in modern populations with any great confidence. But I'd say this looks like a very reasonable attempt, with more or less comparable results to those published by Haak et al. (for instance, see Figure 3 from the study here).
Please note that this wasn't a supervised run. In other words, I didn't mark the Yamnaya genomes as reference samples with the aim of creating a cluster from them.
However, I initially excluded all individuals from northeastern Europe, the north Caucasus and South Asia from the analysis. The reason I did this was because samples from these regions have a peculiar habit of creating very robust clusters in ADMIXTURE, which is useful when looking at recent variation and wanting low cross validation errors, but not so great when trying to resurrect genetic components from the depths of prehistory.
Once I had a dataset that was forcing the algorithm to focus its attention on the ancient genomes and producing consistent results, I tested the problem samples in batches of 5-10, thus making sure they didn't skew the analysis.
Interestingly, the Yamnaya-specific component peaks in Udmurts, who live close to where the Yamnaya samples were collected. This can hardly be a coincidence.
In any case, I'm hoping to look at this issue in more detail soon with the help of qpAdm, a new program released recently with the updated ADMIXTOOLS package (see here). Based on f4 statistics, qpAdm is specifically designed for analyzing ancient admixture events.
Citation...
Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317
Wednesday, March 11, 2015
New release of ADMIXTOOLS with two additional programs
ADMIXTOOLS 3.0 is now available on GitHub via the Reich Lab site. The updated package includes minor bugfixes and improvements and two new programs: qpWave and qpAdm for studying migration and admixture. Scroll down the page at the link below.
Reich Lab software
Documentation is minimal, but I'm told that users of the old ADMIXTOOLS should be able to get things running. I haven't had a chance to check it out yet, but I'm looking forward to trying qpWave and qpAdm, hopefully this weekend.
Update 28/06/2020: Major updates to ADMIXTOOLS
Thursday, February 19, 2015
The Near East ain't what it used to be
Up for public comment at bioRxiv this week is this paper on the population history of the Near East, with a special focus on Armenians. Here's the abstract:
The Armenians are a culturally isolated population who historically inhabited a region in the Near East bounded by the Mediterranean and Black seas and the Caucasus, but remain underrepresented in genetic studies and have a complex history including a major geographic displacement during World War One. Here, we analyse genome-wide variation in 173 Armenians and compare them to 78 other worldwide populations. We find that Armenians form a distinctive cluster linking the Near East, Europe, and the Caucasus. We show that Armenian diversity can be explained by several mixtures of Eurasian populations that occurred between ~3,000 and ~2,000 BCE, a period characterized by major population migrations after the domestication of the horse, appearance of chariots, and the rise of advanced civilizations in the Near East. However, genetic signals of population mixture cease after ~1,200 BCE when Bronze Age civilizations in the Eastern Mediterranean world suddenly and violently collapsed. Armenians have since remained isolated and genetic structure within the population developed ~500 years ago when Armenia was divided between the Ottomans and the Safavid Empire in Iran. Finally, we show that Armenians have higher genetic affinity to Neolithic Europeans than other present-day Near Easterners, and that 29% of the Armenian ancestry may originate from an ancestral population best represented by Neolithic Europeans.
Unfortunately, the authors failed to even mention the main cause of what they're seeing: the massive influx of Ancient North Eurasian (ANE) admixture into the Near East. They included the ancient genomes of Oetzi the Iceman and La Brana-1 in their analysis, but not MA-1, or Mal'ta boy, the main ANE proxy.
MA-1 is a low coverage genome, and not easy to work with, but until better ANE reference genomes are sequenced, it simply can't be ignored in studies on the population history of West Eurasia. Here's why:
Above is my Fateful Triangle PCA. Note the eastern shift of the Islamic Near Eastern groups relative to their non-Islamic neighbors. Here are the relevant ANE ancestry proportions:
Anatolian Turks ~16.54%
Armenians ~15.48%
Iranians ~19.61%
Iranian Jews ~14.01%
Lebanese Muslims ~9.82%
Lebanese Christians ~7.14%
The differences aren't very dramatic, but they're consistent and, as per the PCA, hard to overlook. Indeed, the contrast would be even more obvious if we were to add to the list other exotic admixtures, such as East Asian, South Asian and/or Sub-Saharan.
If you're wondering why it is that Muslims generally carry more ANE than their non-Muslim neighbors, it's probably because the Islamic expansion had a homogenizing effect on the Near East, and it didn't have as much of an impact on the religious minorities in the region.
How and when ANE arrived in the Near East is still a mystery which can only be solved with ancient DNA. However, my bet is that most of it came after the Neolithic from the Eurasian steppe, the northeast Caucasus and the Altai, with the Indo-Europeans, Kura-Araxes people and Turks, respectively.
Citation...
Marc Haber et al., Genetic evidence for an origin of the Armenians from Bronze Age mixing of multiple populations, bioRxiv, Posted February 18, 2015. doi: https://dx.doi.org/10.1101/015396
See also...
First look at an ancient genome from Neolithic Anatolia
Tuesday, February 17, 2015
Latest linguistics research backs the Indo-European steppe hypothesis
Most of the action on this blog in recent weeks has revolved around the Indo-European urheimat question. So it's probably not a coincidence that I just got this press release in the mail:
Linguists have long agreed that languages from English to Greek to Hindi, known as 'Indo-European languages', are part of a language family which first emerged from a common ancestor spoken thousands of years ago. Now, a new study gives us more information on when and where it was most likely used. Using data from over 150 languages, linguists at the University of California, Berkeley provide evidence that this ancestor language originated 5,500 - 6,500 years ago, on the Pontic-Caspian steppe stretching from Moldova to Russia and western Kazakhstan.
"Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis", by Will Chang, Chundra Cathcart, David Hall and Andrew Garrett, will appear in the March issue of the academic journal Language. A pre-print version of the article is available on the LSA website [see HERE].
This article provides new support for the "steppe hypothesis" or "Kurgan hypothesis", which proposes that Indo-European languages first spread with cultural developments in animal husbandry around 4500 - 3500 BCE. (An alternate theory proposes that they spread much earlier, around 7500 - 6000 BCE, in Anatolia in modern-day Turkey.)
Chang et al. examined over 200 sets of words from living and historical Indo-European languages; after determining how quickly these words changed over time through statistical modeling, they concluded that the rate of change indicated that the languages which first used these words began to diverge approximately 6,500 years ago, in accordance with the steppe hypothesis.
This is one of the first quantitatively-based academic papers in support of the steppe hypothesis, and the first to use a model with "ancestry constraints" which more directly incorporate previously discovered relationships between languages. Discussion of prior studies in favor of and against the steppe hypothesis can be found in the paper.
I'm reading the paper now, and it'll probably take me a while to get my head around it. Admittedly, linguistics is not my strong point, but I might post some observations in the comments if I feel up to it.
In any case, here's one of the phylogenetic trees from the paper. It'd be interesting to see how it lines up with thousands of complete Y-chromosome sequences from these language groups, particularly from Y-haplogroup R1; I have a feeling we'd see some very nice correlations.
Citation...
Chang et al., Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Manuscript to be published in Language, (Vol. 91, No. 1) March 2015.
See also...
Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint)
Eastern Europe as a bifurcation hotspot for Y-hg R1