
Sunday, August 23, 2020

Fascinating stuff


Coming soon I guess:

But we have results from the Ezero culture, from Southeastern Bulgaria, which is from the early Bronze Age and which seems to connect the people of this culture with the future Hittites and Trojans. This has been confirmed by archeology many times and has been known for at least half a century. But now we see the genetic parallels between the two. Some of these ancient groups from the Bronze Age in one way or another have survived to this day in our country Bulgarians, as we also carry a certain amount of blood and genes from these same people, perhaps in the range of between 5 and 10%, which connects us with the Hittites, ancient Anatolia and the Trojans. There is a huge processing of the results before they are published, but among them there are huge curiosities from now on. One of them is from the necropolis in Merichleri from the Early Bronze Age and in another necropolis in Tsaribrod (the older of the two), these are mound necropolises from the Yamna culture in the Caucasus, of people who migrated here in Bulgaria and connected between you are. They came from the haplogroup R1a, namely Z93, which is the haplogroup again of the Scythian, but more of the Indo-Aryan tribes, the future Indo-Aryans, who later conquered India. But one of the tribes of the Yamna culture seems to have strayed and arrived in the Balkans instead of going to India. And so by chance, because archaeologists and geneticists have chosen between 260 burial mounds from this period, they have chosen only 3-4 and have come across exactly this extremely ancient group, which is from the time before the Indo-European group was divided into Iranians, Indians and Slavs, they were still one people at the time with the same genomes. And yes, one of these groups is among what we call Thracian tribes, but these are not Thracians. 
We have results from both the Early Iron Age and the Late Bronze Age, which are possibly Thracian, but I will keep them a secret at this stage, as I do not want to provoke speculation.

See also...

The precursor of the Trojans

Steppe invaders in the Bronze Age Balkans

Wednesday, August 19, 2020

Yamnaya-related ancestry proportions in present-day Poles


Modeling ancient ancestry proportions in present-day Europeans with the qpAdm software is now a lot more difficult. The reasons for this are updates to qpAdm as well as the availability of more useful outgroups or right pops.

This isn't necessarily a bad thing, because users are forced to work harder to find successful models, which is likely to lead to some interesting discoveries. But it can be very frustrating.

I don't think that settling for poor statistical fits or using a small number of outgroups are acceptable shortcuts. Perhaps sequencing modern-day samples in exactly the same way as the ancient samples, and thus increasing the compatibility between them, might help?

Limiting qpAdm runs to higher quality SNPs from transversion sites does help, but perhaps largely because of the significant reduction in markers?
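For anyone who wants to try this, here is a minimal Python sketch of a transversion filter. Transitions (A<->G and C<->T) are the substitutions that ancient DNA damage tends to mimic, which is why dropping them is a common precaution. The sketch assumes an EIGENSTRAT-style .snp file with six whitespace-separated columns ending in the two alleles; the function names and the demo lines are hypothetical and aren't taken from any ADMIXTOOLS release.

```python
# Purines and pyrimidines; a transversion swaps one class for the other.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transversion(allele1, allele2):
    """Return True if the two alleles form a purine <-> pyrimidine change."""
    a, b = allele1.upper(), allele2.upper()
    if a not in BASES or b not in BASES:
        return False  # indels, missing alleles and other codes are skipped
    return (a in PURINES) != (b in PURINES)


def transversion_snp_ids(snp_lines):
    """Yield SNP IDs for transversion sites.

    Assumed column order (EIGENSTRAT-style .snp file):
    id, chromosome, genetic position, physical position, allele1, allele2.
    """
    for line in snp_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed lines
        if is_transversion(fields[4], fields[5]):
            yield fields[0]


# Made-up demo lines, only to show the expected input shape.
demo = [
    "rs1 1 0.0 1000 A G",  # transition
    "rs2 1 0.0 2000 A C",  # transversion
    "rs3 1 0.0 3000 C T",  # transition
    "rs4 1 0.0 4000 G T",  # transversion
]
kept = list(transversion_snp_ids(demo))
```

Keep in mind that only a minority of SNPs on a typical panel are transversions, so a filter like this removes most of the markers, which is the caveat raised above.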

In any case, I've now given up on running such analyses, at least until I see some serious pointers on the topic from Harvard's qpAdm experts. But before I put this project to bed for the time being, I'd like to share some new results for Poles from eastern and western Poland, respectively.

right pops:

CMR_Shum_Laka_8000BP
MAR_Taforalt
IRN_Ganj_Dareh_N
Levant_PPNB
GEO_CHG
TUR_Barcin_N
RUS_Piedmont_En
SRB_Iron_Gates_HG
WHG
RUS_Karelia_HG
MNG_North_N
RUS_Ust_Kyakhta

left pops:

Polish_East
CWC_Baltic_early 0.572±0.024
SWE_TRB 0.428±0.024
chisq 11.776
tail prob 0.300296
Full output

Polish_West
CWC_Baltic_early 0.587±0.021
SWE_TRB 0.413±0.021
chisq 11.165
tail prob 0.34478
Full output


Even using transversion sites, this is one of the very few combinations of ancient reference samples that works for the Poles with these right pops. That is, the combination of early Corded Ware samples from the East Baltic (CWC_Baltic_early) and Funnel Beaker samples from Scandinavia (SWE_TRB). The former are obviously the proxy here for Yamnaya-related ancestry.

Adding any sort of hunter-gatherer population to this model doesn't help or even makes things worse (for instance, see here and here). It is possible to add Baltic hunter-gatherers to a similar model after dropping CWC_Baltic_early in favor of closely related samples from the Early to Middle Bronze Age Pontic-Caspian steppe. Note, however, that the statistical fits are somewhat poorer.

Polish_East
Baltic_LTU_Narva 0.032±0.014
PC_steppe_EMBA 0.483±0.019
SWE_TRB 0.485±0.019
chisq 17.143
tail prob 0.0465198
Full output

Polish_West
Baltic_LTU_Narva 0.031±0.011
PC_steppe_EMBA 0.491±0.015
SWE_TRB 0.477±0.016
chisq 22.444
tail prob 0.00757421
Full output
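The tail probs above are simply the upper-tail probabilities of the chisq values under a chi-square distribution. Below is a minimal, standard-library Python sketch of that calculation. The degrees of freedom are an assumption on my part: I'm taking them to be the number of right pops minus the number of source populations (12 - 2 = 10 for the two-way models, 12 - 3 = 9 for the three-way models), which reproduces the figures listed above closely. The Full output files are the authoritative source for the actual dof values.

```python
import math


def chi2_tail_prob(chisq, dof):
    """Upper-tail probability of a chi-square statistic with integer dof."""
    if dof < 1 or chisq < 0:
        raise ValueError("need dof >= 1 and chisq >= 0")
    half = chisq / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2))
    #          + sqrt(2/pi) * exp(-x/2) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(chisq)
    term, total = chi, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= chisq / (2 * r - 1)
        total += term
    return (math.erfc(chi / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


def assumed_dof(n_right, n_sources):
    """Assumption of this sketch: dof = right pops minus source populations."""
    return n_right - n_sources


# chisq values copied from the results above.
p_east_2way = chi2_tail_prob(11.776, assumed_dof(12, 2))
p_west_2way = chi2_tail_prob(11.165, assumed_dof(12, 2))
p_east_3way = chi2_tail_prob(17.143, assumed_dof(12, 3))
p_west_3way = chi2_tail_prob(22.444, assumed_dof(12, 3))
```

Note how the three-way models lose a degree of freedom and have larger chisq values, so their tail probs fall much lower than those of the two-way models.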


Interestingly, but not surprisingly, the ancestry of many present-day Northwestern European populations can be modeled in basically the same way. That's because ancient ancestry proportions are more closely correlated with latitude than longitude across much of the European continent.

English_Kent
CWC_Baltic_early 0.527±0.024
SWE_TRB 0.473±0.024
chisq 13.042
tail prob 0.221357
Full output

Icelandic
CWC_Baltic_early 0.586±0.023
SWE_TRB 0.414±0.023
chisq 16.517
tail prob 0.085751
Full output

Scottish
CWC_Baltic_early 0.583±0.021
SWE_TRB 0.417±0.021
chisq 12.144
tail prob 0.275536
Full output


A zip file with the qpAdm output from this analysis and a list of the most relevant ancients is available here. I might try to run a few more populations over the next few days, but probably only from the northern half of Europe, so please check the zip file in a week or so to see what else is in there.

If anyone wants to challenge my results, note that these and very similar samples are freely available to the public via Harvard University here and here.

Update 22/08/2020: From Nick Patterson (Broad) in the comments:
My general advice for qpAdm is 1) Work on the right hand set. Don't include irrelevant population (except for one population as an outgroup); picking the best RHS can dramatically reduce s. errors on the admixture weights. 2) If qpAdm gives a very low p-value try and understand why, sometimes it is telling you that the target is not a mixture of the sources but sometimes the assumptions are violated, for example recent gene-flow from left pops -> right.

See also...

Ancient ancestry proportions in present-day Europeans

Tuesday, August 18, 2020

Housekeeping stuff


I'm about to phase out the use of the Global25 datasheets with modern-day samples. In large part, this move is due to the uncertainty about the populations that these individuals represent and the resulting (often idiotic) discussions here and elsewhere about their usefulness.

This uncertainty exists because many, perhaps most, of these people are classified based on their self identity, which may or may not reflect their genetic origins.

Thus, I'll no longer be updating these datasheets and, from next week, I'll also stop linking to them at this blog (like here). The links will remain live for the next few months, so that users can adjust to the change.

However, modern-day samples sequenced from archeological remains, and thus, as a rule, painstakingly classified by experts based on their burial contexts and genetic characteristics, will continue to be featured in the Global25 datasheets.

In other words, as far as the Global25 is concerned, all of the modern-day samples from the living are out, but all of the modern-day samples from the dead will remain, and indeed I'll be adding more of the latter as they become available.

I'm planning to eventually create several sets of Global25 datasheets based on individuals and populations from different periods, including the modern era. But I'll probably need some help with that.

Also, please note that comment moderation will now be the rule here rather than the exception. And I'll be cracking down hard on trolling, insults and any sort of potentially defamatory material, so no more crazy stuff, or else.

See also...

New rules for comments

Friday, August 14, 2020

Awesome new toys from Vahaduo


Vahaduo now offers a 3D PCA experience. Check it out HERE and HERE. Below are a couple of screen caps of me messing around with the new tools.



Vahaduo says:

Hi everyone!

New tool - PCA 3D Viewer.

Global 25 version:

https://vahaduo.github.io/3d/g25

West Eurasia version:

https://vahaduo.github.io/3d/we


Usage:

Dots - ancients, circles - moderns.

Click X, Y, Z or COLOR tab and then click one of the PCx buttons to switch dimensions.

Click already active X, Y, Z or COLOR tab to temporarily reverse selected dimension. It will be restored to a default state when any of the dimensions will be switched to another one.

ADD CUSTOM POINTS - self-explanatory. Points will be added as "+". IMPORTANT - G25 version takes NON-SCALED coordinates. This will be true for any new tool dedicated to G25 and coordinates will be scaled automatically when needed or desired.

"Type parts of names." + TAG button - type parts of names to tag certain samples (try for example "KK1 Afon Pinar"). Search is Case Sensitive. Points will be redrawn as "x".

Next row - Labels and Annotations. Click the right button to cycle through:

CLEAR LABELS + LABELS:AUTO
CLEAR ANNOTATIONS + ANNOTATIONS:CLICK
CLEAR ANNOTATIONS + ANNOTATIONS:AUTO
CLEAR ALL + LABELS AND ANNOTATION OFF
CLEAR LABELS + LABELS:CLICK

CLICK - click to add/delete labels/annotations.
AUTO - same as CLICK plus labels/annotations will be automatically added to newly plotted or tagged samples. Unfortunately adding new labels and annotations becomes very slow when there are too many of them, so there is a limit for the AUTO setting - 250 labels or 20 annotations at once.

Annotations are editable. They can be dragged to another place and text can be changed. Text can be also wrapped into an HTML SPAN element and some styles can be used, like "font-size" or "color". BR element (new line) works too.

HIGHLIGHT CLICK/OFF/HOVER - highlight all samples that belong to a single population. Set to HOVER to ignore clicks. Set to OFF to disable this feature and to remove highlight triggered by hover (highlights triggered by clicks will stay until they will be cleared or removed by a click). Click white dot to cycle through available highlight colors.

Plotly buttons:

Default download is set to 1600x1200px PNG.

Custom Plotly buttons:
"Toggle projection: orthographic / perspective" - self-explanatory.
"Toggle background color" - cycles through dark grey, black, white and light grey. Text color and white highlight will be switched to black when background will be set to white or light grey.
"Toggle color scheme" - cycles through several gradients.
"Reverse color scheme" - reverses all gradients permanently.
"Download plot as png (custom size)" - default size is the size of the currently displayed plot.

See also...

New Global25 interpretation tools

Tuesday, August 11, 2020

Villabruna people existed in Europe at least 17,000 years ago (Bortolini et al. 2020 preprint)


Over at bioRxiv at this LINK. So, like I said here a few years back, there was no migration into Europe from the Near East ~14,000 years ago. I don't think there was even such a migration ~17,000 years ago. My view is that the so-called Villabruna cluster formed somewhere in Europe at least 20,000 years ago. Below is the Bortolini et al. abstract, emphasis is mine:

The end of the Last Glacial Maximum (LGM) in Europe (~16.5 ka ago) set in motion major changes in human culture and population structure. In Southern Europe, Early Epigravettian material culture was replaced by Late Epigravettian art and technology about 18-17 ka ago at the beginning of southern Alpine deglaciation, although available genetic evidence from individuals who lived ~14 ka ago opened up questions on the impact of migrations on this cultural transition only after that date. Here we generate new genomic data from a human mandible uncovered at the Late Epigravettian site of Riparo Tagliente (Veneto, Italy), that we directly dated to 16,980-16,510 cal BP (2σ). This individual, affected by a low-prevalence dental pathology named focal osseous dysplasia, attests that the very emergence of Late Epigravettian material culture in Italy was already associated with migration and genetic replacement of the Gravettian-related ancestry. In doing so, we push back by at least 3,000 years the date of the diffusion in Southern Europe of a genetic component linked to Balkan/Anatolian refugia, previously believed to have spread during the later Bolling/Allerod warming event (~14 ka ago). Our results suggest that demic diffusion from a genetically diverse population may have substantially contributed to cultural changes in LGM and post-LGM Southern Europe, independently from abrupt shifts to warmer and more favourable conditions.

Bortolini et al., Early Alpine human occupation backdates westward human migration in Late Glacial Europe, bioRxiv, posted August 10, 2020, doi: https://doi.org/10.1101/2020.08.10.241430

See also...

Villabruna cluster =/= Near Eastern migrants

Monday, July 27, 2020

Ancient ancestry proportions in present-day Europeans (to be continued)


This year has already been massive in all sorts of ways, including for new data and software releases. So I'm thinking it might be time to update many of the analyses that were featured at this blog a while ago.

Let's start with the classic hunter vs farmer vs herder mixture model for present-day European populations. The rules of the game are as follows:


- run the latest version of qpAdm using qpfstats output

- use transversion sites and 1240K capture data

- pick a set of diverse and chronologically sound outgroups

- for a model to be successful the p-value must reach 0.01

- tweak the left pops in models that are clearly underperforming

- follow high end scientific literature, logic and common sense
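The pass/fail rule in the list above is easy to automate. Here's a minimal Python sketch, using tail probs and weights reported in posts on this page. The extra check that every admixture weight lies between 0 and 1 is my own assumption about what counts as a feasible model; it isn't one of the listed rules, and the function name is hypothetical.

```python
P_THRESHOLD = 0.01  # cutoff taken from the rule list above


def model_passes(tail_prob, weights=None, threshold=P_THRESHOLD):
    """Return True if a qpAdm model meets the p-value rule.

    The optional weight check (every admixture weight within [0, 1]) is an
    extra feasibility assumption of this sketch, not one of the listed rules.
    """
    if tail_prob < threshold:
        return False
    if weights is not None and any(w < 0.0 or w > 1.0 for w in weights):
        return False
    return True


# Tail probs and weights as reported in posts on this page.
reported = {
    "Polish_East, 2-way": (0.300296, [0.572, 0.428]),
    "Polish_West, 3-way": (0.00757421, [0.031, 0.491, 0.477]),
    "Saami, 4-way": (0.0108571, [0.134, 0.270, 0.081, 0.515]),
}
verdicts = {name: model_passes(p, w) for name, (p, w) in reported.items()}
```

A pass under this rule only means that the model wasn't rejected. It doesn't prove that the chosen sources are the true ones.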


Obviously, the reason that I decided to limit my analysis to markers from transversion sites is to mitigate problems associated with modeling the ancestry of modern, high quality samples with relatively low quality ancients. One of these problems appears to be qpAdm assigning faux East Asian/Siberian admixture to present-day Europeans (for instance, see figure 4 here).

My starting reference populations and outgroups are listed below. In qpAdm terminology the former are known as the "left pops", while the latter as the "right pops". Most of these samples are freely available at the David Reich Lab website here.

left pops:
HUN_Koros_N_HG
TUR_Barcin_N
UKR_Yamnaya

right pops:
CMR_Shum_Laka_8000BP
MAR_Taforalt
Levant_Natufian
IRN_Ganj_Dareh_N
Levant_PPNB
CZE_Vestonice16
BEL_GoyetQ116-1
Iberia_ElMiron
RUS_Karelia_HG
RUS_West_Siberia_HG
MNG_North_N
RUS_Ust_Kyakhta

As you can see, I picked a wide variety of right pops. But I chose most of them specifically to be able to differentiate the three streams of ancestry - from ancient hunters, farmers and herders - that are the focus of my analysis. I also intentionally avoided using samples in the right pops that may have experienced gene flow, including cryptic gene flow, from the populations in the left pops.
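To show how these lists translate into a run, here's a minimal Python sketch that builds the text of the two population lists and a parameter file. The parameter keys (genotypename, snpname, indivname, popleft, popright, details) follow the ADMIXTOOLS parameter file format as I understand it, so check the documentation for your version. The file names, the "data" prefix and the function name are hypothetical. The sketch also refuses to proceed if a population appears on both sides.

```python
LEFT_SOURCES = ["HUN_Koros_N_HG", "TUR_Barcin_N", "UKR_Yamnaya"]

RIGHT_POPS = [
    "CMR_Shum_Laka_8000BP", "MAR_Taforalt", "Levant_Natufian",
    "IRN_Ganj_Dareh_N", "Levant_PPNB", "CZE_Vestonice16",
    "BEL_GoyetQ116-1", "Iberia_ElMiron", "RUS_Karelia_HG",
    "RUS_West_Siberia_HG", "MNG_North_N", "RUS_Ust_Kyakhta",
]


def build_qpadm_inputs(target, sources, right_pops, prefix="data"):
    """Return (left list, right list, parameter file) as text.

    qpAdm reads the first population in the left list as the target and
    the remaining ones as the candidate sources.
    """
    left = [target] + list(sources)
    overlap = set(left) & set(right_pops)
    if overlap:
        raise ValueError(f"populations on both sides: {sorted(overlap)}")
    left_txt = "\n".join(left) + "\n"
    right_txt = "\n".join(right_pops) + "\n"
    par_txt = "\n".join([
        f"genotypename: {prefix}.geno",  # hypothetical file names
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        "popleft: left.txt",
        "popright: right.txt",
        "details: YES",
    ]) + "\n"
    return left_txt, right_txt, par_txt


left_txt, right_txt, par_txt = build_qpadm_inputs(
    "Saami", LEFT_SOURCES, RIGHT_POPS
)
```

Writing the three strings to left.txt, right.txt and a parameter file, and then pointing qpAdm at the latter, would be the next step. Adding a fourth source such as RUS_Baikal_BA only requires extending the sources list.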

I somewhat speculatively earmarked HUN_Koros_N_HG, from the Early Neolithic Carpathian Basin, and UKR_Yamnaya, from the Early Bronze Age North Pontic steppe in what is now Ukraine, to represent the hunter-gatherer and pastoralist streams of ancestry, respectively.

That's because I expected HUN_Koros_N_HG to be the best proxy for the hunter-gatherer ancestry that was initially absorbed by the early farmers who fanned out from the Aegean region across much of the European continent, and of course it made sense to choose a steppe pastoralist population that was located close to Central Europe where such groups first made the biggest impact outside of the steppe.

Interestingly, HUN_Koros_N_HG and UKR_Yamnaya did prove to be among the most effective choices for the types of ancestries that they represented. For instance, UKR_Yamnaya generally produced much stronger statistical fits than a very similar set of Yamnaya samples from the Caspian steppe (more precisely, from the Samara region in Russia). However, this might well be an artifact, due to very specific characteristics of these few ancient individuals. Larger sample sets would be welcome, especially from Yamnaya sites in Ukraine.

Below, dear audience, is a spreadsheet featuring the preliminary results. Click on the image to view and/or download the spreadsheet. The general rule is that the higher the tail prob, or p-value, the more likely it is that the ancestry proportions are close to the truth (a tail prob of well below 0.05 is usually a strong indication that something isn't right). For a detailed look at each of the qpAdm runs, feel free to consult the zip file here.


Note, however, that many of the European groups in my burgeoning genotype dataset are yet to make an appearance in the spreadsheet. That's because their models with the standard left pops showed p-values well under 0.01, which essentially meant that they failed, and I'm still trying to make them work.

But round one has certainly revealed some fascinating stuff. For instance, except for Hungarians and Estonians, none of the Uralic-speaking groups can be modeled successfully in the standard three-way model.

However, I managed to significantly improve the statistical fits in their models by adding a Siberian population, RUS_Baikal_BA, to the left pops. This is unlikely to be a coincidence, because the Proto-Uralic homeland was almost certainly located in or very near Siberia. Iain Mathieson please take note.

Saami
HUN_Koros_N_HG 0.134±0.043
RUS_Baikal_BA 0.270±0.015
TUR_Barcin_N 0.081±0.026
UKR_Yamnaya 0.515±0.058
chisq 19.865
tail prob 0.0108571
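A quick sanity check on output like this is to confirm that the weights sum to one and to see how many standard errors each weight sits from zero. Below is a minimal Python sketch that parses the "population weight±error" lines exactly as they're printed above. The function name and the regular expression are my own, and the weight/error ratio is just a rough guide, not a formal test.

```python
import re

# Matches lines such as "UKR_Yamnaya 0.515±0.058".
LINE_RE = re.compile(r"^(\S+)\s+([0-9.]+)±([0-9.]+)$")


def parse_weights(text):
    """Parse 'POP weight±se' lines into {pop: (weight, se)}."""
    parsed = {}
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines like 'chisq ...' or 'tail prob ...' are skipped
            parsed[match.group(1)] = (
                float(match.group(2)),
                float(match.group(3)),
            )
    return parsed


saami = parse_weights("""
HUN_Koros_N_HG 0.134±0.043
RUS_Baikal_BA 0.270±0.015
TUR_Barcin_N 0.081±0.026
UKR_Yamnaya 0.515±0.058
chisq 19.865
tail prob 0.0108571
""")

total = sum(weight for weight, _ in saami.values())
# Ratio of each weight to its standard error (a rough z-score against zero).
z_scores = {pop: weight / se for pop, (weight, se) in saami.items()}
```

For the Saami model, the four weights sum to 1.000 and each of them is more than three standard errors from zero.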

Tuesday, July 21, 2020

The oldest R1a to date


My popular map of the oldest instances of Y-haplogroup R1a in the ancient DNA record has a new entry: PES001 from the recent Saag et al. preprint. PES001 comes from a burial site in what is now northwestern Russia and is dated to a whopping 10785–10626 calBCE.


Indeed, I'm not aware of any R1a samples older than PES001 among the treasure trove of thousands of ancient samples waiting to be published. So it's likely that this individual will remain the oldest member of our R1a clan for some years to come.

See also...

Y-haplogroup R1a and mental health

Like three peas in a pod

The mystery of the Sintashta people

Tuesday, July 14, 2020

First taste of Early Medieval DNA from the Ural region (Csaky et al. 2020 preprint)


Over at bioRxiv at this LINK. From the preprint:

The ancient Hungarians originated from the Ural region of Russia, and migrated through the Middle-Volga region and the Eastern European steppe into the Carpathian Basin during the 9th century AD. Their Homeland was probably in the southern Trans-Ural region, where the Kushnarenkovo culture disseminated. In the Cis-Ural region Lomovatovo and Nevolino cultures are archaeologically related to ancient Hungarians. In this study we describe maternal and paternal lineages of 36 individuals from these regions and nine Hungarian Conquest period individuals from today's Hungary, as well as shallow shotgun genome data from the Trans-Uralic Uyelgi cemetery. We point out the genetic continuity between the three chronological horizons of Uyelgi cemetery, which was a burial place of a rather endogamous population. Using phylogenetic and population genetic analyses we demonstrate the genetic connection between Trans-, Cis-Ural and the Carpathian Basin on various levels. The analyses of this new Uralic dataset fill a gap of population genetic research of Eurasia, and reshape the conclusions previously drawn from 10-11th century ancient mitogenomes and Y-chromosomes from Hungary.

...

Majority of Uyelgi males belonged to Y chromosome haplogroup N, and according to combined STR, SNP and Network analyses they belong to the same subclade within N-M46 (also known as N-tat and N1a1-M46 in ISOGG 14.255). N-M46 nowadays is a geographically widely distributed paternal lineage from East of Siberia to Scandinavia [33]. One of its subclades is N-Z1936 (also known as N3a4 and N1a1a1a1a2 in ISOGG 14.255), which is prominent among Uralic speaking populations, probably originated from the Ural region as well and mainly distributed from the West of Ural Mountains to Scandinavia (Finland). Seven samples of Uyelgi site most probably belong to N-Y24365 (also known as N-B545 and N1a1a1a1a2a1c2 in ISOGG 14.255) under N-Z1936, a specific subclade that can be found almost exclusively in todays’ Tatarstan, Bashkortostan and Hungary [17] (ISOGG, Yfull).




Csaky et al., Early Medieval Genetic Data from Ural Region Evaluated in the Light of Archaeological Evidence of Ancient Hungarians, bioRxiv, Posted July 13, 2020, doi: https://doi.org/10.1101/2020.07.13.200154

See also...

Hungarian Conquerors were rich in Y-haplogroup N

On the association between Uralic expansions and Y-haplogroup N

More on the association between Uralic expansions and Y-haplogroup N

Ancient DNA confirms the link between Y-haplogroup N and Uralic expansions

Monday, July 13, 2020

Don't believe everything you read in peer reviewed papers


Case in point, here's a quote from a recent paper at the Journal of Human Genetics (emphasis is mine):

The Mordovian and Csango samples have a moderate to slight orientation toward the Central-Asian and Siberian Turkic groups. This could suggest the more significant East Eurasian or Turkic ancestry of these populations, which should be further investigated. German samples are inhomogeneous, and some of the German samples also show this tendency, which can be the result of the recent 20th century Turkish immigration into Germany [42].

Nope, these German samples don't show anything even remotely resembling recent Turkish ancestry. The authors of the paper, Ádám, V., Bánfai, Z., Maász, A. et al., should've been able to figure this out, even with the standard analyses that they ran. Failing that, the peer reviewers at the Journal of Human Genetics should've noticed that the authors were confused.

Moreover, if the authors and peer reviewers had actually bothered to take a closer look at the metadata for these samples, which were sourced from the Estonian Biocentre, they'd have seen that they're not even from Germany. In fact, they represent self-reported ethnic Germans from Russia.

My own quick and dirty analysis of these individuals suggests that many of them harbor East Slavic and/or Volga Finnic ancestries. Indeed, only some of them can pass genetically for run of the mill Germans from Germany. The Principal Component Analysis (PCA) below is self-explanatory. It was plotted with the Vahaduo Custom PCA tools freely available here. The relevant PCA datasheet can be gotten here.


That's not to say, of course, that some Germans don't have recent Turkish ancestry, because an increasing number of Germans nowadays do, nor that people with German heritage in Russia shouldn't identify as Germans, because that's entirely their choice.

This blog post isn't about what it takes to be German, and this is not something that I ever want to discuss for obvious reasons. The point I'm making here is that the authors and peer reviewers of the said paper at the Journal of Human Genetics were sloppy and half-arsed in their approach. And, sadly, this isn't an isolated case in peer reviewed scientific literature dealing with human population genetics.

I feel that the Estonian Biocentre is also partly to blame for this cock up, due to its somewhat peculiar sampling and labelling strategies. For instance, its scientists rely solely on self-reported identity to establish the ethnic origins of their samples, and they apparently never remove genetic outliers from their datasets or even try to identify them.

Unfortunately, I fear that this relaxed approach will eventually lead to basic errors and even unusual conclusions in a number of so called peer reviewed papers.

I first raised this issue with the Estonian Biocentre about five years ago, when I noticed that some of the supposedly Polish individuals in its dataset were genetically more similar to various groups from northern Russia than to Poles from Poland. These individuals also showed significant Siberian ancestry, which was very unusual indeed. Where the hell did the Estonian Biocentre find Poles who resembled people from near the Arctic Circle, you might ask? Apparently in Estonia.

OK, I can imagine that sampling ethnic Poles from Estonia may have been easier for the Estonian Biocentre than sampling Poles from Poland. And Estonian Poles certainly make for interesting and useful data points. However, as you can see in the PCA below, some of these individuals (labeled Polish_Estonia by me) aren't representative of the native Polish population, and yet the Estonian Biocentre not only lumps them with their Poles from Poland, but even labels them with the word "Poland". The relevant PCA datasheet can be gotten here.


However, based on my communications with some of the scientists at the Estonian Biocentre, including head honcho Mait Metspalu, it seems that nothing will ever change there with regard to this issue. Who knows, perhaps some day we'll see a paper based on Estonian Biocentre data in the Journal of Human Genetics claiming that Poles originated near the Arctic Circle? I wouldn't be shocked if that actually happened.

Citation...

Ádám, V., Bánfai, Z., Maász, A. et al. Investigating the genetic characteristics of the Csangos, a traditionally Hungarian speaking ethnic group residing in Romania. J Hum Genet (2020). https://doi.org/10.1038/s10038-020-0799-6

See also...

Like three peas in a pod