Wednesday, October 14, 2020

A new model for the genomic formation of First American ancestors in Asia (Ning et al. 2020 preprint)


Over at bioRxiv at this LINK. The main topic of the preprint is largely outside the scope of this blog. However, the manuscript includes a detailed discussion about how to get the most out of the qpAdm mixture modeling program. I've used qpAdm regularly over the years, and I plan to use it more often in the future, so I'll be looking very carefully at the qpAdm methodology that Ning et al. are recommending. Here's the preprint abstract:

Upward Sun River 1, an individual from a unique burial of the Denali tradition in Alaska (11500 calBP), is considered a type representative of Ancient Beringians who split from other First Americans 22000-18000 calBP in Beringia. Using a new admixture graph model-comparison approach resistant to overfitting, we show that Ancient Beringians do not form the deepest American lineage, but instead harbor ancestry from a lineage more closely related to northern North Americans than to southern North Americans. Ancient Beringians also harbor substantial admixture from a lineage that did not contribute to other Native Americans: Amur River Basin populations represented by a newly reported site in northeastern China. Relying on these results, we propose a new model for the genomic formation of First American ancestors in Asia.

Ning et al., The genomic formation of First American ancestors in East and Northeast Asia, bioRxiv, posted October 12, 2020, doi: https://doi.org/10.1101/2020.10.12.336628

See also...

Ancient ancestry proportions in present-day Europeans

Major updates to ADMIXTOOLS

Yamnaya-related ancestry proportions in present-day Poles

48 comments:

Anthony Hanken said...

Does anyone know what the terminal SNP for M54A is? The study says it's using ISOGG 2016 nomenclature, in which case "N1b1" is (L731, L733).

In the supplementary this seems to be contradicted, with its calls being N1b1 F1052: G-> T (2), M2256: G-> A (2), L391: G-> C (2), M2117: G-> A (2), F1753: T-> G (1)
This looks like N-Tat in Yfull.

Sofia Aurora said...

Wonderful article!

There is also another BIG one about East Asia coming:

https://www.biorxiv.org/content/10.1101/2020.06.03.131995v1

Pavel Flegontov said...

Hello David! Thanks for sharing my work! I'd like to highlight some aspects of the study that are buried in the supplements, but are nevertheless important. I hope the community will find them useful.

First, when qpWave was introduced back in 2012 (Reich et al. 2012), it was a very simple test for the number of gene flows (in any direction) between a set of outgroups ("right" populations) and another set of populations ("ingroups" or "left" populations). When qpAdm was introduced in 2015, it was clearly stated in the supplements of Haak et al. 2015, Mathieson et al. 2015 and Lazaridis et al. 2016 that the method works under the following assumptions (see Fig. S24 in my paper): 1) there are no gene flows from the target group or source proxies into the outgroups; 2) there are no gene flows from the outgroups into the proxy sources (after their divergence with the true sources); 3) outgroups are not cladal with proxy sources.

While assumption 1 is usually taken care of by choosing outgroups much deeper in time than the source proxies, assumption 2 was ignored in all papers published after 2016. Both assumptions (1 and 2) are nearly impossible to control in the "proximal" setup used, e.g., by Narasimhan et al. 2019, when both ingroups and outgroups are close in time and space. Assumption 2 is actually impossible to control if we single out a certain subset of targets from a wider set of ingroups.

Imagine there was a gene flow from outgroups into B, but for some reason (e.g. a later 14C date) we picked A as a target. A simple qpWave test would reject the cladality of A and B, and that prompts us to test more complex models for A. There is a good chance we would find non-rejected qpAdm models for A with plausible admixture proportions, but those models are wrong! To take care of this serious problem, I suggest testing all plausible sources as targets too (see SI p. 5).
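The idea of testing every plausible source as a target can be sketched as a simple enumeration. This is only an illustration of the bookkeeping, not the procedure from the SI: the group labels and the `enumerate_models` helper are hypothetical, and nothing here runs qpWave or qpAdm.

```python
from itertools import combinations


def enumerate_models(candidates, max_sources=2):
    """Yield (target, sources) pairs so that every candidate group
    takes a turn as the target, with the remaining groups as sources."""
    for target in candidates:
        others = [group for group in candidates if group != target]
        for k in range(1, max_sources + 1):
            for sources in combinations(others, k):
                yield target, sources


# Hypothetical ingroup labels; "A" is the group we originally cared about.
groups = ["A", "B", "C"]
models = list(enumerate_models(groups))

for target, sources in models:
    print(f"target={target} sources={'+'.join(sources)}")
```

Each listed configuration would then be passed to the modeling program with the same outgroup set, so that a gene flow into B, say, has a chance to show up when B itself is the target.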

Second, we showed on simulated data that ranking alternative qpGraph models by the worst residual (Z-score) is less reliable than ranking by model likelihood. Comparing qpGraph models is tricky, and it is very hard to find one best model even for a moderate number of groups. We introduced some new methods to put model comparison on a firm statistical footing and to search the graph space automatically (see SI sections 12 and 13).

Third, we showed on simulated data that removal of rare variants from analysis skews many f4-statistics away from 0 and dramatically increases the false positive rate of admixture tests based on qpAdm or qpGraph. The second problem, as we showed, exacerbates the first one (see SI p. 24).

There is a general problem discussed in SI section 13 that f4-statistics calculated on whole-genome shotgun data and those calculated on the 1240K panel are often not well correlated (Figs. S52, S53). This issue was noticed by Bergstrom et al. 2020 (the HGDP shotgun paper), but was not discussed a lot. Statistics including three African groups or two African groups+an archaic group are severely affected (Fig. S53B), and the same effects are reproduced by rare variant depletion (Fig. S53E). Since f-statistics are affected, all downstream analyses like qpAdm and qpGraph are affected too. I believe that the issues we noticed on simulated and real data are the same.
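For readers less familiar with the statistic, f4(A, B; C, D) is the average over sites of (a - b)(c - d), where a, b, c, d are allele frequencies in the four groups. The sketch below computes it on all sites and again after dropping rare variants, so the two values can be compared. The frequencies are made up, the `common_sites` filter is a crude stand-in for panel ascertainment (an assumption, not the preprint's procedure), and standard errors, which are normally obtained with a block jackknife, are omitted.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """Average of (a - b) * (c - d) over sites; inputs are per-site allele frequencies."""
    products = [(a - b) * (c - d)
                for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)]
    return sum(products) / len(products)


def common_sites(freqs_by_group, min_maf):
    """Indices of sites whose pooled minor allele frequency is at least min_maf."""
    n_sites = len(freqs_by_group[0])
    keep = []
    for i in range(n_sites):
        pooled = sum(group[i] for group in freqs_by_group) / len(freqs_by_group)
        if min(pooled, 1 - pooled) >= min_maf:
            keep.append(i)
    return keep


def subset(freqs, indices):
    """Keep only the frequencies at the given site indices."""
    return [freqs[i] for i in indices]


# Toy, made-up frequencies for four groups at three sites.
a = [0.5, 0.0, 0.2]
b = [0.5, 0.0, 0.1]
c = [0.3, 0.02, 0.4]
d = [0.3, 0.0, 0.2]

f4_all = f4(a, b, c, d)
kept = common_sites([a, b, c, d], min_maf=0.05)
f4_common = f4(*(subset(g, kept) for g in (a, b, c, d)))
```

On real data the same comparison would be made between statistics computed on shotgun sites and on a panel such as 1240K, rather than with an explicit frequency cutoff.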

Fourth, biased gene conversion (Pouyet et al. 2018, Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences, eLife) is potentially another confounding factor that could lead to false positive signals of gene flow. To take care of biased gene conversion, only G<->C and A<->T sites should be considered. This issue is discussed in SI section 13.
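A minimal sketch of such a site filter, assuming that "GC and AT sites" means SNPs whose two alleles are G and C, or A and T. The `(snp_id, ref, alt)` tuple layout and the SNP names are hypothetical; a real pipeline would read the alleles from its own SNP file.

```python
# Allele pairs assumed to be unaffected by GC-biased gene conversion.
NEUTRAL_PAIRS = {frozenset("GC"), frozenset("AT")}


def is_neutral_pair(ref, alt):
    """True if the SNP's two alleles are G/C or A/T, in either order."""
    return frozenset((ref.upper(), alt.upper())) in NEUTRAL_PAIRS


def filter_snps(snps):
    """Keep only (snp_id, ref, alt) records with a G/C or A/T allele pair."""
    return [snp for snp in snps if is_neutral_pair(snp[1], snp[2])]


# Hypothetical records for illustration.
snps = [
    ("snp1", "G", "C"),
    ("snp2", "A", "G"),
    ("snp3", "T", "A"),
    ("snp4", "C", "T"),
]
kept_snps = filter_snps(snps)
```

Note that these are exactly the strand-ambiguous pairs, which is why panels designed for genotyping arrays tend to exclude them.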

Pavel Flegontov said...

In my view, there are multiple problems with the standard archaeogenetic toolkit composed of PCA, ADMIXTURE, f-statistics, qpAdm, and qpGraph. The basic methods and especially popular protocols like qpAdm with "rotating" outgroups and "proximal" qpAdm modelling were not tested well on simulated data. The most popular set of sites (1240K) is likely not optimal for inferring demographic history due to its complex ascertainment scheme (many HumanOrigins panels + an Illumina panel), the paucity of rare variants and its susceptibility to gene conversion (in fact, the HumanOrigins panel lacks G<->C and A<->T sites altogether). And we've not explored the effects of recurrent mutations (Amos 2020 R. Soc. Open Sci.) and biased gene conversion (Pouyet et al. 2018 eLife) on f-statistics and the derived methods.

My suggestion at the moment is to check most important results on both 1240K and shotgun data, and to be mindful of qpAdm assumptions.

Michalis Moriopoulos said...

@Dr. Flegontov

Thank you for your illuminating posts, sir. I've really enjoyed reading your work and hope to see more in the near future. What do you think of this paper's revision of Kolyma1 and its relation to Saqqaq/Paleo-Eskimos/Neo-Eskimos/etc.? Did it surprise you? That's as big a reversal as the USR1 finding, in my opinion.

Also, would you mind clearing something up for me? Myself and some others at Anthrogenica are fairly sure that there was a mix-up between two samples in the Paleo-Eskimo preprint. I believe this was corrected in the final paper but it appeared in preprint form again later down the road in another [possibly Reichlab] paper, so I'd like clarification:

It seems like there was a mix-up of the samples I7760 and I7781, right? The latter was labelled Ust Belaya Neolithic despite having ancestry (e.g., steppe) that a sample of that age could not possibly have. The sample I7760 on the other hand was originally labelled medieval despite looking like a Neolithic sample. It seems pretty obvious they were mixed-up somewhere along the way. Just making sure we've got it straight.

I do remember there was a geographic error at one point in the original preprint concerning Ust Belaya (the Baikal HGs). I suppose it doesn't help that there are at least two sites called Ust Belaya in Asia: one along Angara in Baikalia and one in Chukotka-Kamchatka. It must be difficult to keep all of these samples straight sometimes!

Davidski said...

I haven't had a chance to properly look at the Ning et al. supp info yet. When I do, I'll try to apply the authors' recommendations to some ancient European models. That should be interesting.

But that might take a while, like a week or so.

Pavel Flegontov said...

@Davidski It would be very interesting to take a look at your results on ancient Europeans. I'm now doing similar work on published shotgun data.

pnuadha said...

@Davidski

You usually don't post DNA studies on Asia. Did you just find this article interesting, or does it help to uncover the relation of ANE to Ancient Europeans?

Side note: I did not know that ANE went all the way up to the Arctic Ocean in northern Siberia, as evidenced by Yana. It seems that ANE dominated ancient Siberia.

Davidski said...

@pnuadha

I shared it because of the discussion about qpAdm in the supp info, but yeah, the ANE thing is interesting, and I'll be getting into that in more detail soon.

old europe said...



@pnuadha

I think IIRC that it's Yana being at least partially ancestral to ANE which is a younger genetic cluster than Ancient North Siberians

Rob said...

It might benefit the article to elaborate on its broader implications so it’s more generalisable for prehistorians

Tigran said...

Is there anything that indicates everything in between Eastern Europe and Yana RHS was ANE and not something ENA related?

Guy said...

Hum... You know, this paper can almost be read as a retraction. Cheers,
Guy

gamerz_J said...

@Sofia Aurora

Do you know when that paper is supposed to be published?

gamerz_J said...

Interesting paper, a minor point that stood out to me is the presence of most likely ANE admixture in NE China. Apparently it's there in some samples but not in others.

Also, would it be correct to assume that in this paper ANE seem less ENA-shifted than the previous one about Salkhit and Denisovan ancestry in East Asians?

Tigran said...

How has the ENA shift changed? Also does anybody believe that paper that argued the French were 20-25% East Eurasian?

A said...

@ Davidski

Back in 2018 you wrote:

“One of the most remarkable discoveries in the recent Narasimhan et al. 2018 preprint has to be the presence of what are essentially Eastern European migrant populations within the Inner Asian Mountain Corridor (IAMC) during the Middle to Late Bronze Age (MLBA). … Narasimhan et al. labeled these groups as belonging to the "forest/steppe MLBA" complex (for instance, see the main figure from the preprint here). This is indeed what they are in terms of their genetic structure, but certainly not geography, because the IAMC is well south of the steppe. … Strikingly, most of these people cluster with Bronze Age Eastern Europeans, and even some Bronze Age Central Europeans. … Two of the MLBA IAMC individuals are from Kashkarchi in the Ferghana Valley, in what is now Uzbekistan, and basically on the doorstep of the Indian subcontinent. (…) the MLBA IAMC groups are rich in Y-haplogroup R1a-M417, and in particular its R1a-Z93 subclade, which is today an especially frequent marker in Indo-European-speaking South Asians. (…) Clearly, many populations in South Asia, particularly those speaking Indo-European languages, derive the bulk of their steppe-related ancestry from the peoples of the MLBA IAMC.”

https://eurogenes.blogspot.com/2018/04/on-doorstep-of-india.html

Is this still correct? Narasimhan et al. 2019 gives the impression that Indo-Europeans were already significantly admixed by the time they reached 'the doorstep of India'.

Davidski said...

@A

What I said is still correct.

I don't know why Narasimhan et al. seemingly included the IAMC, and especially the Fergana Valley, as part of the steppe.

This is South Central Asia, and it's well south of the Eurasian steppe. Take a look at any decent map.

You would have to get in touch with the lead authors of Narasimhan et al. and ask them what their thinking was behind their decision.

Davidski said...

@Tigran

Also does anybody believe that paper that argued the French were 20-25% East Eurasian?

I don't think many people ever did.

These sorts of distal models show that the French are partly East Asian-related, not actually part East Asian.

A said...

Thanks.

Sofia Aurora said...

It will probably appear in the first months of 2021

mary said...

@Tigran
Which paper is this?

Vladimir said...

FTDNA clarified Y haplogroups from the article https://www.nature.com/articles/s41467-018-06024-4#MOESM1
Goran Rundfeldt’s R&D group at Family Tree DNA reanalyzed the Y DNA samples from this paper and has been kind enough to provide a summary of the results. Michael Sager has utilized them to branch the Y DNA tree – in a dozen places.


https://dna-explained.com/2020/10/16/longobards-ancient-dna-from-pannonia-and-italy-what-does-their-dna-tell-us-are-you-related/

gamerz_J said...

@Davidski

"These sorts of distal models show that the French are partly East Asian-related, not actually part East Asian."

Well, wouldn't that translate to substantial East Eurasian admixture over the last 20k years or so? I don't think that there is such an amount in French, given that neither ANE nor any other of their ancestral populations seems to have had this, but I am trying to understand how that number would come up.

Are they comparing them to Kostenki?

ejmohr said...

But do we have any idea about mitochondrial haplogroup X, which is most common in eastern NA and parts of Europe but not NE Asia? That's the mystery I'd like to see solved.

Tigran said...

@ejmohr

That is interesting. You would expect there to be U in the American gene pool not X.

And I'd like to see UP data to settle whether the paternal lineages of ANE came from a SE Asian or Tianyuan-like population.

A said...

@Davidski

"You would have to get in touch with the lead authors of Narasimhan et al. and ask them what their thinking was behind their decision."

Hasn't anyone else asked them?

Samuel Andrews said...

@All,

Off topic. The name for 'Steppe people' needs to be updated.

Does anyone have a new name to give "Steppe people" aka "Yamnaya." Both names are what have been used up to 2020 but they are inaccurate in key ways.

Steppe sends the false message that they were people of the entire Eurasian Steppe, which they were not. It also gives the false message that they were Steppe nomads in the same sense as the historical Scythians and Turks.

Yamnaya is inaccurate because they weren't all from Yamnaya, and because the Yamnaya genetic profile predates the Yamnaya culture.

I'm looking for a name which doesn't have the word 'people' at the end of it. Kurgan sounds like a good base for a new name. But Kurgish and Kurgianian don't make sense.

There's no name for them which is as good as, for example, the name 'Anatolian Farmers.'

Copper Axe said...

@Samuel Andrews
Western Steppe Herders perhaps? That is the term I tend to use.

Tigran said...

@Samuel Andrews

Aren't they often called Western Steppe Herders (WSH)?

Tigran said...

Also what is everybody's opinion on whether Basal Eurasian is a real thing or not?

Samuel Andrews said...

Western Steppe Herder is accurate but too long. It works for DNA papers.

I should have been specific. I'm specifically looking for a name which is one word and is easy to remember. A name which is as simple as English, Spanish, French, etc.

I don't think anyone has a name like that for them. I'm looking for a new name based on a river, city, lake, or Latin version of an English descriptive.

I'm making a YouTube series on the population history of Europe. Steppe-anything, WHG, EHG, etc. do not roll off the tongue well.

Michalis Moriopoulos said...

Western Steppe [Herders] works just fine for me. So does Pontic-Caspian Steppe [Herders].

Eneolithic Western Steppe (Progress-like), EBA Western Steppe (Yamnaya-like), MLBA Western Steppe (Sintashta-like).

Vladimir said...

I find this position in the paper interesting:

The Genomic History of the Middle East

«In addition to the local ancestry from Epipaleolithic/Neolithic people, we find an ancestry related to ancient Iranians that is ubiquitous today in all Middle Easterners (orange component in Figure 1C; Table S1). Previous studies showed that this ancestry was not present in the Levant during the Neolithic period, but appears in the Bronze Age where ~50% of the local ancestry was replaced by a population carrying ancient Iran-related ancestry (Lazaridis et al., 2016). We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure S8).
This population potentially introduced the Y chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ~5.6 [95% CI, 4.8-6.5] kya, agreeing with a potential Bronze Age expansion; however, we do find rarer earlier diverged lineages coalescing ~17 kya (Figure S9)»

Together with the fact that J1 is found both in the North of Russia and in Khvalynsk, and finally the standard of CHG is also J, it is suspected that it was some extinct subclades of J1 that introduced CHG to the steppe


Gabriel said...

@Samuel Andrews

I don’t think it’ll be that easy to find a simple word without the word “people” in it. I would consider “Kurgan people”, but it has the word “people” in it, so I don’t think you’ll like it. Maybe “Kurgan herders” or “Kurgan pastoralists”? I don’t think they’re much better myself, but maybe they’ll work...

@Tigran

Try to browse /pol/ a little bit less. ANE weren’t the super East Asian people you think they were either. Not all ANE people were like Yana.

Helgenes50 said...

@ Samuel Andrews

What is or will be the name of your Youtube Channel ?

Ryan said...

@Tigran - "How has the ENA shift changed? Also does anybody believe that paper that argued the French were 20-25% East Eurasian?"

If "West Eurasian" and "East Eurasian" are the product of incomplete mixing of two distinct, deeper sets of lineages, that would make sense. I believe David's deeper treemix runs showed hints of that, albeit in the opposite direction? We just lack samples of those deeper lineages.

I think it's really interesting that West Eurasian mtDNA variation is nested within East Eurasian mtDNA variation, but for Y-DNA it's the reverse. Seems like the West/East Eurasian split was complex.

@David - Thanks for posting this. I know it's sort of out of your wheelhouse but I think understanding the origins of non-European groups helps us understand the deeper origins of West Eurasians.

Norfern-Ostrobothnian said...

Ponto-Caspian Equestrian?

Samuel Andrews said...

@Helgenes50,

Here is a link to my Youtube channel. You can subscribe and hit the bell so you get a notification when I start posting videos in December. (I'm probably going to change the name because Population genetics doesn't mean what I thought it did).

https://www.youtube.com/channel/UCoL-O5egSxkfvkCGGE0vN1Q?view_as=subscriber

My first videos will be a series on the Population history of Europe which is going to be about 20 videos long. Each video will be around 10 minutes.

I think the series will be really good. I'm going to explain it in a way which sounds like a storybook, not a scientific paper. Whenever you put any information into a narrative form, you see connections and details you didn't notice when you looked at raw numbers and stats.

It is going to be interesting.

Samuel Andrews said...

By the way, when I change the name of my channel, the channel's address won't change. You'll stay subscribed after the name change.

andrew said...

"https://en.wikipedia.org/wiki/Proto-Semitic_language"

Paleo-Ukrainians?

Bob Floy said...

@sam

Just subbed, looking forward to your videos.

Ebizur said...

Ryan wrote,

"I think it's really interesting that West Eurasian mtDNA variation is nested within East Eurasian mtDNA variation, but for Y-DNA it's the reverse."

Why do you say so? Are you perhaps considering only subclades of O-M175 (or NO-M214) as "East Eurasian Y-DNA"?

Ric Hern said...

@ Tigran

I think we should look at the Early Neanderthal Y-DNA and MtDNA first before making theories about what Basal Eurasians could have been....Later Neanderthals all had Modern Human Y-DNA and MtDNA...If Early Neanderthals turn out the same, then serious questions come to light.

TLT said...

Ric Hern, there is early Neanderthal DNA from Iberia. Those had parental markers that clustered with Denisovans in a different clade from late Neanderthals and modern Humans, as far as I recall. On the other hand, their autosomal DNA was grouped with the later Neanderthals in a different clade from modern Humans.

Carlos Aramayo said...

@Davidski

What do you think of this new study?

https://tinyurl.com/y3w48swt

"Bronze Age pastoralists in what is now southern Russia apparently covered shorter distances than previously thought. It is believed that the Indo-European languages may have originated from this region, and these findings raise new questions about how technical and agricultural innovations spread to Europe. An international research team, with the participation of the University of Basel, has published a paper on this topic. During the Bronze Age (ca. 3900 - 1000 BCE), herders and their families moved across the slopes of the Caucasus and the steppes to the north, taking their sheep, goats and cattle with them. It is believed that the Indo-Germanic groups, who brought the Indo-European languages and technical innovations such as wagons, domestic horses and metal weapons to Europe, may have originated from this region."

Ryan said...

@Ebizur - "Why do you say so? Are you perhaps considering only subclades of O-M175 (or NO-M214) as "East Eurasian Y-DNA"?"

We're dancing close to breaking one of David's rules here (sorry David), but I'm suggesting everything under Y-DNA K2 as "East Eurasian" and that F(xK2) is "West Eurasian." C and DE who knows.

@David - Do you have any interest in revisiting this post now that we have better samples from the region? I thought it was a really good one. https://eurogenes.blogspot.com/2016/07/layers-of-ancient-north-eurasian.html

Also interesting that this paper links Indo-European with Uralic, Yukaghir and Eskaleut as a clade.

Copper Axe said...

@Carlos Aramayo

That Eureka article really misrepresents the finding of the research article in my opinion.

Also they apparently didn't get the memo from Wang's paper that the Indo-Europeans did not migrate out of the Caucasus...

Bonus points for the usage of Indo-Germanic lol