
Friday, December 5, 2014

The Y-chromosome tree bursts into leaf


Update 20/05/2015: Large-scale recent expansion of European patrilineages

...

I wonder what the hardcore Y-DNA genetic genealogists will say about this effort? I know that many of those guys have been working with full Y-chromosome sequences for a while now. It's open access with lots of supplementary info.

Abstract: Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51x, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analysing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of non-synonymous variants in 15 MSY single-copy genes.

Here are a couple of interesting quotes. You can see the samples they're talking about on the tree below. As per the second paragraph, it seems there's a paper about to be published at Nature Communications on European Y-chromosome haplogroups based on some heavy resequencing data (see Batini et al. in the references list). Can't wait for that.

(viii) Rare deep-rooting hg Q lineages in NW Europe: Hg Q has been most widely investigated in terms of the peopling of the Americas from NE Asia (Karafet et al. 1999). Here, as well as an example of the common native American Q-M3 lineage, we included examples of rare European hg Q chromosomes. One of the English chromosomes belongs to the deepest-rooting lineage within Q (Q-M378) and may reflect the Jewish diaspora (Hammer et al. 2009); the other is distantly related, shares a deep node with the Mexican Q-M3 chromosome, and has an STR-haplotype closely related to those of scarce Scandinavian hg Q chromosomes (unpublished data).

(ix) Structure within the west Eurasian hg R: The TMRCA of hg R is 19 KYA, and within it both hgs R1a and R1b comprise young, star-like expansions discussed extensively elsewhere (Batini et al. submitted). The addition of Central Asian chromosomes here contributes a sequence to the deepest subclade of R1b-M269, while another, in a Bhutanese individual, forms an outgroup almost as old as the R1a/R1b split.


Citation...

Hallast et al., The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades, Molecular Biology & Evolution, published online December 2, 2014, doi: 10.1093/molbev/msu327

76 comments:

Davidski said...

I'm still reading this paper. I'll probably update this post after I'm done.

Davidski said...

Haha...the most basal R1b sample is from Bhutan. See supp figure 1.

Unknown said...

Interesting paper, it's ironic, because I've been thinking of haplogroups lately and how I'm starting to think that the dates used to estimate their ages based on modern samples may not be an accurate method in ascertaining the origin of a haplogroup due to a number of factors, such as genetic bottlenecks, founder effects, etc. If Lazaridis et al. taught us anything, it's not to place too much confidence in modern DNA.

Chad said...

The most basal R1b is in Kazakhstan. That Bhutanese was an outgroup, not ancestral to what we know as R1b.

Chad said...

"The addition of Central Asian chromosomes here contributes a
sequence to the deepest subclade of R1b-M269, while another, in a Bhutanese individual,
forms an outgroup almost as old as the R1a/R1b split."

Chad said...

Either way, West Asia keeps getting disproven, over and over, lately.

Davidski said...

It looks like there's a huge paper on the way about Y-hg expansions in Europe, and it'll focus on R1a and R1b. See my update above. Plus this is from the references list.

Batini, C, P Hallast, D Zadik, et al. submitted. Large-scale recent expansion of European patrilineages shown by population resequencing. Nature Comms.

VOX said...

DR-M168 by this study is 48.7 kya; however, the Ust-Ishim paper that calibrated the Y mutation rate finds M168 to be closer to 72 kya. Therefore a correction factor of 72/48.7 = 1.478 should be applied to the Y-chromosome TMRCAs in Table 2.
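The rescaling proposed in this comment can be sketched in a few lines. This is only an illustration of the arithmetic: the two M168 dates are the figures quoted in the comment, the hg R value of 19 KYA is the one quoted in the post, and the function names are made up for the example. It assumes a single uniform factor applies to every node, which other commenters dispute.

```python
# Rescale TMRCA estimates by the ratio of two dates for the same node.
# Figures are those quoted in the comment above, not independently checked.
PAPER_M168_KYA = 48.7      # M168 date attributed to this study
UST_ISHIM_M168_KYA = 72.0  # M168 date attributed to the Ust-Ishim calibration


def correction_factor(reference_kya: float, estimated_kya: float) -> float:
    """Ratio by which the estimated dates would need to be stretched."""
    return reference_kya / estimated_kya


def rescale(tmrca_kya: float, factor: float) -> float:
    """Apply one uniform factor to a TMRCA (assumes a constant rate error)."""
    return tmrca_kya * factor


factor = correction_factor(UST_ISHIM_M168_KYA, PAPER_M168_KYA)
hg_r_rescaled = rescale(19.0, factor)  # hg R TMRCA quoted in the post

print(f"correction factor: {factor:.3f}")
print(f"hg R TMRCA: 19.0 kya -> {hg_r_rescaled:.1f} kya")
```

Under these assumptions the 19 KYA estimate for hg R would move to roughly 28 kya.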

Davidski said...

Yeah, they list the TMRCA of hg R as 19 KYA. That doesn't look right.

Ebizur said...

Chad wrote,

"The most basal R1b is in Kazakhstan. That Bhutanese was an outgroup, not ancestral to what we know as R1b."

Where have you seen this? I cannot find even one sample from Kazakhstan in the present study (Hallast et al. 2014 preprint).

Figure S1 clearly shows bhu-0984 as belonging to (the less successful side of) the most basal branch on the R1b side of the R1a-R1b bifurcation. The split between this Bhutanese R1b and the rest of R1b has occurred not long after the split between R1a and R1b.

The rest of R1b in this study is shown as exhibiting a four-way polytomy at the present level of resolution:

1) A branch that quickly has split into the lineage of gre-10 (a Greek?), the lineage of gre-80 (another Greek?), and the lineage of bav-31 (a Bavarian?).

2) A branch that almost immediately bifurcates into the lineage of TSI-NA20785 (a Tuscan who belongs to a subclade of R1b1a2a1b-CTS5981/S1157/Z2115) and the lineage of hun-33 (a Hungarian?) and spa-26u (a Spaniard?). The lineages of the Hungarian and the Spaniard part ways shortly thereafter.

3) A branch that rapidly produces the lineage of tur-11 (a Turkey Turk?), the lineage of bhu-1953 (a Bhutanese), the lineage of hun-34 (a Hungarian?), and another lineage that branches as follows: (Spaniard? + [English? + Serbian? + {Hungarian? + Bavarian?}]). The terminal hun-3 and bav-56 appear to be quite closely related -- perhaps their MRCA might be sort of "proto-Austrian" for lack of a better word.

4) The horrifically fecund other branch of the family.

Chad said...

Ebizur,
I'm talking about the m-343 that is found in Kazakhstan and Western China, sorry.

I would need to see the m-269 that they are talking about for Central Asia, if that is what they are referring to. The only m-269* around today, that I am aware of, is the dead end. That is for the European branch, and not the outlier.

Chad said...

m-269 is only supposed to be dated to 4000BCE, so I wonder if that Bhutan sample was a non-successful m-343. m-269 at close to 19kya is very odd.

Ebizur said...

By the way, the most basal branch of Y-DNA haplogroup T in this study is also represented by a Bhutanese individual (bhu-1892). The genealogical split between bhu-1892 and the rest of haplogroup T seems to have occurred not too long after the split between I1 and I2, the split between O2a and O2b, or the split between R1 and R2 (all of which should have occurred roughly contemporaneously judging from the present study's Figure S1).

Ebizur said...

Chad wrote,

"m-269 is only supposed to be dated to 4000BCE, so I wonder if that Bhutan sample was a non-successful m-343. m-269 at close to 19kya is very odd."

I do not see any contradiction. I presume that bhu-0984 is an example of R1b-M343(xR1b1a2-M269). Because the lineage of bhu-0984 has diverged from the lineage of R1b1a2-M269 so soon after the divergence of R1b-M343 from R1a-L146, I guess that bhu-0984 most likely should belong to R1b1c-V88, R1b1b-M335, or R1b-M343(xR1b1a-P297, R1b1b-M335, R1b1c-V88). Both R1b-M343(xR1b1a1-M73, R1b1a2-M269, R1b1b-M335) and R1b1b-M335 previously have been found in the PRC, which neighbors Bhutan, so the finding of an R1b1c-V88, R1b1b-M335, or R1b-M343(xR1b1a-P297, R1b1b-M335, R1b1c-V88) lineage in Bhutan would not be too revolutionary.

Chad said...

Yes, I figured it was m343. That is supposedly from 16000 BCE. So, they have about the same split. I am not sure of the age on p25. I will have to check that. I think it was 12000BCE.

Unknown said...

Maybe they've discovered a new branch.

Ebizur said...

Chad,

7/66 (10.6%) of the Newars tested by Gayden et al. (2007) have been found to be R-M269. They should be more closely related to the lineage of bhu-1953 (and the Turk, Hungarians, and other Europeans in the third branch of R-M269 noted above) than to the basal R1b (presumably R1b-M343(xR1b1a2-M269)) lineage of bhu-0984. Someone should take a closer look at those Newar R-M269 individuals.

capra internetensis said...

The mutation rate used in the main table is too fast to agree with aDNA evidence - the dates need to be increased by about 50% (at least for older branches). Alternative age estimates are also given in the supplementary info. However, the Y-DNA mutation rate is clearly not consistent, so big fat error bars need to be applied (though nowhere near as bad as for mtDNA).

It is nice to finally see both D and E tested in the same study. So DE coalesces at the same time as CF, shortly after CDEF. D breaks up between F and K (~50 kya using the slower rate). The split between Bhutanese D1a and Japanese D1b happens immediately after the split of D1 from the Bhutanese D*; Japanese D1b1a splits from D1b1 at around 17 kya (using the slow rate and eyeballing the tree). Nothing surprising here but it is good to know.

C divides into C1 and C2. A Nepalese C* groups with Japanese C1a1 (so this is likely another C1a in between Europe and Japan - perhaps some of the Indian C* is also of this type). The trifurcation of C1 into C1a, C1b (here Nepalese C-M356), and Australian C1c is still unresolved, and happens immediately after C2 splits off. The split of Bhutanese C2b2 from Chinese/Bhutanese C2* occurs much later, around the same time as the break-up of J2 (~32 kya using the slow rate). No idea whether this is representative of C2 as a whole. The coalescence of the two Australian C4s is very recent - hard to say due to the difference in branch lengths, but maybe 5kya - suggesting a recent expansion (Pama-Nyungan?), provided the samples don't come from the same area.

Ebizur said...

capra internetensis wrote,

"C divides into C1 and C2. A Nepalese C* groups with Japanese C1a1 (so this is likely another C1a in between Europe and Japan - perhaps some of the Indian C* is also of this type). The trifurcation of C1 into C1a, C1b (here Nepalese C-M356), and Australian C1c is still unresolved, and happens immediately after C2 splits off."

May I ask on what basis you have determined nep-0273 to belong to C-M356?

Please note the following data from Gayden et al. (2007):

Indo-European speakers from Kathmandu, Nepal
2/77 = 2.6% C2-M217(xC2a-M93, C2b1b1-M77, C2e1a1a1-M407)
3/77 = 3.9% C-M216(xC1a1-M8, C1b1-M356, C1b2-M38, C1c1-M210, C2-M217)
1/77 = 1.3% C1b1-M356

Is there some reason why nep-0273 must be C-M356 and nep-0172 must be some sort of C1a? That would surely be plausible, but as far as I can tell, it should be equally plausible that nep-0172 is C-M356 and nep-0273 is a rare sort of C1*.

Matt said...

Not knowledgeable about Y-DNA at all, but will try to engage.

I found it interesting that they found patterns of star like expansion in I1 (at 3.5 ky, or 1500BC). That doesn't seem like a novel result though.

Their maximum parsimony origins for each major clade clearly don't mean much?

What's up with the different apparent drift lengths on each branch? My real question is, where a lineage stops, is that a signal that no more mutations occur in that lineage after that point, or is it that their resolution stops, or that the further subdivision are too numerous and unstructured for them to display? The last of which would indicate a big population boom plus further low reproductive variance.

Davidski said...

Where are the maximum parsimony origins estimates?

You mean the age estimates? They look too recent IMO.

And the samples aren't comprehensive and varied enough to say much about the geographic origins of the clades. Of course, even if they were much better we'd still need aDNA to be sure of anything.

Matt said...

Sorry, I mean where, in the maximum parsimony tree, each branch is coloured according to a region of origin (from the key with the table).

John Thomas said...

I know this is not really relevant to a forum devoted to European genetics, but that old question of whether Y-DNA haplogroup E is Eurasian or African in origin is still mysterious.
Do any of the commenters here have any thoughts on the matter?

Helgenes50 said...

http://news.harvard.edu/gazette/story/2014/12/the-surprising-origins-of-europeans/

Davidski said...

Yeah, I read that. If Patterson really said that late proto-Indo-European has been tracked to the Caucasus 3,500 years ago, then it looks like he fell off a tree and hit his head on some of the branches along the way.

Helgenes50 said...

I don't know how much truth there is in this article, but one thing is sure: Maykop is often mentioned as one of the IE homelands.

Davidski said...

Maikop is often mentioned as being in contact with the Proto-Indo-Europeans of the steppe, and a major influence on them, but not Proto-Indo-European as such.

Matt said...

It sounds like Patterson is still wedded to the idea that the EEF were relatively unadmixed Middle Eastern Farmers, else the comment that "Genetic evidence ruled out one likely related group in the region, the Yamnaya, because their DNA showed the group had hunter-gatherer ancestry, which is inconsistent with the fact that two Indo-European groups, Armenians and Indians, don’t share it, Patterson said." wouldn't make a lot of sense.

The high, middle and low estimates of ENF ancestry in Stuttgart were respectively 91%, 72% and 66%. If an ENF plus Yamnaya model really does not work even under the low estimates, then it just doesn't work I guess.

We'll only know if he has a good reason for this when we've seen the samples and paper.

It doesn't seem impossible that R1a / R entered Maykop and became dominant, and that Maykop then moved through the Caucasus to the steppe in a "skipping stone" model, without interacting that much with the First Farmer derived people there. But it seems like a stretch. It will depend on whether Maykop has enough ANE and the right haplos to explain affinities / patterns in India and South Central Asia alone. I wonder if he'd have let that article go out if he didn't already know the answer? Maybe; he sounds like quite an honest and straightforward person.

Davidski said...

I hope Patterson isn't thinking along the lines that the Proto-Indo-Europeans expanded from Maikop both onto the steppe and into India (ie. because Indians apparently lack the hunter-gatherer ancestry carried by Yamnaya). That really wouldn't make any sense unless the Indo-Iranians originated directly from Maikop, and not on the steppe. Then there's the direct R1a link between Europe and India, which is less than 6,000 years old.

Maybe they're trying to look more objective, so that they're not accused of favoring the steppe hypothesis? However, most linguists do favor the steppe hypothesis, and the Proto-Indo-European problem is foremost a linguistic problem.

Matt said...

Davidski wrote,

"That really wouldn't make any sense unless the Indo-Iranians originated directly from Maikop, and not on the steppe. Then there's the direct R1a link between Europe and India, which is less than 6,000 years old.

Maybe they're trying to look more objective, so that they're not accused of favoring the steppe hypothesis? However, most linguists do favor the steppe hypothesis, and the Proto-Indo-European problem is foremost a linguistic problem."


Could be trying to look objective. On linguistics, under the phylogenies for IE I can see online, the Indo-Iranian branches split off later than Greek and Armenian, and earlier than the main divergence of the European branches of IE (except Greek) Slavic, Baltic, Germanic, Italic and Celtic. It seems clear that the divergence of the European branches, except the Southeast European ones, came late in the family.

So assuming Maikop = Indo-European and Yamnaya / Corded Ware = Maikop + Eastern European Hunter-Gatherer fusion on the steppe, then I'd guess that the core steppe linguistic evidence incompatible with the Maikop cultural environment should only really strongly exist in the Slavic-Baltic-Germanic-Italic-Celtic grouping?

If that's contradicted, then that's a problem for Maikop = IE. I'm not sure how much of the core steppe vocabulary is dependent on European branches, vs how strongly it exists independently. The linguists may have to test that one out depending on what comes back.

I get the impression linguists generally seem to treat the idea of a primary IE urheimat in West Asia then a secondary one on the steppe as a more complex hypothesis than they would like to test, and from a linguistic point of view, not more compelling than a steppe only urheimat with no secondary urheimat. But if the genetic evidence suggests a secondary urheimat is likely, there might not be much in the linguistic evidence to definitively speak against it.

Re: R1a divergences, as estimated by this paper and Underhill, it's around the same age as the Maikop culture, which is around 5700 BP to 5000BP (3700 BC to 3000 BC). That's about all I know enough to say about.

Unknown said...

I hope they seriously have maykop a/ydna, from 4000bce to 3000bce. That should be interesting.

ZeGrammarNazi said...

It says PIE may be traced back to Maykop, but that ancient DNA samples from Maykop, although obtained, have not been analyzed yet to test the premise.

Davidski said...

Yes, the Maikop DNA is yet to be tested so it's just speculation as far as that's concerned, but the assumption that Patterson is apparently making is that Yamnaya can't be Proto-Indo-European because Eastern European Hunter-Gatherer (EHG) admixture is lacking among Armenians and Indians.

I have to say, if that's true, then that's a very weak and naive approach. I wonder how he worked out that Armenians and Indians don't carry any EHG, because they often do show clear North European-type influence. But even if that's true, then it doesn't preclude a scenario in which there were language shifts to Indo-European throughout Asia without significant autosomal gene flow from the Eurasian steppe in most cases.

Chad said...

I think it's more of a case that Semites and such swamped out the genes in West Asia. The R1b Armenians are still Balkan shifted. The ANE in Central Asia likely being way more than in Yamnaya just means that it was native there, pre-farming and IE. All of that local mtDNA in those places could be the cause.

Unknown said...

Clearly the age estimates are way too young for this website, as they estimate that Q split from R 24,100 years ago; however, Mal'ta boy is R, way downstream from the split, and is 24,000 years old.

Krefter said...

"Samples have been obtained from Maikop burial sites, but the DNA work to test that proposal is pending, Patterson said."

This means they're testing Maikop samples, right?

Richard Rocca said...

I can see a scenario where early Tripolye-Cucuteni was a mix of WHG with some EEF. Tripolye-Cucuteni shared a frontier with ANE/EEF Steppe groups for several thousand years. It is possible that this mix of Tripolye-Cucuteni WHG/EEF + Steppe ANE/EEF people expanded west into Central Europe, but that the Yamnaya further to the East lacked WHG and that's why it isn't prevalent in Indians nor Armenians. This scenario does not need Maikop.

Richard Rocca said...

Let's not forget that late Tripolye-Cucuteni had the largest proto-cities in the world before their collapse.

Krefter said...

There are over 100 Neolithic and Bronze Age mtDNA results from Brandt that Jean M did not put on her site!!!!

http://www.ancestraljourneys.org/ancientdna.shtml

Richard Rocca said...

Just checking, but you know she has a different page for Neolithic samples, right?

Krefter said...

Yes, I do know that. I found over 100 samples in Brandt 2013's supp info that she missed out on.

There are over 600 (700?) Upper Palaeolithic to Bronze Age mtDNA samples from west Eurasia, and the majority are from Brandt 2013 (east Germany). So far I have broken down (predicted, listed extra mutations, listed all negatives and positives) over 50% of them.

It's taken me about 2 weeks to do that though. I predict that I'll be done with all 600> within the next 2 weeks.

When I finish I'll make maps, graphs, comparisons (of samples), etc. They'll look very clean and professional and will be very useful. No longer will anyone online be confused about ancient mtDNA.

Ebizur said...

From Table S1 of the present study:

nep-0172 phy Nepalese ASC blood Parkin 07 C C-M216

nep-0273 phy Nepalese ASC blood Parkin 07 C C5-M365

This supports capra internetensis' claim that nep-0172 belongs to the C1a branch.

Can anyone check whether nep-0172 is positive for any SNP that has been found in European C1a2-V20? It would be very interesting if the C*-M216 Indo-European speakers from Kathmandu (Gayden et al. 2007) turn out to be C1a2-V20.

Ebizur said...

nep-0172 appears to belong to C1a2-V20:

228 to nep-0172 135 SNPs A2774169G C6703314G A6777351G A6800465C A6837378T G6845955A(['V20']) G6866303A A6885593G G6912691A T6913217A T6936443C C6970036T T6972117C G7079141C C7135929T A7136339G C7142695G T7202366G G7218549A G7246670A A7284672T A7314655G G7333017A G13871066T C13993511A C14117114T T14152879A C14159782A A14173091G G14186365A G14246351A A14293833G C14329101T A14335672G C14613786T A14774142G T14810634C A14930515G G14952137A T14971682G C15516397T A15585188G G15764570A A15852360G G15935624A T15948413G C15949403T A16017823G A16031440G G16061218T T16198378C T16281599C C16284836T T16336719C G16417930A A16740138T T16758905A C16776874G C16787023T T16803391G A16818150T A16819759G G16869270C A16912050G C16922171T C16956320T T16968690G C17072248T G17151659C A17177899G C17350338T G17366710A T17421307G G17486528A G17527878A G17527904C G17577534A T17745332A G17790910T G17817864A G17853558T C17880509G A17892594C C17905059T T17939651C A17941579G G17957440A A17961975C T18023403A G18026975T A18071404G T18160365A C18181289G C18246830T A18551564G G18575701A G18584981A G18642035A G18683495A C18692792T G18789012C G18821446A T18937968C A18961035T C19054703T A19183145G G19220674C G19240900T T19358615C G19397234A(['PF7453']) A19452372G T19454839A(site_of:['Y6910']) C19490323T A19504462T A19566911G T21086251G G21114604A G21202795A C21326357T A21359031C A21387821C T22175097A T22575586C C22616875T G22724232C G22738930A G22832625A C23118493A A23197102G G23208174A T23209087A G23291476A G23479393T A23482918G C23495487A

Courtesy of Greg Magoon and Cofgene.

Davidski said...

Is there by any chance a more palatable list of the fine scale lineages that these samples belong to?

And what's the story with bhu-0984 from Bhutan? Has that R1b clade been spotted anywhere else?

Ebizur said...

Davidski,

The following are the SNPs that distinguish bhu-0984 from the rest of R1b:

15 to bhu-0984 59 SNPs C6717071T G6738373T C6787070T A6792047T T7179436G G13896279A T13957849A C14072846T A14090285G A14231273G C14338481T G14360470A A14500633C G14657839C G14773867A(['SK2056']) T14984242C T15029516C C15272992T(['SK2058']) C15358078T A15583344C G15831337A G15968889A T16253400G T16283543G G16389874T(['SK2061']) A16438649C C16711425T T16817846C T16836912G G16922282A T16922678G T17031789C A17038027G A17081113G G17388704A C17445733T(['F2482']) T17671702C T17810385C A17887195G G17915176A G17921627A C18103355T C18562249T A18617588T C18710688G G18765479A A18835322C(site_of:['YSC0000967']) T19049346C T19185835A G19441535A G19442073T T21082641G T21089581C C21149651T C21318151G T21388393A G22676383A T22937084G C22940274T(['SK2060'])
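For anyone who wants to work with branch listings like the two above, here is a minimal parsing sketch. It assumes each token has the form ancestral base, position, derived base, optionally followed by a named SNP in `(['NAME'])` or `(site_of:['NAME'])`. Reading `site_of:` as "a named SNP sits at the same position" is a guess, and the `Snp` type and `parse_snps` function are made up for the example. The input is a short excerpt of the bhu-0984 line, not the full list.

```python
import re
from typing import List, NamedTuple, Optional


class Snp(NamedTuple):
    ancestral: str
    position: int
    derived: str
    name: Optional[str]  # named SNP attached to the token, if any
    site_only: bool      # True when the annotation is prefixed "site_of:"


# One token: base, position, base, then an optional (['NAME']) annotation.
TOKEN = re.compile(
    r"\b([ACGT])(\d+)([ACGT])(?:\((site_of:)?\['([^']+)'\]\))?"
)


def parse_snps(line: str) -> List[Snp]:
    """Extract every SNP token from one branch line, skipping the header."""
    return [
        Snp(m.group(1), int(m.group(2)), m.group(3),
            m.group(5), m.group(4) is not None)
        for m in TOKEN.finditer(line)
    ]


# Excerpt only (five of the 59 tokens listed for this branch).
EXCERPT = (
    "15 to bhu-0984 59 SNPs C6717071T G6738373T G14773867A(['SK2056']) "
    "A18835322C(site_of:['YSC0000967']) C22940274T(['SK2060'])"
)

snps = parse_snps(EXCERPT)
named = [s.name for s in snps if s.name]
print(len(snps), "tokens parsed; named:", named)
```

Filtering on `name` gives a quick list of the already catalogued SNPs on a branch, which is the part most useful for comparing against the ISOGG or YFull trees.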

Chad said...

I think that he is the first of his kind. I am not aware of the differences with the Han, Uighur, and Kazakh samples. I think there might be a couple m-343 among the Mongols, but I'm not certain. I'll look around. How it got there is the question. IE, Turk, Mongol..

Ebizur said...

C17445733T (['F2482']) also has been observed in Y-chromosomes belonging to haplogroup D1a-M15, so it might not be a very reliable SNP.

I do not recognize any of the other SNPs that have been found to be derived in bhu-0984 and ancestral in all other studied representatives of R1b.

terryt said...

"By the way, the most basal branch of Y-DNA haplogroup T in this study is also represented by a Bhutanese individual (bhu-1892)".

That makes sense if we consider that the most likely original route east between Southwest and Southeast Asia was north of the Ganges rather than a simple spread across the whole Indian subcontinent.

Unknown said...

Hui, not Han. Sorry. The connection with Muslims might make it Turko-Mongol related. Hard to say.

Davidski said...

Is that T a T1a2 by any chance?

I suspect the ancient K-M9 (xN,O,P) from the Tarim Basin was T1a2. The rest of the samples there were R1a. So a steppe source is possible.

http://eurogenes.blogspot.com.au/2013/01/lots-of-ancient-y-dna-from-china.html

Chad said...

That supposed Russian paper with R1b in Afanasevo might not be BS. I wish it was out there and peer reviewed.

Chad said...

The arrival won't be from France, of course. Still, m-343 is East of the Urals, and m-269 is on both sides.

Ebizur said...

Davidski wrote,

"Is that T a T1a2 by any chance?"

No, the Bhutanese T (bhu-1892) is not a T1a2.

I have found a few discrepancies between the current version of the ISOGG tree and the current version of the YFull tree, but it is almost certain that NA20758 (a Tuscan) belongs to a subclade of T1a2-L131. Eng-hgT (an Englishman) and gre-12 (a Greek) also are indicated to belong to T-L131 in the present study's Table S1. Pal-5366 (a Palestinian) is indicated to belong to T-P322.

In contrast, NA20527 (another Tuscan) and his closest relative in the tree, ser-3 (a Serbian), have been labeled T-L208. (In fact, NA20527 belongs to a subclade of T1a1a1a1a1-CTS9882, which is itself a subclade of T1a1a-L208, according to the current version of the YFull tree.)

Bhu-1892 is labeled T-M70. His lineage has diverged upstream (and quite far upstream, frankly) of the MRCA of (T1a1 + T1a2).

capra internetensis said...

@ Matt:

The different lengths of each branch just represent the different number of mutations accumulating in different lineages from the branch point up to the present, which exist because mutations do not happen (or are not preserved) at a constant rate. The estimates for time of origin are a sort of average best guess, so they are rather uncertain but much better than nothing. It does mean that you can't be certain of the ordering of splits across lineages, and the average mutation rate that is accurate for one part of the tree is not necessarily correct for another part.

In any case the pedigree mutation rate they used is certainly much too high for old clades, unless our understanding of fossil DNA is completely wrong. But OTOH it might be more appropriate for the younger expansions. I don't think we know yet.

Matt said...

I doubt the Maikop samples will tell us much definitively, because, even if Maikop is similar to present day Armenians and thus could mix with EEHG to generate Yamnaya, it's not going to be able to mix with an Onge-like population to generate present day Indians, because their ANE-WHG similarity ratio seems to preclude a simple mix of Caucasus-like and Onge.

So it won't be possible to definitively prove a Maikop population genetic spread which correlates with the spread of IE.
If in a few years though (maybe even next year), we do get a model which shows that you can pretty much mix Maikop, Indus Valley and Onge to produce present day Indians, I do think it'll be the case that if Patterson is standing going "OK, we've got this Maikop population, and if we mix it with pre-IE candidates in West Asia, India and Northeast Europe and thereby Europe, and it is about the right time for pre-IE", then the linguists are going to have to make their case very persuasively.

And at the moment, linguistics is without even a consensus divergence tree for the Indo-European languages. There seem to be models that show at least *ten* simultaneous divergences (even five simultaneous divergences is a lot), not counting extinct groupings like Illyrian, Phrygian, Messapic, Venetic. A centum-satem split is not even part of a consensus tree!

But that's all a long time off and only if particular evidence is found at Maikop.

Davidski said...

The early Indo-European expansion will be easier to reconstruct using ancient uniparental markers than genome-wide markers, especially for very mixed groups like Indians IMO.

If, say, we have a deep clade of R1a present among one of the steppe samples in Europe, which is ancestral to both European and Indian R1a, then no amount of philosophizing about being able to fit Indians as partly Eastern European hunter-gatherers or not will change the fact that Indians have ancestry from the European steppe.

This is indeed what it's looking like, because the major splits within R1a-M417 seem to have happened in Europe; Northwest European R1a-CTS4385 split from Eastern Euro/North Euro/Asian R1a-Z645, and then Eastern Euro/North Euro R1a-Z282 split from Asian R1a-Z93. But this is based on modern DNA, and as a result we don't really know where the ancestors of these people were when these splits happened, so we need the same thing to be shown with ancient DNA from the steppe.

I'm only using R1a as an example, because it's an obvious choice. But even full mtDNA sequences can provide irrefutable evidence of movements from the European steppe to Asia. Let's say we find some Yamnaya U4 sequences in the Hindu Kush. It'd be pretty obvious how they made it there based on linguistic evidence.

Simon_W said...

I don't know if the linguistic evidence is so unambiguous. According to Gamkrelidze there are Semitic and other Near Eastern loan words in PIE and conversely early IE loans in Near Eastern languages. According to Mallory there aren't.

According to Mallory there are contacts between Indo-Aryan and Proto-Uralic. But even if that's true (I've met people suggesting that the examples for Indo-Aryan influence look rather Baltic) it doesn't prove a PIE origin on the steppe, because Proto-Indo-Aryan doesn't equal PIE.

According to some linguists words for elk and beaver existed in PIE, which would point to a rather northern homeland. According to Gamkrelidze and Ivanov words for panther, leopard, snow leopard, elephant and monkey existed in PIE, which would point to a rather southern origin. But I don't have the expertise to judge what's right.

As for linguistic trees, there are various versions of IE trees, depending on the features that were considered. Anatolian always splits off first. According to Gamkrelidze and Ivanov, a Tocharo-Celto-Italic branch splits off from the rest next. The rest then splits into Graeco-Armenian-Aryan on the one hand and Balto-Slavic-Germanic on the other hand. However, according to a tree by Ringe et al., which was based on phonological and morphological data, the sequence of splits was: Anatolian, Tocharian, Italo-Celtic, Albanian-Germanic, Greek-Armenian, and finally Balto-Slavic and Indo-Iranian. But Ringe's tree based on lexical data looked somewhat different; the sequence of splits went: Anatolian, Tocharian, Italo-Celtic, Greek-Armenian, Indo-Iranian, Germanic, Balto-Slavic. Thus, the main difference in Ringe's trees is the position of Germanic. According to one tree it split off early, soon after Italo-Celtic; according to the other tree, it split off late, from Pre-Proto-Balto-Slavic. The main difference between Ringe's trees and Gamkrelidze's tree is the position of Indo-Iranian. According to the latter, it forms a group with Armenian and Greek; according to the former, it forms a group with Balto-Slavic. But on one thing they all agree: Italo-Celtic split off rather early. Neither Gamkrelidze nor Ringe assigns a late split-off date to Italo-Celtic. I think the main source for a late split-off date is the trees from Bouckaert and Gray, who however are not linguists; they merely applied biological statistical methods to linguistics, and their results were criticized by knowledgeable linguists.

Simon_W said...

The evidence from y-DNA clearly favours an association of Indo-Iranian with Balto-Slavic rather than with Greek and Armenian.

Davidski said...

Semitic loanwords in PIE aren't controversial or a problem for the steppe hypothesis, because they seem to have been mediated into PIE via proto-Kartvelian (Maykop?).

Deep links between PIE and proto-Uralic also aren't controversial. There are many papers on the topic.

http://www.kortlandt.nl/publications/art203e.pdf

This one has an interesting schematic.

http://www.kloekhorst.nl/KloekhorstIndoUralicAspects.pdf

And to be honest, Gamkrelidze and Ivanov strike me as the comedy relief duo of Indo-European studies.

terryt said...

"It does mean that you can't be certain of the ordering of splits across lineages, and the average mutation rate that is accurate for one part of the tree is not necessarily correct for another part".

A point often overlooked, or avoided altogether.

Unknown said...

Does it bother anyone else that the age estimates in this new article are way too freakingly young and have been directly disproven by sequencing of ancient Y-DNA at least four times?
1. Mal'ta
2. Ust-Ishim
3. Kostenki
4. I1 neolithic Europe
I estimate that we need to multiply all age estimates by 1.37. I got this estimate by calibrating to Mal'ta.

Chad said...

John,
Mal'ta looks to be a dead end. That TMRCA is just that. It doesn't mean the first person with R. It just means that, based on the markers and methods, they date it to 19 kya, when all of R has a common ancestor.

Chad said...

I think that is correct. Does anyone have anything on Mal'ta having descendants?

Unknown said...

You are very mistaken, Chad. I know Mal'ta has no descendants; however, he was not born 200 years after the QR split. This article says that a person with Q and a person with R have a common ancestor if you go back 24,200 years. Mal'ta is 24,000 years old, he has haplogroup R, and he has 71 mutations that define haplogroup R, compared to 261 mutations for someone with R2. If you multiply 126.5 by 261 you get 33,016 years; if you multiply 71 by 126.5 and add 24,000 you get 32,981.5 years, so each of these mutations takes about 126.5 years to acquire. If you take 126.5 and multiply it by 71 you get 9,000 years or so. This means that Mal'ta boy was born about 9,000 years after R and Q split, which is why I multiply the dates by 1.37. Ust-Ishim and Kostenki as well as Anzick also seem to prove this timeline. Interestingly, it seems that CDEF was a lone survivor after Toba, but the authors of the article cannot see that and think that all come from the same population within 50 kya, which is nonsense.
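
For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It uses only the figures quoted above and assumes, as the comment does, that both SNP counts are measured from the same Q/R split node and that the modern R2 sample sits at the present day. The function and variable names are made up for illustration.

```python
# Figures quoted in the comment above; how they are combined is an assumption.
MALTA_AGE_YEARS = 24_000      # stated age of the Mal'ta specimen
MALTA_SNPS = 71               # mutations on the Mal'ta lineage since the Q/R split
MODERN_R2_SNPS = 261          # mutations on a modern R2 lineage since the same split
PUBLISHED_QR_TMRCA = 24_200   # Q/R TMRCA attributed to the article


def years_per_snp(ancient_age, ancient_snps, modern_snps):
    """Years per SNP if the extra SNPs on the modern lineage accumulated
    during the time that has passed since the ancient individual lived."""
    return ancient_age / (modern_snps - ancient_snps)


rate = years_per_snp(MALTA_AGE_YEARS, MALTA_SNPS, MODERN_R2_SNPS)
split_age = MODERN_R2_SNPS * rate             # implied age of the Q/R split
malta_after_split = MALTA_SNPS * rate         # years from the split to Mal'ta
correction = split_age / PUBLISHED_QR_TMRCA   # factor applied to published dates

print(f"years per SNP:          {rate:.1f}")
print(f"implied Q/R split age:  {split_age:.0f}")
print(f"Mal'ta after the split: {malta_after_split:.0f}")
print(f"correction factor:      {correction:.2f}")
```

Solving the two expressions for the split age exactly gives a rate of roughly 126 years per SNP and a correction factor of about 1.36, close to the 126.5 and 1.37 quoted above. It treats the mutation rate as constant along both lineages, which is exactly the assumption being questioned elsewhere in this thread.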

Davidski said...

The dates are certainly off, as I and others have already mentioned above, but I'm not bothered by that. We'll eventually see the correct dates being published when many more ancient Y-chromosomes are sequenced, so it'll probably take a couple of years.

Chad said...

John,
Okay, I see what you're saying, but mutations are not perfect, by any measure. R1 is usually dated to 18.5 kya. Some go higher, which might be questionable.

On another note, check out the Xiongnu Y-STR data.
300 BCE-200 CE

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1180365/table/TB2/

Chad said...

These have been used here and revisited a few times, correct?

Ebizur said...

John Smith wrote,

"This means that Mal'ta boy was born about 9,000 years after R and Q split, which is why I multiply the dates by 1.37. Ust-Ishim and Kostenki as well as Anzick also seem to prove this timeline."

I might mention here that I have derived a correction factor of 1.38 during my recalibration of the TMRCA estimates of Yan et al. (2014) based on the Ust'-Ishim specimen's number of SNPs downstream of the K2 node and time of deposition as published in Fu et al. (2014).

Both the authors of the present study and Yan et al. (2014) have cited Xue et al. (2009) as the source of their mutation rate. I think the fundamental problem with published TMRCA estimates might not be variance in the mutation rate in different lineages or through time so much as imprecision in the measurement of the mutation rate.

Ebizur said...

Does anyone have any data regarding the distribution of E1b1a1a1a-M58? In the present study, it has been found only in two Palestinians (pal-5225 and pal-5341). There is another Palestinian (pal-5365) within E1b1a1a1d-U175(xU290), though both E-U175(xU290) and E-U290 have been found in Yoruba (and E-U175(xU290) also in a Ngumba) in this study.

Should the lack of E-M58 among this study's Africans be ascribed to sampling error?

Ebizur said...

The two E-M58 Palestinians seem to share a very recent common ancestor.

By the way, there is also a Bhutanese individual (bhu-1164) in E1b1b1b2a-M123, a stereotypically SW Asian subclade of E1b1b. He shares a common patrilineal ancestor with a Bavarian (bav-13) much more recently than those two share a common ancestor with gre-17, a Greek representative of E-M123. I think I will spend some time looking at this study's E1b1b samples in more detail tomorrow.

Unknown said...

Ebizur,
What does Xue say for R1?

Grey said...

"Should the lack of E-M58 among this study's Africans be ascribed to sampling error?"

I'd imagine E developed in the African border zone somewhere and back migrated so wouldn't be surprised if some clades didn't make it.

Grey said...

"Does it bother anyone else that the age estimates in this new article are way too freakingly young and have been directly disproven by sequencing of ancient Y-DNA at least four times?"

I think they'll get more accurate with more samples.

Unknown said...

Older, but decent.

http://www.annualreviews.org/eprint/P39WXRWSQN7baFj2fb78/full/10.1146/annurev-genom-031714-125740

Unknown said...

Ebizur,
Only Wikipedia, but it could be of use...

E1b1a1a1a
E1b1a1a1a is defined by marker M58. 5% (2/37) of the town Singa-Rimaïbé, Burkina Faso tested positive for E-M58.[11] 15% (10/69) of Hutus in Rwanda tested positive for M58.[10] Three South Africans tested positive for this marker.[8] One Carioca from Rio de Janeiro, Brazil tested positive for the M58 SNP.[42] The place of origin and age is unreported.

MfA said...

The bav-13 sample from Bavaria, Germany, is E-FGC18401+.