
Wednesday, May 15, 2013

South Asian R1a in the 1000 Genomes Project


After a recent update, the 1000 Genomes Project now includes 62 individuals of South Asian origin belonging to Y-DNA haplogroup R1a-M17. Their full Y-chromosome sequences have been analyzed by Semargl and Maximus (aka the YFull project), with some interesting but not unexpected results:

- All individuals belong to R1a-Z93, which appears to totally dominate South Asian R1a-M17.

- A single Punjabi from Lahore, northeastern Pakistan, is ancestral for the Z94 mutation, which is just below Z93. All the other individuals are derived for Z94.

- Six individuals - of Punjabi, Bangladeshi and Gujarati origin - are ancestral for L657 and Z2124, the two main mutations immediately below Z94.

- All individuals of South Indian and Sri Lankan origin are derived for L657 or Z2124.

- Based on this sample, there appears to be no substructure along ethnic or geographic lines within South Asian R1a-M17 derived for L657 and Z2124.

Thus, it seems the SNP diversity of South Asian R1a-M17 is low, and decreases from Pakistan, North India and Bangladesh to South India and Sri Lanka. In comparison, there are only 12 European R1a individuals in the 1000 Genomes sample, and they represent all the major subclades of this haplogroup: R1a-Z283, R1a-Z93 and R1a-L664. Therefore, sampling bias can't be used as an argument for the more diverse result from Europe.

The lack of substructure along ethnic and geographic lines within South Asian R1a-L657 and R1a-Z2124 looks unusual, especially considering the caste system in India, and needs to be verified with more extensive sampling. However, if this outcome holds up, it'll suggest that paternal gene flow across South Asia has not been restricted by the caste system or geography. Then again, it could mean the caste system appeared after R1a-L657 and R1a-Z2124 arrived in South India via massive population movements from the north.

Below are all the results in as much detail as the current R1a SNP tree allows. Key: BEB - Bengali from Bangladesh; GIH - Gujarati from Houston, Texas; ITU - Indian Telugu from the UK; PJL - Punjabi from Lahore, Pakistan; STU - Sri Lankan Tamil from the UK.

Z93+ Z94-
PJL - 1

Z94+ L657- Z2124- Z96-
BEB - 2
PJL - 3
GIH - 1

L657+,Y2+ etc.
1) Y9 (inc. Y7)
GIH - 7
STU - 4
ITU - 4
PJL - 8
BEB - 2

2) Y4+, Y8+, Y28+ (inc. Y6+)
GIH - 6
ITU - 6
PJL - 2
STU - 6
BEB - 5

Z2125+ (Z2124+ Z2122- Z2123-)
PJL - 1

Z2123+ (Z2124+ Z2122- Z2125-)
PJL - 2
STU - 3
BEB - 1
ITU - 6
GIH - 2
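
For anyone who wants to check the tallies, here is a minimal Python sketch that adds up the counts listed above per population and shows how many individuals in each sit upstream of the L657/Z2124 split. The group labels and the `BASAL` set are shorthand made up for this example, not official subclade names, and the counts are simply transcribed from the listing as given. Note that they add up to 72 rather than 62, so one of the figures may be off.

```python
from collections import defaultdict

# Counts transcribed from the listing above: {group label: {population: n}}.
# The labels are illustrative shorthand, not official subclade names.
COUNTS = {
    "Z93+ Z94-": {"PJL": 1},
    "Z94+ L657- Z2124-": {"BEB": 2, "PJL": 3, "GIH": 1},
    "L657+ Y9": {"GIH": 7, "STU": 4, "ITU": 4, "PJL": 8, "BEB": 2},
    "L657+ Y4/Y8/Y28": {"GIH": 6, "ITU": 6, "PJL": 2, "STU": 6, "BEB": 5},
    "Z2125+": {"PJL": 1},
    "Z2123+": {"PJL": 2, "STU": 3, "BEB": 1, "ITU": 6, "GIH": 2},
}

# Groups that are ancestral for both L657 and Z2124 (assumed grouping
# for this sketch, based on the bullet points in the post).
BASAL = {"Z93+ Z94-", "Z94+ L657- Z2124-"}


def per_population(counts):
    """Return {population: (basal_count, total_count)}."""
    totals = defaultdict(int)
    basal = defaultdict(int)
    for group, pops in counts.items():
        for pop, n in pops.items():
            totals[pop] += n
            if group in BASAL:
                basal[pop] += n
    return {pop: (basal[pop], totals[pop]) for pop in totals}


summary = per_population(COUNTS)
for pop, (b, t) in sorted(summary.items()):
    print(f"{pop}: {b} of {t} ancestral for L657 and Z2124 ({b / t:.0%})")
```

In this tally the ITU and STU samples have no individuals upstream of the split, which is the pattern described in the bullet points.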

7 comments:

SB said...

Semargl and Maximus!

Davidski said...

Who's Maximus though? Both of you guys run the YFull project, is that right?

SB said...

I am neither, and yes they are the YFull team. Maximus was Centurion on dna forums.

Nirjhar007 said...

''Thus, it seems the SNP diversity of South Asian R1a-M420 is low, and decreases from Pakistan, North India and Bangladesh to South India and Sri Lanka. In comparison, there are only 12 European R1a individuals in the 1000 Genomes sample, and they represent all the major subclades of this haplogroup: R1a-Z283, R1a-Z93 and R1a-L664. Therefore, sampling bias can't be used as an argument for the more diverse result from Europe.''
Dear Davidski dude, we need the ages of these SNPs for a scientific analysis of their connection to the IE people, and if you mean the dates provided, for example, here:
http://www.familytreedna.com/public/r1a/default.aspx
then they are only hypothetical, with a real chance of biased conclusions.
Of course, aDNA is the only way to confirm the relationships. For example, Corded Ware has 4,600-year-old aDNA of R1a1, but we don't know the language of that culture!
It is vital to have the correct ages of the newly discovered SNPs. It is clear that South Asians have a lack of SNPs, but the high STR variance is unmatched, and if Farmana's aDNA, and other local ones, are R1a1a Z93+, then, as I say, the movement of IE languages will be older than suggested.
The latest update also does not rule out the possibility of South Asian origin of R1a as-
''(http://www.familytreedna.com/PDF/New_Y_Chromosome_Binary_Markers_Improve_Phylogenetic_Resolution_Within_Haplogroup_R1a1.pdf) it is said that "the origin of R1a1-M198 arguably occurred somewhere between South Asia and Eastern Europe. Potential candidates could be the Eurasian Steppes (Ukraine – Southern Russia – Kazakhstan – Caucasus) or the Middle East." I would add: between South Asia and Eastern Europe there is also Southern Central Asia: Iran, Turkmenistan, Afghanistan, Tajikistan, even Baluchistan. It is also significant that Z280 is absent in India, showing no movement from Europe to India, whereas Z93+ is present in Europe, not only in Romas. You can find it also often in Arabic populations: http://www.familytreedna.com/public/r1a/default.aspx?vgroup=r1a&vgroup=r1a&vgroup=r1a&vgroup=r1a&vgroup=r1a&vgroup=r1a&vgroup=r1a&vgroup=r1a&vgroup=r1a&vgroup=r1a&section=yresults''
we should not forget also that the Archaeological data of South Asia
does not speak of any kind of culture changing intrusion from the time of 4500BC to ~600B.C.
Have a good time.

postneo said...

the very examples you highlight show high diversity of z93 in south asia not low. There is no discernable substructure yet from sri-Lanka to bengal to afghanistan.
These are not trivial hops as perhaps you imagine. It spans massive populations and distances on the scale of the entire european continent.

This contrasts with z93 in europe, where you have to artificially pick and choose widely separated pockets to build a z93 tree. Clearly, different layers of bottlenecked z93 populations made it to europe at different times, isolated from each other, e.g. jews and gypsies.

It does not show high diversity of z283 because it is absent. z93 is probably older than z283 which is more regional.

Davidski said...

You seem to have missed the most important points of the article...

- A single Punjabi from Lahore, northeastern Pakistan, is ancestral for the Z94 mutation, which is just below Z93. All the other individuals are derived for Z94.

- In comparison, there are only 12 European R1a individuals in the 1000 Genomes sample, and they represent all the major subclades of this haplogroup: R1a-Z283, R1a-Z93 and R1a-L664. Therefore, sampling bias can't be used as an argument for the more diverse result from Europe.

In other words, the number of mutations under Z94 present in India isn't relevant to the diversity of R1a there. The fact that Z93(xZ94) is extremely rare, and other parallel subclades are missing altogether in India, means that R1a has a young age and low diversity there.

Basically what seems to have happened is that R1a from Afghanistan squeezed into Pakistan and India via the Khyber Pass fairly recently (late Bronze Age or later), because most Indian R1a is a young subset of the R1a diversity found in South Central Asia.

Mani Kandan said...

//we should not forget also that the Archaeological data of South Asia
does not speak of any kind of culture changing intrusion from the time of 4500BC to ~600B.C.//


The ashvamedha (horse sacrifice) ritual is common in Central Asia, dating to 4000-2000 BC, but there is no presence of the horse in the IVC before 2000 BC. The Rig Veda mentions ashva (horse) more than 200 times, but out of 1500 Indus Valley sites only Surkotada had horse remains, and even that is controversial. No horse has been found on any Indus seal or structure in the IVC so far. The entire IVC collapsed in 2200-1800 BC.
The IVC is urban and fortified, while the Vedic period (1800-600 BC) was entirely pastoral. Where do you find a planned city like those of the IVC?
The IVC and Vedic people were two entirely different peoples. The undeciphered Indus script has already shown some similarities with Dravidian languages like Tamil, but none with Sanskrit, not even one word.