Wednesday, February 24, 2016

Cryptic relatives


Open access at Genome Biology and Evolution:

A novel computational method for detecting identical-by-descent (IBD) chromosomal segments between sequenced genomes is presented. It utilizes the distribution patterns of very rare genetic variants (vrGVs), which have minor allele frequencies less than 0.2%. Contrary to the existing probabilistic approaches our method is rather deterministic, because it considers a group of very rare events which cannot happen together only by chance. This method has been applied for exhaustive computational search of shared IBD segments among 1092 sequenced individuals from 14 populations. It demonstrated that clusters of vrGVs are unique and powerful markers of genetic relatedness, that uncover IBD chromosomal segments between and within populations, irrespective of whether divergence was recent or occurred hundreds-to-thousands of years ago. We found that several IBD segments are shared by practically any possible pair of individuals belonging to the same population. Moreover, shared short IBD segments (median size 183 Kb) were found in 10% of inter-continental human pairs, each comprising of a person from Sub-Saharan Africa and a person from Southern Europe. The shortest shared IBD segments (median size 54 Kb) were found in 0.42% of inter-continental pairs composed of individuals from Chinese/Japanese populations and Africans from Kenya and Nigeria. Knowledge of inheritance of IBD segments is important in clinical case-control and cohort studies, since unknown distant familial relationships could compromise interpretation of collected data. Clusters of vrGVs should be useful markers for familial relationship and common multifactorial disorders.

Fedorova et al., Atlas of Cryptic Genetic Relatedness Among 1000 Human Genomes, Genome Biology and Evolution, published online February 23, 2016, doi: 10.1093/gbe/evw034
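
To make the idea in the abstract concrete, here is a minimal sketch of how clusters of shared very rare variants could be flagged between two individuals. It is not the authors' implementation. The 0.2% frequency cut-off comes from the abstract, but the minimum cluster size, the maximum gap between neighbouring variants, the input layout and the function names are all assumptions made for illustration.

```python
MAF_THRESHOLD = 0.002   # "very rare": minor allele frequency below 0.2% (from the abstract)
MIN_CLUSTER_SIZE = 5    # assumed minimum number of shared vrGVs in a cluster
MAX_GAP_BP = 100_000    # assumed maximum distance between neighbouring shared vrGVs


def very_rare_variants(carriers_by_variant, n_chromosomes):
    """Keep variants whose minor allele frequency is below MAF_THRESHOLD.

    carriers_by_variant maps (chrom, pos) -> {sample_id: minor_allele_count}
    (a hypothetical input layout, not a real file format).
    """
    rare = {}
    for variant, carriers in carriers_by_variant.items():
        maf = sum(carriers.values()) / n_chromosomes
        if 0 < maf < MAF_THRESHOLD:
            rare[variant] = carriers
    return rare


def shared_clusters(rare, sample_a, sample_b):
    """Return (chrom, start, end, n_variants) for each cluster of vrGVs carried by both samples."""
    shared = sorted(v for v, c in rare.items() if sample_a in c and sample_b in c)
    clusters, current = [], []

    def flush():
        # Record the current run only if it is large enough to count as a cluster.
        if len(current) >= MIN_CLUSTER_SIZE:
            clusters.append((current[0][0], current[0][1], current[-1][1], len(current)))

    for chrom, pos in shared:
        # Start a new run on a chromosome change or when the gap is too large.
        if current and (chrom != current[-1][0] or pos - current[-1][1] > MAX_GAP_BP):
            flush()
            current = []
        current.append((chrom, pos))
    flush()
    return clusters
```

The point of requiring several very rare variants close together is the one made in the abstract: a single shared rare allele could arise independently, whereas a whole cluster of them is very unlikely to be shared by chance.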

23 comments:

ghostnorris said...

Interesting data, but poorly written text. They state the directionality of admixture as fact several times, but as far as I can tell they use no methods to establish the origins of the rare variants. Am I missing something?

mooreisbetter said...

Hmmm.... Anyone care to comment on what this does to the sham science of "ethnicity and admixture calculators"?

Davidski said...

@ghostnorris

It's impossible to tell the direction of gene flow based on this type of output alone, and the authors do probably make some wrong assumptions because of that.

MOCKBA said...

To use it to determine real relatedness at the depths of 5-8 generations, one would need to do something slightly different - to look for even less common / perhaps private variation (which ought to be relatively recent in origin). In a population with few founders like Ashkenazi Jews, arguably only shared private variants may provide evidence of true IBD rooted a few generations back as opposed to sharing of much older founder material?

Davidski said...

I'd say that IBD segments flagged with very rare alleles should also be very robust for discovering recent ancestry. That's because the idea is not to find private segments in pairwise comparisons, but to make sure that the segments being discovered are indeed IBD.

So to apply this toolset to the discovery of recent ancestry, just use a threshold of, say, 5 or 7 cM, or whatever, and you should get very strong outcomes.

Karl_K said...

I think they even removed anything found in only 2 people, as these would be better suited for use on Ancestry.com.

The time is very ripe for a commercial group to use very rare SNPs to extend genetic genealogy beyond 4 or 5 generations.

With much higher coverage, this would also work for ancient genomes. It is unfortunate that the samples are limited in these cases. Bones can't be reused forever of course. So everyone must be careful about drilling into precious specimens.

Davidski said...

Often when they take samples they take a few for each individual and store them in a freezer for when the technology improves.

Gioiello said...


Incredibly in the “Malta FTDNA Project” there is a person belonging to my mother's family, Vagelli:

39
236339
Ambrogio
Cesira Vagelli Parra 1862 - 1960 Pontedera Tuscany
H
A16129G, T16187C, C16189T, T16209C, T16223C, G16230A, C16261T, T16278C, C16311T, C16519T
G73A, C146T, A247G, 522.1A, 522.2C, 309.1C, 315.1C

In rCRS: 309.1C 315.1C 16209C 16261T. It seems H107, even though of course it would need an FMS.

JX153071(Italy) Raule Haplogroup [H] 10-JAN-2014
T152C A263G 309.1C 309.2C 315.1C A750G A1438G C4581S A4769G A8860G A15326G T16209C C16261T

HM765464 Zaragoza Haplogroup [H*] 01-AUG-2010
T152C T195C A263G 309.1C 315.1C A750G C960- A1438G A4769G A8860G A15326G T16209C C16261T T16519C

Of course we have to add the mutations 152C!, 195C and 263G to the Tuscan sample in rCRS; thus it is the ancestor of the Zaragoza sample, which, if it is from Iberia, could be further proof of my theory of the migration of agriculturalists 7500 years ago from Italy to Iberia (as per Zilhao). An incredible confirmation comes from the paper just published by Larisa Fedorova, Atlas of Cryptic Genetic Relatedness Among 1000 Human Genomes, where there is a close link between HG01365 from Colombia, HG01625 from Iberia and Tuscan NA20797. Ian Logan was a pioneer of the vrGVs.

Gioiello said...

Of course, that the sample is of Spanish origin is only a hypothesis based on the scholar's name, Zaragoza; since it was submitted from California, the sample could be from anywhere: “Submitted (08-JUN-2010) Pediatrics, Division of Genetics & Metabolism, University of California, Irvine, 101 The City Drive, Orange, CA 92868, USA”.

Gioiello said...

That this haplotype is rooted in Tuscany is demonstrated from Mitosearch GMEP9.

Karl_K said...

@Gioiello

"That this haplotype is rooted in Tuscany is demonstrated from Mitosearch GMEP9."

There is absolutely no connection necessary between the very rare SNP containing haplotype and any other haplotype in these genomes.

In fact, this is the entire point of this type of analysis. It is like having millions of additional mtDNA or Y haplogroups, but without the direct male or female lineages.

This is why it is used to show connections between populations like Japan and Sub-Saharan Africa!

All you can say, with 100% certainty, is that these groups had some very distant connection at some point over a large, unknown number of generations.

Gioiello said...



Hi Gioiello,
Here I am. My mother (still living) is named Eugenia Vagelli, born in Pianezza (TO). Her father, my grandfather, was Ugo Vagelli of Pontedera. Her mother was Nella Chiellini.
Regards

Of course I agree with you. Among other things, I have just received the letter above, and it seems that this H107 from Tuscany is the same, but there is another haplotype documented from Italy and another, more closely linked, from California whose origin of course we don't know. But what I said remains: from this research using the vrGVs it is clear that the link among Colombians, Iberians and Tuscans is very close. The link with East Asia and sub-Saharan Africa is very distant; in fact they match for only a few mutations, which may very likely be independent, something that is unlikely when there are 5 or more, as the paper says.

Gioiello said...

I have to correct myself again: the samples were tested in California but came from the “Center for Inherited Cardiovascular Diseases in Pavia, Italy”, thus all the H107 found so far are Italian. Nothing to change, I think, about the link among Colombians, Iberians and Tuscans.

Matt said...

The number of vrGVs discovered seems to correlate with the intra-population (and, regionally, between-population) rate of vrGV sharing -

"The highest number of vrGVs is seen in the African populations (average number is 67,000 vrGVs per person; SD = 7,500), followed by American (average 24,600 vrGVs; SD=4,500), Asian (24,100 ± 4,100), and European (16,200 ± 2,700) populations" and of course that leads to African and African-American individuals having the highest number of shared RVCs per pair within their populations followed by the Japanese and the Puerto-Ricans. In European groups the highest cluster sharing is observed among the Finns (on average, 6.7 shared RVCs per pair) while the lowest – 1.6 RVCs, is found in the Utah white population (CEU). Among Asian people, the average number of shared RVCs also broadly varies from 10.9 for Japanese (JPT) to 2.4 for Chinese (CHB).

Would have been interesting to have seen the inclusion of the CDX (Dai), KHV (Kinh Vietnamese), GIH (Houston Gujarati), Bengali (BEB), etc. populations in with these.

Gioiello said...

@ Matt

If I were able to read the BAM file... I think that Davidski could easily do it: it would be enough to check a few SNPs on chromosome 1. I bet that there wouldn't be more than 1 SNP in common. Unfortunately these SNPs aren't tested by 23andMe and I haven't yet uploaded my Full Genome. If Davidski is interested in it, I could give him access to it.

ZohaninAnnesi said...

This comment has been removed by the author.

MomOfZoha said...

This comment has been removed by the author.

Gioiello said...

@ MomOfZoha

Of course I am interested in analyzing your data. I contacted the owner of that H107 whose mother came from my area, Tuscany, even though he is living in Piedmont now.
Of course there was no link between his ancestress Vagelli and my mother (who was K1a1b1e as I am), coming from a woman named Chiellini and who knows her oldest surnames being a woman.
Unfortunately that person stopped writing to me when I said that that haplotype could have medical significance, being linked to cardiovascular diseases.
Of course I am interested only in the study of the origin of the haplogroups. If your haplotype comes from 23andMe, of course the test is limited, but may give some useful information. Have you run it in the "James Lick calculator"? If so, you could send me the results, otherwise you could send me the raw data. My address is gioiellotgnn06@gmail.com.
No nationalism in my researches. The nationalisms and the anti-Italians were in all those who said for so long that "Ex Oriente lux", that Europe was peopled from Middle East etc.
Now we know that Northern Anatolia was linked to Europe from the Palaeolithic and Mesolithic, and that Southern Anatolia, the Middle East and Iran were completely different.
I am only a scientist and have no religious prejudice. The word has to be given only to the facts.

MomOfZoha said...

@Gioiello

Thank you for your response. I have emailed you the results of the James Lick calculator. I appreciate help in interpretation.

Gioiello said...

Hello MomOfZoha,
as you can see, the calculator of James Lick (who is a friend of mine and a very kind person, and his work is for free too) is very good, as it takes into account all the possibilities given by the latest Phylotree, and your father's mt may belong to many subclades of hg. H and even to some new one.
Thus, as 23andMe tests only about 3000 SNPs out of 16569, only an FMS, which tests all of them, could say which hg it belongs to.
Of course, that it is an H107 with a back mutation at 16209 (or even an intermediate between H-152C! and H107) cannot be excluded, but that can be said only after testing all 16569 SNPs.
Y-J2 probably arose around the Caucasus, even though Italy also has some very old haplotypes, and it has very likely been there for many thousands of years. Unfortunately 23andMe tests too few SNPs to say more.

Regards, Gioiello

MomOfZoha said...

This comment has been removed by the author.

Gioiello said...

Hi MomOfZoha,
certainly your father in law is neither J2 nor R1a.
rs2032608 Y 21926112 G
i4000123 Y 2751678 A
--------------------------------------------------
i3000048 Y 15030752 C
--------------------------------------------------
rs17250887 Y 8558969 A
rs17250163 Y 21225770 C
--------------------------------------------------
rs17250121 Y 20837553 T (K)
rs4116821 Y 22072340 C (not P)
rs9341318 Y 22745051 T (not L-M22)
rs16980601 Y 15415115 A (not O)
rs16980641 Y 19279765 A (not NO)

He is positive for K but negative for NO and P. The SNPs tested are a few, and very likely he is a rare haplotype.
You should look up in his account the SNPs tested by 23andMe and send them to me, along with how they conclude that he is J2, which isn't true.

Regards, Gioiello

---------------------------
P.S. I apologize for all the misunderstandings:
rs2032672 Y 21893881 C (M70= T1a)
rs34179999 Y 16019072 G (L162: not T1a1)
L131=T1a2 not tested

Anyway it seems that 23andMe acted correctly, as I have always thought.

MomOfZoha said...

Gioiello:

I did not send you my father-in-law's data, only my father's! The analyses you are doing appear consistent with my father (not in-law).

So the person you are analyzing is my father. 23andme stated him to be T Y-hg, and MorleyDNA stated him to be T1a Y-hg. Do you think he is not T either?

Again, the J2 (from 23andme) or R1a1 (from MorleyDNA) Y-DNA groupings are for my *husband*'s father (who is from Iran and Caucasus), and the J2a1a for my *mother's* father.

To be clear: My father, who is the one coming up mtDNA H107 "most likely" according to Haplogrep and James Lick has Y-DNA either
T or T1a
according to 23andme and MorleyDNA respectively.

You think he is not T?

And: Can we just continue by email instead of here?