Friday, April 1, 2016

Inferring heterozygosity from ancient and low coverage genomes

Interesting new preprint at bioRxiv:

Abstract: While genetic diversity can be quantified accurately from high coverage sequencing, it is often desirable to obtain such estimates from low coverage data, either to save costs or because of low DNA quality as observed for ancient samples. Here we introduce a method to accurately infer heterozygosity probabilistically from very low coverage sequences of a single individual. The method relaxes the infinite sites assumption of previous methods, does not require a reference sequence and takes into account both variable sequencing errors and potential post-mortem damage. It is thus also applicable to non-model organisms and ancient genomes. Since error rates as reported by sequencing machines are generally distorted and require recalibration, we also introduce a method to accurately infer recalibration parameters in the presence of post-mortem damage. This method also does not require knowledge about the underlying genome sequence, but instead works from haploid data (e.g. from the X-chromosome from mammalian males) and integrates over the unknown genotypes. Using extensive simulations we show that a few Mb of haploid data is sufficient for accurate recalibration even at average coverages as low as 1-3x. At similar coverages, our method also produces very accurate estimates of heterozygosity down to $10^{-4}$ within windows of about 1Mb. We further illustrate the usefulness of our approach by inferring genome-wide patterns of diversity for several ancient human samples and found that 3,000-5,000-year-old samples showed diversity patterns comparable to modern humans. In contrast, two European hunter-gatherer samples exhibited not only considerably lower levels of diversity than modern samples, but also highly distinct distributions of diversity along their genomes. Interestingly, these distributions were also very different between the two samples, supporting earlier conclusions of a highly diverse and structured population in Europe prior to the arrival of farming.

However, I posted this observation under the abstract:

It seems like you're contradicting yourselves with these comments:

"Second, the ancestry of modern Europeans traces only partly back to European hunter gatherers with early Neolithic people from the Aegean (Hofmanová et al. 2015) and Yamnaya steppe herders (Haak et al. 2015) contributing the majority of the modern day genetic make up.


The exceptions were the two European hunter-gatherers that showed patterns very different from both modern samples as well as from one another, further corroborating the view (Jones et al. 2015) that these samples represent different and ancient clades that contributed only marginally to the genetic make-up of modern day Europeans."

Yes, of course Early European farmers (EEF) from the Balkans and the Yamnaya and related groups from the Eastern European steppe make up the majority of modern European genetic structure.

However, Yamnaya are essentially a 50/50 mix of Eastern European hunter-gatherers and Kotias-related hunter-gatherers from the Caucasus. So how can you say hunter-gatherers closely related to Kotias didn't contribute much to the modern European gene pool? They're like ~25% of our genomes.

Also, EEF carry a large chunk of Balkan and Central European hunter-gatherer ancestry very closely related to Bichon (same WHG clade). So again, doesn't that contradict your conclusion?

Kousathanas et al., Inferring heterozygosity from ancient and low coverage genomes, bioRxiv, Posted April 1, 2016, doi:


SeanF said...

Dave, I'm somewhat confused about the basic elements of the Yamna genome. Above you suggest it was 50/50 Eastern HG/Caucasian HG, so presumably without any significant Neolithic farmer contribution, but elsewhere on the blog over the last year or so, I seem to see suggestions of at least some EEF/MEF contribution, and the position of ancient Yamna DNA well to the west of EHG and not so very far from the current mixed European heritage also might suggest that. Do you think there was any farmer component at all to the Yamna genome, eg EEF via the Balkans, or MEF via the Caucasus?

PS: I'm not a proponent of the Anatolian hypothesis on a fishing expedition, BTW. Bronze Age steppe all the way.

Davidski said...

Until the CHG genomes came out we were speculating here that early farmers from the Caucasus contributed to the Yamnaya genotype, because it was hard to fathom that Mesolithic foragers in the Caucasus carried the so-called Basal Eurasian ancestry. But as it turns out, CHG were in large part, and perhaps mainly, Basal Eurasian.

As far as I can tell, PCA and ADMIXTURE suggest that Yamnaya is mainly a mix of EHG and CHG, close to 50/50, but probably with some extra WHG, hence the western pull.

I haven't yet seen any strong evidence of any Anatolian, Balkan and/or Central European farmer input in the Yamnaya genomes currently available. If anything like that shows up, it might just be picking up shared ancestry between late CHG and the Anatolians. Kotias does show a bit of the Anatolian stuff in some tests.

But I think western Yamnaya will be a very different story, with considerable Balkan farmer input, maybe almost as much as what Corded Ware and Srubnaya have.