
Thursday, May 18, 2017

PCA projection bias fix


A new version of EIGENSOFT has just been posted at GitHub (see here). It offers two flags to minimize the problem of Principal Component Analysis (PCA) projection bias or shrinkage: shrinkmode: YES and autoshrink: YES. For more details refer to the contents of the tarball here.

So, if you're running the new EIGENSOFT and want to project a sample or a set of samples onto the variation of another set of samples, include the lsqproject: YES flag to account for missing data, and then add either shrinkmode: YES or autoshrink: YES. I haven't tried this myself yet, but according to the README file in the tarball linked to above, shrinkmode: YES gives better results but takes up much more CPU time.
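
For anyone who wants a starting point, here's a rough sketch of a small Python helper that writes out a smartpca parameter file with these flags switched on. Only lsqproject, shrinkmode and autoshrink come from this post. The other parameter names (genotypename, snpname, indivname, evecoutname, evaloutname, poplistname) are what I understand to be the usual smartpca parfile entries, and the file names are purely hypothetical, so check everything against the README before relying on it.

```python
def build_parfile(prefix, ref_pops_file, shrink="shrinkmode"):
    """Return smartpca parameter-file text for a projection run.

    prefix        -- hypothetical basename of the .geno/.snp/.ind files
    ref_pops_file -- hypothetical file listing the populations used to
                     compute the PCs (everyone else gets projected)
    shrink        -- "shrinkmode" (reportedly better, much slower) or
                     "autoshrink"
    """
    if shrink not in ("shrinkmode", "autoshrink"):
        raise ValueError("shrink must be 'shrinkmode' or 'autoshrink'")

    lines = [
        # Input and output entries: assumed standard smartpca names.
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {prefix}.evec",
        f"evaloutname: {prefix}.eval",
        f"poplistname: {ref_pops_file}",
        # Flags discussed in the post.
        "lsqproject: YES",   # account for missing data when projecting
        f"{shrink}: YES",    # correct for projection bias (shrinkage)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a parfile that uses the slower shrinkmode option.
    print(build_parfile("mydata", "reference_pops.txt"), end="")
```

Save the output to a file, say mydata.par, and pass it to smartpca with the -p option. To compare the two corrections, generate a second file with shrink="autoshrink" and run both on the same data.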

PCA projection bias is a problem that I've been whining about for a while now (for instance, see here). I actually have my own simple techniques to get around it that appear to work very well, so I'm not sure if I'll be using the new flags. But I might after I try them out. I'd certainly urge the authors of upcoming ancient DNA papers to do so.

4 comments:

Nirjhar007 said...

Run the Afghanistan sample Dave....

Davidski said...

Not enough data. Only 500 SNPs overlap with my biggest dataset.

But Vadim V. ran it, and apparently it's similar to Near Easterners and Burusho in the two main PCA dimensions.

That's the best anyone's going to do with this genome.

Nirjhar007 said...

Can you link me? Yes, the coverage is the issue, but it's still worth a try.

Davidski said...

http://www.anthrogenica.com/showthread.php?10555-Darra-i-Kur-(Afghanistan)-sample&p=236087&viewfull=1#post236087