Tuesday, November 5, 2019
Modeling your ancestry has never been easier
An exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is freely available HERE. It works offline too, after downloading the web page onto your computer. Just copy paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen caps below show me doing just that.
Another free, easy to use online tool that works with Global25 coordinates is the Principal Component Analysis (PCA) runner HERE. Below is a screen cap of me checking out one of the many PCA that it offers.
See also...
Getting the most out of the Global25
69 comments:
Also please note that there are now two sets of Global25 datasheets, featuring ancient and modern samples respectively. See HERE.
Vahaduo doesn't let you choose the number of cycles to perform; it always runs twice as many cycles as there are sources in the input. While this doesn't seem to create many problems for "properly mixed" samples, so to say, I noticed that you can get weird results if you have the same population in the sources and in the target.
For instance, if you put Barcin_N, Wezmeh_N, Natufians and CHG in the sources and try to model Barcin_N itself with these, you won't get ~100% Barcin_N but always something around 90% Barcin_N plus something from the rest. Something similar happens regardless of the choice of target, if it's included in the sources. I think the problem is simple: starting from a random distribution of slots and a random point built from the sources, the number of cycles simply isn't enough for the algorithm to update the best point until it closely matches the same source. The problem disappears when the number of cycles is increased even to just, say, 50, and the speed of the algorithm isn't terribly affected. I assume this could also cause problems if you don't have exactly the same source and target but very closely related ones. I tested this by rewriting the same algorithm in R.
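The effect described above can be explored with a small slot-based Monte Carlo fit. This is only a sketch under stated assumptions, not Vahaduo's or nMonte's actual code: the function names, the meaning of one "cycle" (one pass over all slots), the seeded random generator and the three-dimensional coordinates are all made up for illustration.

```javascript
// Hypothetical sketch; NOT the algorithm used by Vahaduo or nMonte.
// Small seeded PRNG (mulberry32 style) so that runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Euclidean distance between two coordinate vectors.
function distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Average the coordinates of the sources that currently occupy the slots.
function mixturePoint(slots, sources) {
  const dims = sources[0].coords.length;
  const point = new Array(dims).fill(0);
  for (const s of slots) {
    for (let d = 0; d < dims; d++) point[d] += sources[s].coords[d] / slots.length;
  }
  return point;
}

// Assumption: one cycle = one pass in which every slot gets one random proposal.
function monteCarloFit(target, sources, { slots = 100, cycles = 8, seed = 1 } = {}) {
  const rand = mulberry32(seed);
  const pick = () => Math.floor(rand() * sources.length);
  const current = Array.from({ length: slots }, pick); // random starting mixture
  let best = distance(mixturePoint(current, sources), target);
  for (let c = 0; c < cycles; c++) {
    for (let i = 0; i < slots; i++) {
      const previous = current[i];
      current[i] = pick();
      const d = distance(mixturePoint(current, sources), target);
      if (d < best) best = d; // keep the improvement
      else current[i] = previous; // otherwise undo the proposal
    }
  }
  const share = sources.map((src, k) => ({
    name: src.name,
    percent: (100 * current.filter((s) => s === k).length) / slots,
  }));
  return { distance: best, share };
}

// Made-up coordinates, not real Global25 data. The target equals the first source.
const sources = [
  { name: "SourceA", coords: [0.12, 0.09, -0.03] },
  { name: "SourceB", coords: [0.02, -0.11, 0.04] },
  { name: "SourceC", coords: [-0.08, 0.05, 0.07] },
  { name: "SourceD", coords: [0.1, 0.02, -0.06] },
];
const few = monteCarloFit(sources[0].coords, sources, { cycles: 2 });
const many = monteCarloFit(sources[0].coords, sources, { cycles: 50 });
console.log(few.distance, few.share);
console.log(many.distance, many.share);
```

With the same seed, the longer run simply continues the shorter one, so it can never finish with a worse distance. How far a short run ends up from a 100% self-match depends on the number of slots and on how the proposals are made, both of which are assumptions here.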
Wow, these are great tools which save a lot of time. Is it possible to add your own samples to those PCAs? Or put entirely new PCAs on there?
Great tool, very easy to use, very fast.
My results:
https://i.postimg.cc/KjSyMYFz/screenshot-32.png
https://i.postimg.cc/TPFzzPK9/screenshot-33.png
It looks like I have to study KAZ_Golden_Horde_Euro. Where can I get some info about him?
On PCA I am not far from Sintashta, which explains why I like Rigveda so much.
https://i.postimg.cc/h4YX2kb1/screenshot-35.png
@EastPole
It looks like I have to study KAZ_Golden_Horde_Euro. Where can I get some info about him?
That's a European, very Baltic-like-individual from the Medieval Kazakh steppe from one of the recent Damgaard et al. papers.
https://www.nature.com/articles/s41586-018-0094-2
Each sample has a unique individual ID that can often be searched online in Google and various databases, such as the "anno" files here.
https://reich.hms.harvard.edu/downloadable-genotypes-worlds-published-ancient-dna-data
@Davidski
How did that guy end up in the Kazakh steppe?
@Gabriel
How did that guy end up in the Kazakh steppe?
He was probably a captive from somewhere in western Russia raided by the Mongols.
@EastPole , @Gabriel : Here is an article in English about this sample from Kazakhstan: https://www.academia.edu/36877181/E.R._Usmanova_I.I._Dremov_I.P._Panyushkina_and_A.V._Kolbina._Mongol_Warriors_of_the_Jochi_Ulus_at_the_Karasuyr_Cemetery_Ulytau_Central_Kazakhstan
Our guy is from kurgan 5.
@claravallensis
That's a fair point.
But it doesn't have a significant impact on well designed models that don't include very similar (highly correlated) source populations.
The source populations in models should always be relatively highly differentiated, otherwise the algorithm is likely to produce unrealistic results just to get the lowest (best) statistical fits.
So the right way to analyze fine structure ancestry (gene flow from very similar sources) is to try similar populations one by one to see how the statistical fits change. You probably know this already, but I just mentioned it for the benefit of others.
And, of course, for more advanced analyses, it's best to use nMonte in R.
These tools are great and really easy to use.
I was playing around with PCA in PAST3, but it seemed a bit difficult to adjust the labeling of graphics etc. to see exactly who was who (without having it all blued out by row labels). Grouping also took a lot of time to do.
This is much faster and the mouse hover info on individual samples and the zoom and pan functions make it much easier to study the PCA.
A Mediterranean/South European PCA preset would also be nice, though. :)
Just got the Vahaduo thing working as well. This will be a great future tool for studying lesser described samples in further detail which have been neglected in some of the big studies.
Just by looking at the West Eurasia PCA in PCA Runner (very good product BTW), it seems that the origin of Baltic Finnic is connected to Levänluhta_IA, Srubna_o and Sintashta_o type samples, and then there's a local cline connecting modern samples to Baltic_BA-type samples. The IA tarand grave sample OLS_10 (the fresh from a sledge guy), for instance, seems to be a mixture of modern East Finn/Karelian (more or less the same thing in the past) and Kivutkalns_BA-related samples in the PCA. But can Levänluhta_IA in reality be connected in any way to Srubna_o and Sintashta_o outside a PCA, which may be misleading? In other words, to what extent is PCA biased in this case?
P.S. Just by looking at the same PCA, it seems that there's a cline connecting the rather clearly visible cloud or continuum of different Uralic groups ultimately to Neolithic West Siberia. The mainstream European continuum seems to meet this Uralic cline at Baltic_BA, which of course makes a lot of sense.
@Huck Finn
In regards to your question about PCA bias, I don't really know.
But I can tell you that Volosovo as the proto-Uralic culture looks like crap from where I'm sitting. There's no Y-hg N there at all.
Don't know about Garino-Bor yet.
@ D: Garino Bor should be the place. If it's not, then I don't know. The map in page 7 is BTW pretty interesting:
http://www.kirj.ee/public/Archaeology/2012/issue_1/arch-2012-1-3-25.pdf
So, no N in Volosovo, interesting. Many thanks for the information.
Thanks David, you have made clear now that the North Dutch/NW German Bell Beaker area is indeed the 'genetic hub' with linkages more NW-wards over Sea to England/Scotland, NE-wards to Poland and the Baltic and SE-wards into Germany/Austria and deeper! The linkages are very obvious now!
@Huck Finn
I think there is some sort of agreement that proto-Uralic and PIE must have grown up in adjacent areas. So, Garino Bor is pretty interesting as a PU Urheimat suggestion, as it is upstream from the Samara Bend where Khvalynsk resided.
No, there was no contact between the two languages at the time. This is proven. The Uralic languages did not come from Siberia until the middle of the second millennium BC, and it is proven that they came from Siberia and are not autochthonous in Europe.
It seems Seima-Turbino is also from Eastern Europe
@epoch
Apparently there's a pretty strong consensus forming behind the scenes based on the latest linguistic and ancient DNA data that Proto-Uralic formed east of the Urals, and that only the Proto-Finno-Ugric homeland was in Europe.
The Ugric languages are also to the east of the Urals, so the Proto-Finno-Ugric languages are to the east of the Urals. But the Finno-Volgaic languages are already in Europe.
Seima-Turbino is connected with Altai.
@David
Oh, that is interesting. I have read in several papers the case for a European homeland, and one of the consistently put forth arguments is that PIE must have been in the neighbourhood to account for the list of old cognates.
Is this line of reasoning now obsolete?
@D and re "Apparently there's a pretty strong consensus forming behind the scenes based on the latest linguistic and ancient DNA data that Proto-Uralic formed east of the Urals, and that only the Proto-Finno-Ugric homeland was in Europe."
I'm not aware of such a linguistic consensus, but maybe there is one. In regards to Proto-Uralic around 2000 BCE (Pre-Proto-Uralic is a different issue), it seems to me that many linguists still support the Volga Bend as the place of origin. According to Narasimhan et al. 2019, West Siberian Neolithic type features were BTW present there, which in my understanding supports your own qpAdm model based on the Estonian tarand grave sample OLS10, including WSHG type features, if I'm right. On the other hand, the officially unpublished Baltic Finnic ancient DNA samples which I've seen don't, at least in PCA, seem to be based on anything East Siberian. BOO and Levänluhta are a different story, as there's a clear Arctic substratum based vibe in those Saami-related samples. So I can't see too much support for the East Siberian origin of Proto-Uralic. Some place in or around the Ural area makes much more sense.
Turbino, now that Seima Turbino was mentioned, is a place by Kama river, located in the same area as Garino Bor culture. Udmurts still live there. Seima, on the other hand, is located by Oka river, probably the birthplace of West Uralic (Mordva i.e. Erzya and Moksha speakers, Baltic Finns, Saami). Seima type of axes were for instance apparently first found in Finland and only after discovery of the famous Seima burial the axe type was renamed in Finland as Seima axe.
@epoch, on the question ("I have read in several papers the case for a European homeland and one of the consistently put forth arguments is that PIE must have been in the neighbourhood to explain for the list of old cognates. Is this line of reasoning now obsolete?"), with the caveat that the specifics of the loans may overrule this:
There is always the possibility that any early IE-U loans took place via a third (or more layers of) intermediary language group that was wiped out by later expansions. To enable apparent "sharing" over longer distances than usually thought.
Direct IE-U contact is probably just the most parsimonious solution we can have without postulating extra languages, rather than one we can be certain about from modern data (which is extremely lossy).
On a similar tack, when it comes to loans from II-U, there are some elaborate schemes devised to try and prove the presence of different layers of II interacting with U, through II loans in U. But there is little evidence of the reverse loaning U->II.
This seems to have been put down to different technology and motivations for loanwords and so on. However I think it is more likely that it is simply the case that loans would be bidirectional, but that whichever specific variety of II language(s) was/were interacting with U languages (and would have likely received loans) is simply extinct today and not known. There are essentially a wide variety of II languages which are thought to have existed in the past, which have no attested form today to examine for vocabulary / loans.
Probably replaced by Turkic languages, if not the U languages themselves, or II languages which did not receive loans themselves, before the expansion of Turkic languages. (This would have nothing much to do with the question of where the II languages ultimately originated, and would be compatible with a variety of possibilities, including the most commonly held one).
Rome paper is out: https://science.sciencemag.org/content/366/6466/708
Thanks claravallensis.
Mesolithic Italy dominated by I2a2; Neolithic R1b-P343, J2 and G2
Tarands are easily traced back to the LBA-IA upper-mid Volga (D'yakovo and Akozino-Akhmylovo respectively). Both of these cultures are descended from the Netted-Ware culture of the upper Volga. I would assume N1c will be found in BA Netted Ware.
Netted-Ware was itself formed after Fatyanovo-Balanovo and Abashevo elites migrated into Volosovo territory. Netted Ware was also involved in the spread of Seima-Turbino materials.
The way I see it this really only leaves a few options, Garino-Bor and/or Seima-Turbino or Circum-Polar explorers like BOO making their way into Netted-Ware groups like Carlos believes.
@ Rob
Isn't it R1b M343 ?
Yep; sorry
I also bought the paper here is an interesting Figure: https://i.imgur.com/CXIghxD.png
Davidski are we ok to share?
Wonder if they are relatives of the R1b V88 in Neolithic Spain....
This gets interesting, there will be hundreds of Kurganists looking for SNPs to prove it's V88
Grotta Continenza 5.200 BC
Samples R437 and R851 seem like they may be derived for R-L2 too.
There is only 1 Etruscan Y-DNA and it's J-M12
The iron age Latin y dna is:
R-M269
T-L208
R-P311
R-PF7589
R-P312
R-P312
Not so sure about Etruscans anymore
They found Varna man's Y-DNA in the Latins
The other two Iron Age males, R474 and R850, belong to J-M12 (J2b) and T-L208 (T1a1a) haplogroups respectively. As discussed above, the J haplogroup and its J2a subclade have already been present in early farmers in Italy, the Balkans, and Anatolia (13, 14). In addition, a Bronze Age individual from Croatia (1631-1521 calBCE) belonged to the J2b2a haplogroup (14) and carried exactly the M314 derived allele that is also found in R474. Therefore, the observed J-M12 (J2b) could be a surviving lineage from local Neolithic populations or due to recent migrations from the Balkans or the Near East. The T1a haplogroup, although absent in our samples prior to Iron Age, has previously been found in early farmers in Bulgaria (5,800-5,400 calBCE) (14) and Germany (5,500-4,850 BCE) (13), so it is possible that it was also present in early farmers in central Italy.
"Steppe-related." Goddammit. Seriously? They are going to be that ambiguous and vague about it?
They should know exactly where it comes from. It comes from descendants of Bell Beakers from north of the Alps who carried R1b U152>L2+, to which most of the Iron Age samples from Latium belong. They had been living in Northern Europe since 2800 BC, so for over 1,000 years, before they migrated into Latium in Central Italy.
It's common ancestry shared by ancient Romans, Gauls, Britons, and Iberians. Which is interesting and significant if you think about it.
They say "Pontic-Caspian Steppe", not Eastern Europe, not Northern Europe, which is where the "Steppe-related" people who migrated into Italy came from.
They spoke of "Iran Neolithic", when they really should be speaking about Anatolia or the Near East. Because there was no gene flow directly from Iran into Italy. Or directly from the "Pontic-Caspian Steppe" into Italy.
These bad descriptions give an inaccurate impression about who was moving into Italy at different times.
These researchers need to get a better understanding of Europe's population history as told by ancient DNA before they publish anything.
Sam
What they say is
“qpAdm, we modeled the genetic shift by an introduction of ~30 to 40% ancestry from Bronze and Iron Age nomadic populations from the Pontic-Caspian Steppe (table S15), similar to many Bronze Age populations in Europe (10, 13, 14, 19, 22). The presence of Steppe-related ancestry in Iron Age Italy could have happened through genetic exchange with intermediary populations (5, 23). Additionally, multiple source populations could have contributed, simultaneously or subsequently, to the ancestry transition before Iron age”
Seems reasonable
Isn't it also quite interesting how that iron age sample from Croatia, I3313 I think, which seemed to resemble Bergamo so much, is consistent with forming a clade with these iron age Romans?
From the supplementary:
"The only well-fit one-way model is with an Iron Age individual from Croatia dated to 805-761 calBCE, suggesting that this individual form a clade with Iron Age central Italians, with respect to all the populations in the “right” set (ANC17). This result, together with those for Neolithic and Copper Age individuals, points to tight connections between Italy and the Balkans from Neolithic to Iron Ages."
@Rob
Well there are only 4 Neolithic samples, one of which could be V88, 2 are J2 and 1 is G2. The V88 and G2 would be consistent with ancient Sardinians. Whatever happened to the alleged R1b and I1 "Etruscans"? I guess that wasn't true.
Imperial Rome
E 1/24
G 5/24
J 13/24
R1a-Z93 2/24
R1 1/24
R1b-m269 1/24
T 1/24
@claravallensis,
Iron age Italy has no ancestry from the Balkans. It's very clear cut. Most iron age Italians carry R1b L51 which comes from Bell Beaker. 99% of Bell Beaker in Germany, France, Czech carry R1b U152>L2 which is very common in Northern & Central Italy today. And apparently was the most common haplogroup in the Latin Tribes & Republican Romans.
Generally speaking, some pops in Iron age Western Balkans were similar to Iron age Italy. This was not because of a direct relationship. But, because each was a similar mix between Anatolian farmers, European Hunter gatherers, and Eastern European Pastoralists.
@ Sam
“Generally speaking, some pops in Iron age Western Balkans were similar to Iron age Italy. This was not because of a direct relationship. “
There were clear and direct movements between west Balkans and Italy during BA - IA
@AWood
Weren't those from the other study that should be coming? That with the leaked PCA.
Or maybe they are sharing these samples, not sure.
@Gaska said..."This gets interesting, there will be hundreds of Kurganists looking for SNPs to prove it's V88"
And if history is any indicator, they will succeed. On the flip side, there will be hundreds of Iberians looking for SNPs to prove it's M269 and they will fail miserably.
Did R1b V88 migrate via Italy into North Africa during the Neolithic ?
@ Anthony and re: "The way I see it this really only leaves a few options, Garino-Bor and/or Seima-Turbino or Circum-Polar explorers like BOO making their way into Netted-Ware groups like Carlos believes."
This indeed seems to be the case.
Should we be using scaled coordinates in these two?
This is what the PCA tool says...
Paste Global25 scaled coordinates here.
The other one takes both scaled and original coordinates.
I've updated both tools.
Changes in Vahaduo 19.11:
"RUN ALL" button in the DISTANCE tab.
Switch from a pure Monte Carlo algorithm to a two-stage hybrid approach. Now it should handle some corner case models much better. As a side-effect...
Vahaduo is now nearly two times faster than before.
Finally - vox populi, vox Dei - I've added an option to change the cycles multiplier (1x-2x-4x-8x-16x).
Changes in Global 25 Views:
African PCA (ideas how to rename modern groups are welcomed).
Awesome, thanks.
@Vahaduo, thank you for the African additions -- much appreciated!
To avoid any confusion with more heavily Eurasian-admixed North-Sudanese pops, could the naming of "Sudanese" samples be changed to "South-Sudanese"?
@ Angoliga
Done!
@ Davidski
You're welcome! And I have another update:
Changes in Vahaduo 19.11.1:
ADD DIST COL option. Adds distance to target multiplied by 0.25, 0.5, 1 or 2 as a column in source. Skews results towards sources closer to target at a cost of higher overall distance.
Generally I would not recommend this feature for modelling sensu stricto (as it shifts the result from "who my real parents are" towards "to whom I am closer"), but it can be a nice source of ideas for proximate models, or it can work as some kind of hybrid of modelling and checking distances, used to check the neighbourhood of a given sample.
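One way to read the ADD DIST COL description is sketched below. The helper names are hypothetical, the coordinates are made up, and giving the target a 0 in the new column (its distance to itself) is an assumption rather than something stated above.

```javascript
// Euclidean distance between two coordinate vectors.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Hypothetical helper: append "distance to target x factor" as an extra
// column to every source, so sources near the target look even nearer.
function addDistanceColumn(target, sources, factor = 1) {
  const augmentedSources = sources.map((src) => ({
    name: src.name,
    coords: [...src.coords, factor * euclidean(src.coords, target.coords)],
  }));
  // Assumption: the target's own extra column is 0.
  const augmentedTarget = { name: target.name, coords: [...target.coords, 0] };
  return { target: augmentedTarget, sources: augmentedSources };
}

// Made-up two-dimensional example, not real Global25 data.
const target = { name: "Target", coords: [0, 0] };
const sources = [
  { name: "Near", coords: [0.3, 0.4] },
  { name: "Far", coords: [3, 4] },
];
const augmented = addDistanceColumn(target, sources, 0.5);
console.log(augmented.sources);
```

Any fit run on the augmented rows pays an extra penalty for every distant source it uses, which is consistent with the note that results are skewed towards closer sources at the cost of a higher overall distance.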
@vahaduo
I made some typos in the African PCA datasheet.
All of the ancients marked with the TZN prefix should be marked TZA, which is the correct country code for Tanzania of course.
If you can fix that at some stage that would be great.
@ Davidski
Fixed! Plus an update:
Changes in the updated Global 25 Views tool:
- margins reduced to a minimum - bigger plot area (20% on my small screen),
- "Save" menu for downloading graph images - multiple sizes with different aspect ratios,
- custom colour scheme with toned down colours of the background PCA - better visibility of projected samples,
- projected sample markers cycle through 11 colours and 4 shapes - 44 distinct markers (previously 10),
- projected sample markers are coloured independently of the background PCA - consistency across different PCAs,
- projection of multiple samples at once.
Let me know if you find any bugs.
@vahaduo
Nice. The new PNG files look great.
Btw, would it be possible to combine the West Eurasian and East Asian datasheets to create a Eurasian PCA?
Actually, here's a North Eurasian datasheet...
G25_North_Eurasia_scaled
A couple of different views would be nice. Maybe 1&2 and 1&4?
Hopefully, I didn't include any duplicates or miss too many relevant pops.
@ Davidski
North Eurasian PCA (1: 1&2, 2: 1&4, 3: 1&5) is already online.
BTW shouldn't "Nivh" be "Nivkh"?
Combining the datasheets should work fine, although obtaining clear views may be tricky. Later today I'll post something that may be helpful in such case.
Thanks. Looks great.
And yes, it should be Nivkh. That typo came with the original dataset.
@ Davidski
I've updated g25v a few minutes ago with a hidden feature.
Type join@ at the beginning and all points will be added under one label - population name of the first sample.
join@Yamnaya_RUS_Samara:I0357,0.126344,0.092413 (...) = Yamnaya_RUS_Samara
You can also specify a custom label by typing join@some_name;
join@Steppe;Yamnaya_RUS_Samara:I0357,0.126344,0.092413 (...) = Steppe
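The join@ rules above could be parsed roughly as follows. This is a guess at the behaviour based only on the two examples, not the tool's actual code: the function name is hypothetical, and labelling un-joined rows by the part of the name before the colon is an assumption. Real Global25 rows carry 25 values; the rows here keep only the two values quoted above.

```javascript
// Hypothetical parser for pasted rows with an optional "join@" prefix.
function parseProjectedInput(text) {
  let body = text.trim();
  let joinLabel = null;
  if (body.startsWith("join@")) {
    body = body.slice("join@".length);
    const firstLine = body.split("\n")[0];
    const semicolon = firstLine.indexOf(";");
    if (semicolon !== -1) {
      // join@some_name;... uses the custom label.
      joinLabel = firstLine.slice(0, semicolon);
      body = body.slice(semicolon + 1);
    } else {
      // join@... uses the population name of the first sample.
      joinLabel = firstLine.split(",")[0].split(":")[0];
    }
  }
  return body
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [name, ...values] = line.split(",");
      return {
        label: joinLabel ?? name.split(":")[0], // assumption for un-joined rows
        name,
        coords: values.map(Number),
      };
    });
}

const rows = "Yamnaya_RUS_Samara:I0357,0.126344,0.092413\nOther_Pop:X1,0.1,0.2";
const plain = parseProjectedInput(rows);
const joined = parseProjectedInput("join@" + rows);
const custom = parseProjectedInput("join@Steppe;" + rows);
console.log(plain.map((r) => r.label));
console.log(joined.map((r) => r.label));
console.log(custom.map((r) => r.label));
```

Other_Pop:X1 is an invented second sample, included only to show that every row ends up under the same label once join@ is used.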
Yep, nice one!
New tool for reprocessing PCA data:
https://vahaduo.github.io/custompca/
SOURCE
Samples placed here will define the PCA space.
PROJECTED
Samples placed here will be projected onto that space.
PCA DATA
Result of the PCA will be automatically placed here. You can copy and reuse this data in e.g. PAST. You can also paste coordinates into this tab directly if you want just to make a plot without running any PCA.
When it comes to the number of dimensions there is a default 98% cut-off of explained variance. Depending on the data it may be 2-4 dimensions or e.g. 14.
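A cut-off like this is usually applied to the cumulative share of variance. The sketch below assumes the per-dimension variances are already sorted from largest to smallest and that the tool keeps the smallest number of dimensions that reaches the cut-off; the function name and the example numbers are made up.

```javascript
// Hypothetical helper: how many leading dimensions are needed to explain
// at least `cutoff` (default 98%) of the total variance.
function dimensionsToKeep(variances, cutoff = 0.98) {
  const total = variances.reduce((sum, v) => sum + v, 0);
  let cumulative = 0;
  for (let i = 0; i < variances.length; i++) {
    cumulative += variances[i];
    if (cumulative / total >= cutoff) return i + 1;
  }
  return variances.length;
}

// Made-up variances: one dominated by two axes, one spread more evenly.
console.log(dimensionsToKeep([90, 9, 1])); // 2
console.log(dimensionsToKeep([50, 30, 15, 4, 1])); // 4
```

This is why the number of dimensions varies with the data: a sample set dominated by one or two strong axes reaches 98% quickly, while a more evenly structured set needs many more.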
PCA PLOT
-PROJECT SOURCES - YES/NO
-RUN PCA
-PLOT PCA
Self-explanatory.
Use +- to switch dimensions.
FLIP ✔ to mirror given dimension.
To apply changes click PLOT PCA again.
First part of the name (first, first:second, first:second:third) will be used to group samples under one label.
If you want to pull a single sample out of the group, change the first part of the name or e.g. remove/replace the divider (first:second -> first-second).
If you want to put this sample at the top of the legend add a dot at the beginning of the name (first:second -> .first:second).
If you want to put it at the bottom add an underscore (first:second -> _first:second). Helpful when you want to add a sample without reshuffling the markers of the already plotted ones.
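The naming rules above amount to grouping by whatever comes before the first colon. A minimal sketch of that reading, with hypothetical function names and not taken from the tool's source:

```javascript
// Label = part of the name before the first ":" (the whole name if there is none).
function groupLabel(name) {
  const divider = name.indexOf(":");
  return divider === -1 ? name : name.slice(0, divider);
}

// Collect sample names under their labels, keeping input order.
function groupByLabel(names) {
  const groups = new Map();
  for (const name of names) {
    const label = groupLabel(name);
    if (!groups.has(label)) groups.set(label, []);
    groups.get(label).push(name);
  }
  return groups;
}

const groups = groupByLabel([
  "first:second",
  "first:second:third",
  "first-second", // divider replaced, so this sample gets its own label
  ".first:second", // leading dot changes the label as well
]);
console.log([...groups.keys()]);
```

Replacing the divider or adding a leading character works because either change produces a label that no other sample shares.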
Taking screenshots is currently not implemented in a menu (I need to first think how to squeeze this and many other options into the interface), but you still can take screenshots (same applies to g25views) using web developer tools in your browser. Open them, go to the console and paste:
Plotly.downloadImage("graphDiv", {format: "png", width: 1600, height: 1200})
Available image formats: jpeg, png, svg or webp.
Dude, you're on fire!
Very nice update Vahaduo.
@vahaduo
I've updated some of the PCA datasheets with a lot of new samples, using the same pop codes as in the Global 25 Views plots.
They may or may not look better online than the old ones. If they do, it might be worth updating them at some point if it doesn't take too long.
G25_West_Eurasia_scaled
G25_Europe_scaled
G25_North_Euro_scaled
G25_South_Asia_scaled
@All
Whoops, just edited the new North Euro datasheet. Had to get rid of some obvious outliers.
@ Davidski
Updated PCAs are now online. I've built a code generator based on the Custom PCA tool, so now I'm able to add/update any PCA within a few minutes.
Additionally I've added hash-based routing so it's possible now to link directly to a PCA of your choice:
https://vahaduo.github.io/g25views/#WestEurasia
As a side-effect navigation between PCAs is now registered by the browser history, so you can use "back" and "forward" buttons to quickly switch back and forth between recently used PCAs.
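Hash-based routing of this kind usually boils down to reading the part of the URL after the # and mapping it to a view. The sketch below is a generic illustration, not the tool's code: pcaFromHash and showPca are hypothetical names, and "Europe" is an invented fallback key (only WestEurasia appears in the link above).

```javascript
// Hypothetical helper: turn a location hash such as "#WestEurasia" into the
// name of a known PCA, falling back to a default for empty or unknown hashes.
function pcaFromHash(hash, available, fallback) {
  const key = decodeURIComponent(hash.replace(/^#/, ""));
  return available.includes(key) ? key : fallback;
}

const available = ["Europe", "WestEurasia"];
console.log(pcaFromHash("#WestEurasia", available, "Europe"));
console.log(pcaFromHash("", available, "Europe"));

// In a browser this could be wired up roughly like so (showPca is hypothetical):
// window.addEventListener("hashchange", () =>
//   showPca(pcaFromHash(location.hash, available, "Europe")));
```

Because every hash change creates a browser history entry, the back and forward buttons come for free with this approach.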
____
If you want to put it at the bottom add an underscore (first:second -> _first:second). Helpful when you want to add a sample without reshuffling the markers of the already plotted ones.
This tip doesn't work anymore. When Matt posted his plots the first thing I saw was the unsorted legend. Sorting is a part of the label aggregation algorithm so it was really bad (for Matt it worked by accident as he was pasting contiguous blocks of alphabetically sorted samples). I figured out that Matt probably uses Microsoft Edge, which treats one instruction differently than other browsers, and I had to change the way the sorting is done. It now relies on the language settings of the browser, so at least for me the underscore lands at the beginning of the list.
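The difference between a plain sort and a language-aware sort can be seen in a few lines. This is only a general illustration of the two JavaScript sorting styles; which instruction actually behaved differently in Edge is not stated above, and the labels are invented.

```javascript
const labels = ["_Sample", "Yamnaya", ".Sample", "baltic"];

// Default sort compares UTF-16 code units: "." (46) < "Y" (89) < "_" (95) < "b" (98).
const byCodeUnit = [...labels].sort();
console.log(byCodeUnit); // [ '.Sample', 'Yamnaya', '_Sample', 'baltic' ]

// Locale-aware sort follows the collation rules of the runtime's language
// settings, so the position of "." and "_" can differ between environments.
const byLocale = [...labels].sort((a, b) => a.localeCompare(b));
console.log(byLocale);
```

No output is shown for the locale-aware version because it depends on the collation data of the browser or runtime, which fits the remark that the underscore lands at the beginning "at least for me".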
@ Matt
Very nice update Vahaduo.
Thanks!
@vahaduo
Thanks man, they look awesome.