Thursday, April 4, 2019

Downloadable genotypes of present-day and ancient DNA data

They're freely available via Harvard University at this LINK. The linked web page includes this message:

We would be grateful if users of this dataset could alert us to any errors they detect and help us to fill in missing data. This could include: (1) errors or missing information for location, latitude, longitude, archaeological context, date, and group label, (2) concerns about Y chromosome or mitochondrial DNA haplogroup determinations, and (3) evidence for other problems in the data or annotations for individuals. Please write to Swapan 'Shop' Mallick and David Reich with any suggestions. We would also be grateful if members of the community could suggest additional content that would be helpful to add to this page to make it maximally useful. Finally, please let us know if there is any ancient DNA data we should be including that we have missed.

By the way, I've updated my Global25 datasheets with many of the samples from this new Harvard release. Same links as always...

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient


Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern

Dave the Slothtopus said...

Is 6Drif23 still the only DF19 in ancient samples, so far?

Guy said...

Hi Folks,

Could someone familiar with the innermost details explain why there is a dearth of aDNA samples from France and Italy? I assume it has something to do with national law and public policy?


Michalis Moriopoulos said...

Very nice to have this.

Dave, there are probably some samples included in this repository that would be useful to convert to G25 coordinates. I'm thinking in particular of that McColl Southeast Asia paper with the Hoabinhians (might be useful for modelling AASI, for instance). I'm sure there are some others, too, but I don't know what you can and can't use.

Dave the Slothtopus said...

To answer my own question (I learned the anno file can be opened in Excel): there are only 3 DF19s listed, and they are all modern samples (one DF19*, one DF88 and one Z302; these are the immediate subclades of DF19, and the "grandsons" of P312). 6DRIF23 is listed only as "R1b1a1a2a1a", whereas DF19 begins at R1b1a1a2a1a2e. So if you didn't already know he was a DF19, you'd never find him with this...
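The search problem described here can be sketched in a few lines. This is only an illustration: the row layout and the sample IDs other than 6DRIF23 are hypothetical, and a plain string-prefix test is a rough stand-in for real tree logic (it could misfire on labels with multi-digit segments). Only the two haplogroup strings come from the comment.

```python
# Hypothetical rows: (sample ID, Y haplogroup call). The IDs "modern_A",
# "modern_B" and "other" are invented for illustration.
ROWS = [
    ("6DRIF23", "R1b1a1a2a1a"),
    ("modern_A", "R1b1a1a2a1a2e"),
    ("modern_B", "R1b1a1a2a1a2e1"),
    ("other", "I2a2a1b"),
]

DF19 = "R1b1a1a2a1a2e"  # label at which DF19 begins, per the comment


def in_clade(call, clade):
    """True if the call is the clade label itself or a longer, downstream label."""
    return call.startswith(clade)


def could_be_in_clade(call, clade):
    """True if the call is a shorter, upstream label that does not rule the clade out."""
    return clade.startswith(call) and call != clade


confirmed = [sid for sid, call in ROWS if in_clade(call, DF19)]
unresolved = [sid for sid, call in ROWS if could_be_in_clade(call, DF19)]
print("listed as DF19 or below:", confirmed)
print("truncated call, DF19 not excluded:", unresolved)
```

A search for the DF19 label only returns the first group, which is why a sample with a truncated call stays invisible unless you also look upstream.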

Matt said...

Sanganji Jomon up in this file :). Sherpas are also a nice little addition and ancient Nepal samples. Together they will enhance coverage of within East Asia substructure. Disappointed there are no Naga / Ao_Naga.

There are a lot of little additions (probably many more than I see at first glance), but how spoiled am I (by adna and Davidski's dataset) that I'm looking through the sample IDs and being like "Is that it?"?

Romulus the I2a L233+ Proto Balto-Slav, layer of Corded Ware Women said...

39690-35630 calBCE Oase Cave Romania

Interesting that this 12% Neanderthal guy belonged to Y Haplogroup N1c1a, was that determined when the paper on it came out? I can't remember that.

N1c1a corresponds to SNP M178, for which the Y-Full tree gives a formed date of 16000 ybp. Obviously Y-Full is way, way, way off here if the Y haplogroup is indeed correct.

Romulus the I2a L233+ Proto Balto-Slav, layer of Corded Ware Women said...

Also that Near Eastern like Roman Gladiator belonged to J2b, missed that one in the J2b post. Very strongly believe J2b brought Etruscan now.

Romulus the I2a L233+ Proto Balto-Slav, layer of Corded Ware Women said...

I2a2a1b is a really interesting group to look at:

Columns: Date (one of two formats); Group ID; Y chrom. (automatically called only if >50000 autosomal SNPs hit)
6570-6255 calBCE Serbia_Iron_Gates_HG I2a2a1b2
6361-6050 calBCE Serbia_Iron_Gates_HG I2a2a1b
6355-5990 calBCE Serbia_Iron_Gates_HG I2a2a1b2
6200-5900 BCE Serbia_Iron_Gates_HG_brother.I4880 I2a2a1b
6061-5990 calBCE Latvia_HG I2a2a1b
6000-5725 calBCE Serbia_Iron_Gates_HG I2a2a1b2
5636-5521 calBCE Ukraine_N I2a2a1b1
5473-5326 calBCE Ukraine_N I2a2a1b1b
5460-5218 calBCE Ukraine_N I2a2a1b1
5291-5060 calBCE Ukraine_N I2a2a1b
5300-4900 BCE Hungary_ALPc_Tiszadob_MN I2a2a1b1
5208-4942 calBCE Hungary_ALPc_Tiszadob_MN I2a2a1b1
5209-4912 calBCE Hungary_ALPc_Tiszadob_MN I2a2a1b
4837-4713 calBCE Latvia_HG I2a2a1b
4519-4343 calBCE Ukraine_N_son.I1732.SG I2a2a1b1
4519-4343 calBCE Ukraine_N_son.I1732 I2a2a1b1
4444-4257 calBCE Hungary_Tiszapolgar_Bodrogkeresztur_ECHA I2a2a1b
3900-3600 BCE Iberia_MN I2a2a1b2
3761-3638 calBCE Iberia_LN.SG I2a2a1b2
3500-3360 calBCE Scotland_N I2a2a1b
3338-3025 calBCE Bulgaria_EBA_published I2a2a1b
3328-3015 calBCE Bulgaria_EBA I2a2a1b1
3400-2800 BCE Poland_Globular_Amphora I2a2a1b
3020-2895 calBCE Bulgaria_EBA I2a2a1b1b
3012-2900 calBCE Bulgaria_Yamnaya_o I2a2a1b1b
2899-2706 calBCE Ukraine_Globular_Amphora I2a2a1b
2890-2694 calBCE Ukraine_Globular_Amphora I2a2a1b2
2870-2575 calBCE Poland_Globular_Amphora I2a2a1b2
2900-2300 BCE Iberia_C I2a2a1b2
2849-2143 calBCE Russia_Yamnaya_Kalmykia.SG I2a2a1b1b2
422-541 calCE Hungary_Langobard_son.SZ24_father.SZ7_brother.SZ22 I2a2a1b2a2a2
412-604 CE Hungary_Langobard_son.SZ24_brother.SZ22 I2a2a1b2a2a2
412-604 CE Hungary_Langobard_brother.SZ14_brother.SZ8 I2a2a1b2a2a2
412-604 CE Hungary_Langobard I2a2a1b2a2a2
412-604 CE Hungary_Langobard.SG I2a2a1b2a2
580-630 CE Italy_Langobard I2a2a1b2a2a2
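For anyone who wants to sort or filter a list like the one above, here is a minimal sketch that splits each row into its three fields and turns the date range into signed years. It assumes the row layout seen in this comment (range, era tag, group ID, haplogroup) and that the "cal" prefix marks a calibrated radiocarbon range while a bare BCE/CE is a context-based date; parse_date and split_row are hypothetical helper names.

```python
import re

# Matches e.g. "6570-6255 calBCE" or "412-604 CE"
DATE_RE = re.compile(r"^(\d+)-(\d+) (cal)?(BCE|CE)$")


def parse_date(text):
    """Return (start, end, calibrated) with BCE years as negative numbers."""
    m = DATE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised date format: {text!r}")
    sign = -1 if m.group(4) == "BCE" else 1
    start, end = sign * int(m.group(1)), sign * int(m.group(2))
    return start, end, m.group(3) is not None


def split_row(line):
    """Split a row into (parsed date, group ID, Y haplogroup)."""
    parts = line.split()
    date = parse_date(" ".join(parts[:2]))
    return date, " ".join(parts[2:-1]), parts[-1]


rows = [
    "412-604 CE Hungary_Langobard I2a2a1b2a2a2",
    "6570-6255 calBCE Serbia_Iron_Gates_HG I2a2a1b2",
    "3400-2800 BCE Poland_Globular_Amphora I2a2a1b",
]
# Sort oldest first by the start of each range
for date, group, hg in sorted(map(split_row, rows)):
    print(date, group, hg)
```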

old europe said...


check out also the presence of I2a2 in the eneolithic Khvalynsk culture. This is a translation of what they found there in the first quarter of the IVth millennium BC:

" In addition to the uraloid substrate, the European broad-faced and southern European variants are recorded. R1a1, O1a1, I2a2 are added to the mito T2a1b, H2a1 by the haplogroups."

Davidski said...


So far I've added or updated the Global25 datasheets with these samples from the new Harvard data. What else should I add? Keep in mind though that not all of the samples have enough coverage to be run successfully.

The full Global25 datasheets are available at the usual links...

capra internetensis said...

Great new stuff Davidski, thank you! I'm going to have a look at some of those ancient Americans.

Drago said...

Do you have any of the North African neolithics in the G25? iAM etc ?

Drago said...

Good obs. A clear demonstration of movement from the Danube toward the east/steppe. Explains how some proto-kurgan features begin to appear during the Mariupol horizon.

Arza said...

@ Davidski

Thanks a lot!

Some observations:
- coordinates of samples that were already present in G25 have changed a little bit,
- Poland_BKG:N22 is like Iron Gates,
- Sweden_Motala_HG:I0017 is a huge outlier (43% SHG, 57% Iberia_NW_Meso, Dist. 9.6%),
- Sweden_Viking_Age_Sigtuna:vik_kal009 clusters with Baltic_BA/Latvia_BA,
- Sweden_Viking_Age_Sigtuna:vik_KAL006 is close to Karelians.
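The "Dist." figure in a two-way model like the Motala one can be read as the leftover Euclidean distance between the target and the best blend of the two sources. A minimal sketch, assuming that reading: the coordinates below are made-up 2-D toys rather than real Global25 values, and two_way_fit is a hypothetical helper, not part of any G25 tool.

```python
import math


def two_way_fit(target, a, b):
    """Best weight of `a` (clipped to 0..1) in a blend of a and b, plus the leftover distance."""
    diff = [x - y for x, y in zip(a, b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two sources are identical")
    # Project the target onto the line through b and a
    w = sum((t - y) * d for t, y, d in zip(target, b, diff)) / denom
    w = max(0.0, min(1.0, w))
    fit = [w * x + (1 - w) * y for x, y in zip(a, b)]
    return w, math.dist(target, fit)


# Toy coordinates (hypothetical): the target sits slightly off the a-b line
source_a = [1.0, 0.0]
source_b = [0.0, 1.0]
target = [0.5, 0.6]
weight, dist = two_way_fit(target, source_a, source_b)
print(f"{weight:.0%} A, {1 - weight:.0%} B, distance {dist:.4f}")
```

A large leftover distance means no blend of the chosen sources lands near the target, which is one way a sample shows up as an outlier.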

Davidski said...


Yep, thanks, there might be issues with the coordinates for some of those samples. I need to have a closer look at them.



Samuel Andrews said...

According to the Paste app Davidski shared earlier this year.

Poland Neolithic farmers cluster with Hungary/Danube farmers. Except....Globular Amphora who clusters with Sweden Funnel Beaker & earlier published Globular Amphora samples from Poland & Ukraine. Andronovo, Corded Ware, Rhine Bell Beaker's farmer ancestry is akin to TRB/Globular.

Samuel Andrews said...

Danube/Hungary, Bulgaria, British/Iberian, GAC/TRB farmers can all be distinguished from each other.

Samuel Andrews said...

Poland Funnel Beaker not (directly) related to Sweden Funnel Beaker. It mostly descends from earlier Poland farmers. German Funnel Beaker is also not very (directly) related to Sweden Funnel Beaker. But, Globular Amphora Poland/Ukraine is.

Samuel Andrews said...

Nevermind. Poland Funnel Beaker, is a Globular & HungaryEF-like mix......Arrival of Funnel Beaker culture can be associated with this new geneflow.







Samuel Andrews said...

All Neolithic farmers were very closely related. But Globular Amphora and Funnel Beaker are another example of pots=people. The admixture could have been very complex. But, nonetheless, the spread of the new cultures involved the spread of new people.

old europe said...


can you provide some papers on the Mariupol/kurgan subject?


All these farmers stem from (or are strictly connected with) the farmers that gave birth to the megalithic culture. They were from Brittany/Paris basin. Probably they were a mix of Cardial, LBK and local hunter-gatherers.

You can check it on Chad's blog

FrankN said...

Sam: TRB preceded GAC - the models are anachronistic.

Davidski said...

@old europe

Don't get too excited.

The Khvalynsk paper hasn't been published yet, and the Y calls were obviously generated with PCR, so they're not totally reliable. The O1a1 result looks like an error.

Also, some subclades of I2a2 are native to the steppe and didn't arrive there with any farmers or during the Copper Age.

The I2a2a in Yamnaya is actually I2a2a1b1b, which has been found in a steppe hunter-gatherer with zero farmer ancestry, and an eastern Yamnaya individual with minimal farmer ancestry. These two...

Ukraine_Neolithic I1738

Yamnaya_Kalmykia RISE552

That RISE552 sample is one of the Yamnaya individuals with the lowest levels of western farmer ancestry.

Samuel Andrews said...

@oldeurope, Yes, that makes sense. People have suspected a west-east movement with Megaliths for a long time. People who are both archaeology- and genetics-minded can answer the question. It might not be for 10 years, till people like that are writing papers.

Samuel Andrews said...

The Scythian data demands people with archaeological and historical knowledge. Moldova samples are mostly of native Balkan/southeast European origin. One Scythian from Ukraine is basically a Slav (identical to modern Ukrainians).

Scythians from Hungary are mostly central-European. Scythians from Russia, mostly local origin. Most Scythian samples have minor Asian Steppe admix but all are of mostly local origin. But even Central Asian Scythians aren't uniform but instead look like non-Scythian locals (who themselves were an Andronovo/European, Asian, ANE, BMAC/Iranian.....mix so it's impossible to keep track of what's going on).

Scythian elite, native serfs?

Drago said...

@ Old Europe/ Davidski

Autosomal composition can rapidly change within 2-3 generations. Thus far, I2a2a1b doesn't appear in Ukraine prior to the Mariupol horizon. And one of them indeed has solid EEF ancestry. The same lineage is also found in ALPc, with further resolving of Lipson's data.

@ Davidski
Why did you label Poland BKG_01 an outlier ? Seems to fit right amidst the other BKGs..

Drago said...

@ Sam
These Polish neolithic data are interesting. However, I would not model TRB on GAC, because GAC (33-2200 BC) is slightly younger than, although partially contemporaneous with, TRB (4000 - 2400 BC). These groups are both highly similar in WHG/EEF proportions. The differences lie in Y-Hgs.

Davidski said...


That Polish BKG sample is marked as an outlier in the Harvard data. I haven't had a chance to look at that yet.

By the way, so you're claiming that I2a2a1b1b has been recorded in ALPc? Which sample?

Drago said...

Tiszadob-Ó-Kenéz ALPc 5000 BC, I2a2a1b1 x 2
Hejőkürt-Lidl logisztikai központ ALPC 5000 BC, I2a2a1b
Törökszentmiklós, road 4, site 3; 4500 BC Tiszapolgar I2a1b1b1

All these samples from ALPc to Mariupol derive from one common ancestor, within variability of coverage & the fact that different labs sampled ALPc (Lipson) & Ukraine (Mathieson).
Imaginably more data are on the way too, from missing phases & regions.

Davidski said...


Blah, blah.

Find me an instance of I2a2a1b1b outside of the steppe without steppe ancestry.

I just want to see one. Thanks in advance.

Drago said...

Well Davidski; after the Iberia paper, it seems you love surprises.
Meanwhile, you’re ignoring I3719 ?

old europe said...


They were likely metal prospectors from the Balkans. We have found the origin of the old legends and tales in the PIE languages.

Samuel Andrews said...

Ancient DNA is progressing fast again. We have good representatives of some groups from antiquity....

Greeks, Egyptians, Canaanites, Hallstatt Celt, Breton, CeltIberians, Iberian, Thracian (BalkanIA), Dacian (in Scythian burial), Scythians, Sarmatians, early Slav (in Scythian burial).

My, bet, is Romans (and all Latins) clustered in southern Italy.

old europe said...

edit: old legends and tales about metallurgy I mean. Like the devil and the smith stuff

Davidski said...


I'm not ignoring I3719. But, alas, there's just no evidence that this sample belongs to the steppe specific I2a2a1b1b subclade.

And the sample from Ukraine that does is older than I3719 and clearly native to the steppe.

Ukraine_Neolithic I1738 5473-5326 calBCE

So go right ahead and surprise me by actually proving that I2a2a1b1b isn't native to the steppe and that it comes from the Balkans or the Carpathian Basin.

Take your time. I can wait.

Samuel Andrews said...

England_Roman_o almost identical to Egyptian_New Kingdom. He was probably an Egyptian gladiator in Roman Britain.

Davidski said...


"That Ulan IV sample from 2400 BC Davidski often mentions obviously represents a movement eastward."

Correction. A movement westward that also spread steppe ancestry. It's all here...

Ukraine_Neolithic I1738 I2a2a1b1b 5473-5326 calBCE

Yamnaya_Bulgaria Bul4 I2a2a1b1b 3012-2900 calBCE

Balkans_BronzeAge I2165 I2a2a1b1b 3020-2895 calBCE

Davidski said...


I've added some more samples...

Full Global25 datasheets...

Matt said...

Having a quick look through the new G25 datasheets, by reprocessing through PCA:

Does look like there are lots more North African samples in G25 now from this set, and many more Native Americans in particular.

The structure, which has been suggested, where Sherpa and ancient Nepalese are at an extreme in a pole distinguishing East Asians from Siberians and SE Asians does seem apparent here (hopefully Ryukendo will be quite pleased to see this confirmed). The Jomon also seem to form a pole in an opposing high dimension, while being para-East Asian in low dimensions.

Sherpa seem pretty close to main East Asian culture in low dimension, though maybe they could have some AASI or ANE to them.

Tripuri, Tharus, Jamatia, Kusunda, Burmese and Naxi all break towards the Sherpa in the high dimension where Sherpa are most distinguished, so there's a nice "high altitude / Himalayan East Asian" pole in the reprocessed G25 (while Nivkhs, Japanese, Ulchi and Korean break towards Jomon), though maybe it could be distinguished better in a new PCA based directly on these data.

Matt said...

Juan, yes, thanks for mentioning that paper again. I think the number they put on their suggestion of "Non-modern human sequences compose ∼6% of the Tibetan gene pool" seems not right, and they don't have any evidence for it in f3 statistics either, but there was surely introgression of some kind from homo who were adapted to higher altitudes.

Ric Hern said...

@ Matt

Is that adaption to high altitude genes also visible in Papua New Guinea ?

Matt said...

@ric, no, I don't believe anyone has found that there is any suggestion that this is the case.

But the Denisovan-related populations that introgressed appear to have been highly structured. And there's really no selective pressure under which the variant would introgress in PNG either.

Ric Hern said...

@ Matt


FrankN said...

Matt (and anybody else):
How does Central Asia CA (Sarazm etc.), possibly also Steppe Maykop, relate to Sherpa and/or Jomon?

UP/Mesolithic flint knapping in EC Asia followed not the Siberian (AG3) "bullet-shaped" method, but the Jomon/East Asian Yubetsu method.
Possibly already by the Neolithic, but certainly during the CA, the Silk Road was established, linking Central Asia via Tarim Basin and Tibet to China.

Alternatively: Is there any PCA dimension suggesting a Central Asian pole? The Pamir may have served as LGM Refugium and created its specific drift.

Leron said...


On Anthrogenica I remember reading a thread on a Jomon genome study where someone mentioned Jomon being part ENA and part another divergent population that predated the arrival of ANE. Explaining how Jomon are slightly closer to Europeans than to modern East Asians.

Davidski said...


"On Anthrogenica I remember reading a thread on a Jomon genome study where someone mentioned Jomon being part ENA and part another divergent population that predated the arrival of ANE. Explaining how Jomon are slightly closer to Europeans than to modern East Asians."

This statement doesn't make any sense.

Matt said...

Few Neighbour Joining Trees:

The first just uses the data from the latest datasheet and population averages. For the other two graphics I merged in some samples from the previous update that seemed to have gone missing along the way, basically because I noticed the Myanmar LNBA Oakaie (which here sits on a clade with recent Burmese) had gone missing, and possibly a few others.

Central East Asia (South of Mongolia, North of Vietnam) seems to have a pretty clear primary split between Tibet and Himalayan and other East Asians, Jomon falls basal to the Central East Asia + South East Asian post-Neolithic clade (but closer than Hoabinhian+Onge group).

Not much new insight into Europe - I'd note Poland BKG falls as clade with GAC and Swedish MN/TRB, while Polish TRB falls with Germany_MN, though not sure how significant that is.

Some nice structure in North Africa as well now.

Samuel Andrews said...

@Matt, Cool. Can't wait to jump into East Asian genetics. Would you say, Jomon, "Central East Asia", "South East Asian" all descend from a homogeneous single lineage?

Past3 tree I did with late Neolithic Europeans, put Poland_BKG with Hungary farmers & Poland_GAC with already published Globular Amphora (especially ones from Ukraine). IMO, there's little continuation between BKG and Globular.

Simon_W said...

@ Romulus

"Also that Near Eastern like Roman Gladiator belonged to J2b, missed that one in the J2b post. Very strongly believe J2b brought Etruscan now."

The England_Roman_o has absolutely nothing to do with Etruscans. Genetically he's very close to Levant_BA_South and the ancient Egyptians. So he may have been a Nabataean or an Egyptian. Isotopic data suggests he grew up in a dry climate, so not in the Nile delta I suppose, but rather in the desert.

I think the Etruscans may go back to the Terramare culture of Northern Italy. Some archaeologists have theorized that the Terramare culture is derived from a population wave from Hungary. Most of all because they had adopted the cremation custom. Others however prefer to derive the Terramare culture from the Polada culture of Northern Italy. That makes more sense to me, considering they were pile dwellers, see this reconstruction of their houses:

The pile dwelling tradition is circum-Alpine and goes back to the pre-IE Neolithic.
Moreover, the Terramare culture didn't have a lot of horses, and they didn't mark high status individuals in their burial grounds, so I doubt they were IE. Yet they had some influence on the Protovillanovan of central Italy.

Simon_W said...

@ Samuel Andrews

"My, bet, is Romans (and all Latins) clustered in southern Italy."

Nice bet. I'll bet against this then. My guess is that the Latins and the original Romans clustered in Northern Italy. Two reasons make me think so:

- We still don't know when exactly these 3 or so South Italian-like individuals from 700-20 BC Lazio lived. If they lived between 700 and 350 BC, that would be something! But by 300 BC Rome had already annexed parts of Campania and by 255 BC Southern Campania was added as well. The Campanians may have been South Italian-like from the beginning. In 241 BC Sicily was annexed, and by 146 BC mainland Greece. Western Anatolia followed by 129 BC. From all these places slaves were taken to Rome, and South Italians who had become Roman citizens served in the Roman legions. Although this era of Roman history is called the "Republican Age", and not yet the "Imperial Age", they already had a considerable empire during the mature republic.

- Another reason: Attempts to determine the date of admixture via linkage disequilibrium have repeatedly suggested that the admixture with strong West Asian components is rather recent, probably going back to the Imperial Age. For instance, Busby et al. 2015 dated this admixture in Tuscans and in South Italians to the first half of the first Millennium AD. And Raveane et al. in their 2019 preprint dated it via Globetrotter also to the Imperial Age, in Central and Northern Italians, and even later in the South, but that's due to the Arabic invasion in the South which has overlaid older accretions. So even though South Italian-like people did occur in Latium during the Republican Age, the bulk of this admixture seems to go back to the Imperial Age, which is in fact what the leaked aDNA data also showed, with the long tail of migrants going all the way to Syrians and Iraqi Jews.

Drago said...

@ SimonW

“Moreover, the Terramare culture didn't have a lot of horses, and they didn't mark high status individuals in their burial grounds, so I doubt they were IE. Yet they had some influence on the Protovillanovan of central Italy.”

That’s a rather simplistic perspective , no ?
Which horse culture would you have to find your real Italics ? (Kinda hard to follow your theories as you're constantly “updating” your theories as the aDNA comes through :))

FrankN said...

Matt: Thanks for the Neighbour-joining tree. I find especially intriguing that Brazilian Botocudo cluster with Malayo-Polynesians (French Polynesia 150BP, Batak, Agta, Aeta etc.).
Also interesting, but somewhat expectable, is HajiFiruz_BA at the root of one of the Central Asian/ South Asian sub-trees (Kalash, Tajik, Sindhi, Gonur2_BA etc.).
Also noteworthy: Varna_o clustering amidst modern Czech, East Germans, Austrians, Hungarians and Balkans pops. Just an artefact?

Samuel Andrews said...

@Simon_W, You're probably right. It is important, North Italian & South Italian-like pops lived in Imperial Rome. Possibly two different ethnic groups. I guess the North Italian one were probably Latins.

Arza said...

@ FrankN

It's not an artefact of clustering.

G25, full spreadsheet:

Polish:Polish3 27.8%
German:German6 12.8%
TDLN:I1899 11%
Dutch:Netherlands59 9.4%
Latvia_BA:Kivutkalns215 8.6%
Polish:Polish23 6.4%
Greek_Trabzon:G25003 5.6%
German:German31 4.6%
Afanasievo:I6711 3.6%
Dutch:Netherlands29 3.4%
Iberia_Northeast_c.6CE_PL:I12032 2.6%
Latvia_BA:Kivutkalns194 2%
Barcin_N:I1096 1.4%
Levant_ChL:I1164 0.4%
Catacomb:RK4002 0.2%
Tisza_LN:I2358 0.2%

Distance 1.1890%

This sample was radiocarbon dated. TWICE.

ANI163 ANI163 VAR158 tooth .. .. 1240K MathiesonNature2018 6577 4711-4542 calBCE [4711-4550 calBCE (5787±30 BP, OxA-13688), 4667-4542 calBCE (5755±24 BP, OxA-13688)] Bulgaria_Varna_EN3 Varna Bulgaria 43.2131 27.8644 F H7a1 .. .. 0.598 410005 half All PASS .. .. .. .. .. .. 2018

Matt said...

@Sam, re: East Asian groups, from what I remember it looks like they all form a clade wrt to West Eurasians (past and present?), but hard to say if they form a single lineage as depends on what we mean by that. Early SE Asian mainland groups def. have ancestry from Hoabinhian populations in SE Asia - I would have a glance over McColl et al's paper and you'll probably be more or less about as up to speed as I am.

Re: Poland_BKG, one thing to note is that when I computed that average (I know Davidski does datasheets with averages, but I find it pretty quick just to calculate them myself from the individual sample sheets), I included sample N22, who is basically a Villabruna/WHG group sample and probably is best to treat as an outlier and who I should have probably removed before computing the average.

Including him/her probably raises the Poland_BKG WHG level by 13-15% (since there are 5 other "main" samples), and probably is what's pushed the Poland_BKG average near to GAC/TRB.
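That estimate is easy to sanity-check with arithmetic. With six equally weighted samples (N22 plus the 5 "main" ones), an outlier that is essentially all WHG shifts the group's average WHG share by (1 - w)/6, where w is the WHG share of the main samples. The values of w below are hypothetical, picked only to show that a rise of 13-15% corresponds to w of roughly 10-22%.

```python
def mean(values):
    return sum(values) / len(values)


def outlier_shift(main_values, outlier_value):
    """How far one extra, equally weighted sample moves the group average."""
    return mean(list(main_values) + [outlier_value]) - mean(main_values)


# Hypothetical WHG shares for the 5 main samples; the outlier is treated as 100% WHG
for w in (0.10, 0.22):
    shift = outlier_shift([w] * 5, 1.0)
    print(f"main WHG share {w:.2f}: average rises by {shift:.3f}")
```

The same logic applies coordinate by coordinate when averaging G25 rows, so a single extreme sample in a group of six carries a sixth of the weight in every dimension.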

Revising to use only the "main" samples indeed places Poland_BKG with the Hungarian_LNCA (particularly) and Czech_MN / Czech_EBA_o / Balkans bunch:

So I'd say you're right there.

@Frank_N: Yeah, the Botocudo samples were subject of this - No one really seems to know how those people got there of course.

Re; Varna_o, it seems to be that the sample presents both pretty similar ancestral proportions ("Steppe"/WHG/EEF type proportions) as those groups and participates in whatever drift G25 picks up that distinguishes Eastern from Western European ancestry. There are a few of these old samples that seem to do this, including some from the Beaker sets in Hungary and others. Honestly, I'm not sure why this is the case or whether it means anything - Arza may have some ideas about it.

Drago said...

What is being demonstrated there in your model of Varna _O ?

FrankN said...

@Matt, Sam: It seems that the Polish_Lengyel sample has disappeared from the G25 data set. That one should have been ancestral to BCG (possibly with a bit of Rössen ancestry-the oldest of the Germany_MN samples).

Lengyel originated in Hungary, from where it expanded in all sorts of directions. It is credited with having introduced copper metallurgy to Central Europe. Northwards, the Gatersleben Culture around the Harz (Quedlinburg area), and Jordanow in Bohemia/ W. Silesia are archeologically qualified as Epi-Lengyel. Towards the West, Mondsee is seen as Lengyel-derived. Remedello's halberd tradition seems to go back to Lengyel / Bödrogkeresztur.

In spite of all their halberds, Lengyel/ Bödrogkeresztur were ultimately in the Pannonian basin replaced by the Baden/ Boleraz expansion, and migrated northwards into E. Silesia / Poland to form the so-called "Polgar cycle" from which ultimately BCG derived.
In short: Finding BCG related to Pannonians confirms archeological hypotheses.

Matt: "No one really seems to know how those [Botocudo]people got there of course." Yeah, I remember that article.
One possibility I see is that they may relate to the events described here, especially as Botocudos cluster with Philippine Negritos:
"The centre of origin of coconut extends from Southwest Asia to Melanesia. Nevertheless, its pre-Columbian existence on the Pacific coast of America is attested. This raises questions about how, when and from where coconut reached America. Our molecular marker study relates the pre-Columbian coconuts to coconuts from the Philippines rather than to those of any other Pacific region, especially Polynesia. Such an origin rules out the possibility of natural dissemination by the sea currents. Our findings corroborate the interpretation of a complex of artefacts found in the Bahía de Caraquez (Ecuador) as related to South-East Asian cultures. Coconut thus appears to have been brought by Austronesian seafarers from the Philippines to Ecuador about 2,250 years BP."

Drago said...

@ FrankN

'' Finding BCG related to Pannonians confirms archeological hypotheses.''

We'd need to be careful, however. These clustering algorithms seem to work on the basis of similar ratios of WHG/ ANF/ CHG etc; not strictly recent drift, or direct cultural derivation, although it does tend to associate related samples (e.g. Hungary N, Starcevo, etc) - again - all on the basis of similar ratios.

Note the odd cluster of Gepid-Serbia, Roma & Kostenki.

Davidski said...


There are a couple of errors in the original dataset pertaining to the Poland_BKG samples. One sample is wrongly marked as an outlier, while another sample is wrongly not marked as an outlier. You'll find these corrected IDs in the latest Global25 datasheets.


Also, later today I'm going to update the coordinates for Tianyuan. The new sample looks a lot less noisy than the old one that I got ages ago.

Michalis Moriopoulos said...

Thank you for all your hard work. This is an impressive update.

FrankN said...

Dragos: Your point is taken, and I think it may well apply to the Varna_o case. As nice as it would have been to find a 5th mBC pre-proto South Slav - as we know about lots of later migrations into and through the Balkans, the chance of such a degree of genetic continuity is IMO close to zero.

Matt said...

@FrankN: I would note these trees are hard to read sometimes, but just to clarify there is not really an analogy between Gepid+Roma+Upper Paleolithic as between Varna_o and the samples that are near on the tree. The branch lengths on the tree imply that Gepid_Serbia, Roma, Upper Paleolithic are separated on a very high order. Only branching together because these are "nearest neighbour" but very distant nearest neighbour, albeit with compression against reality for UP Paleolithic both because they are much more weakly represented in the combination of ancient+modern dna this is based on and likely have not accumulated so much drift.

On the other hand, Varna_o is just in an absolute sense very close to those SE European populations; pretty short branch.
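The difference between "nearest neighbour" and "actually close" can be shown with a toy example. The 2-D coordinates below are invented purely for illustration (they are not G25 values), and nearest is a hypothetical helper.

```python
import math

# Hypothetical coordinates: two points that really are close, and two
# isolated points that only have each other as nearest neighbour.
POINTS = {
    "Varna_o": (0.10, 0.10),
    "Modern_SE_Europe": (0.11, 0.10),
    "Gepid_Serbia": (0.50, 0.90),
    "Upper_Paleolithic": (0.90, 0.40),
}


def nearest(name, points):
    """Return (closest other label, distance to it)."""
    dist, other = min(
        (math.dist(points[name], xy), label)
        for label, xy in points.items()
        if label != name
    )
    return other, dist


for name in POINTS:
    other, dist = nearest(name, POINTS)
    print(f"{name}: nearest is {other} at {dist:.3f}")
```

In this toy set the two isolated points pair up with each other, yet the distance between them is dozens of times larger than the distance within the genuinely close pair, which is the role branch lengths play on the tree.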

Simon_W said...

@ Dragos

"That’s a rather simplistic perspective , no ?
Which horse culture would you have to find your real Italics ? (Kinda hard to follow your theories as you're constantly “updating” your theories as the aDNA comes through :))"

It's indeed more complicated, as even though the Terramare people didn't keep a lot of horses they seem to have had a horse cult, evidenced by an abundance of horse figurines. And I'll soon post something more that made me change my mind again. True, I often change my mind, not just because of new aDNA evidence, but rather because so many alternatives seem possible due to the limited nature of the evidence, and as I'm not pushing a career in these things I don't feel urged to stick to one theory.

Simon_W said...

I did some Global25/nMonte runs with the updated dataset. Always with nMonte 1.0 and scaled coordinates.
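For readers unfamiliar with this kind of run, the general idea is to find non-negative source proportions summing to 1 whose blended coordinates land as close as possible to the target. The sketch below is not nMonte's actual code (nMonte is an R script); it is a simple random hill climb written as a stand-in, with toy 3-D coordinates, and it assumes the reported "distance%" is the Euclidean distance multiplied by 100. fit_mixture is a hypothetical name.

```python
import math
import random


def blend(sources, weights):
    """Weighted average of the source coordinate vectors."""
    dims = len(sources[0])
    return [sum(w * s[d] for s, w in zip(sources, weights)) for d in range(dims)]


def fit_mixture(target, sources, steps=5000, step_size=0.01, seed=0):
    """Hill climb over mixture weights; needs at least two sources."""
    rng = random.Random(seed)
    n = len(sources)
    weights = [1.0 / n] * n
    best = math.dist(target, blend(sources, weights))
    for _ in range(steps):
        # Move a small share from source i to source j; keep it if the fit improves
        i, j = rng.sample(range(n), 2)
        delta = min(step_size, weights[i])
        trial = weights[:]
        trial[i] -= delta
        trial[j] += delta
        d = math.dist(target, blend(sources, trial))
        if d < best:
            weights, best = trial, d
    return weights, best


# Toy data: the target is an exact 20/50/30 blend of three made-up sources
sources = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [0.2, 0.5, 0.3]
weights, dist = fit_mixture(target, sources)
print([round(w, 2) for w in weights], f"distance%={100 * dist:.4f}")
```

Because shares are only ever moved between sources, the weights stay non-negative and keep summing to 1, and the resolution of the answer is limited by step_size.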

First of all, the progression from Poland_TRB via Poland_GAC to CWC_Poland nicely shows how the Steppe admixture came in with the Corded Ware. There's just a tiny bit of Steppe admixture already in the GAC. Also striking how much the ratio WHG:Barcin increased along this sequence.

[1] "distance%=2.5598"



[1] "distance%=3.925"



[1] "distance%=2.755"



Simon_W said...

Now for some Beakers and Iberians; Beaker_The_Netherlands has more Steppe than Beaker_Hungary, as David already pointed out:

[1] "distance%=3.2128"



[1] "distance%=2.1467"



But Bronze Age Southeastern Iberians had much less Steppe than this:

[1] "distance%=4.4748"



No wonder they didn't speak IE!

Modern Basques have more Steppe, but still a comparatively modest amount:

[1] "distance%=4.9648"



Basques have even a bit more Steppe than Bronze Age Northern Iberians:

[1] "distance%=3.7487"



Simon_W said...

Now I wanted to check if there's some non-CWC/non-Beaker admixture in the Hallstatt "Celts", or at least in the eastern Hallstatt culture which may have had a more Italic-related language. And indeed there seems to be a lot:

[1] "distance%=3.8147"



I only tested DA111, because the other one has Scythian-related admixture which probably wasn't that typical in Iron Age central Europe, so I consider DA111 to be more typical for Hallstatt and probably La Tène people. And voilà, about half of his ancestry is from Bronze Age Hungary, rather than from CWC and Beaker locals!

For comparison I did the same with Halberstadt_LBA, and he completely lacks this admixture:

[1] "distance%=2.0847"



So the question arises: Did this admixture from Bronze Age Hungary spread to central Europe after the LBA? Or, alternatively, did it spread to Bronze Age southern central Europe, to the Tumulus and Urnfield culture in the narrow sense, while leaving the northern fringe unaffected? I consider this latter interpretation much more likely, because Halberstadt_LBA lived North of the Harz mountains and his culture was more associated with the Lusatian culture than with the Urnfield culture in the narrow sense.

But now let's check the if this admixture from Bronze Age Hungary can be found in the modern Irish, where Gaelic once was commonly spoken. Indeed it's there!

[1] "distance%=2.1319"



Davidski said...


No wonder they didn't speak IE!

You can't correlate IE speech with Yamnaya admixture in ancient Iberia, because the people who crashed into Iberia during the Copper Age didn't come straight from the steppe.

Try modeling Bronze Age Iberians as Dutch, French and Swiss Beakers, because whatever language their Beaker ancestors spoke came from these groups, not from Yamnaya.

Simon_W said...

I also tried to model the Tuscans. I first used Bavarian Beakers as a source of their Italic ancestry. But this resulted in terribly overfitted models, which moreover didn't make much use of Hungary_BA and Maros. But if the Italics arrived during the LBA from Hungary, Austria or Southern Germany, then a model that only uses Bavarian Beakers without strong admixture from Bronze Age Hungary doesn't make sense. So I got rid of those Beakers, and now the model is in perfect agreement with my above findings:

[1] "distance%=2.3973"



Lots of Hungary_BA and Maros, strong Anatolia_MLBA and only minor other accretions!

I don't want to be jumping to conclusions, but it seems like Hungary_BA and Maros admixture was part of the proto-Italo-Celtic population.

Simon_W said...

@ Davidski

Alright, thanks for the hint.

Simon_W said...

I also modeled myself:

[1] "distance%=1.7041"



I'm glad to see some ancient Egyptian admixture there! That's what I had expected, and what I was hoping... Interestingly, my 25.5% North Italian ancestry is very poor in Anatolia-related admixture. Apparently its distribution was still somewhat patchy in 19th century Northern Italy. And apart from what's in the Hallstatt Celts, I only get some Maros, but no Hungary_BA. Maybe a somewhat different Italic wave than what got to Tuscany.

Drago said...

@ Simon.
Yep - still unfolding.
BTW there apparently were some "elite" burials during the Polada period: Romagnano Loc, La Vela di Valbusa, Aosta, ... Hopefully we see some of these analysed.

Open Genomes said...


I0562 and I0563 appear twice in the Global25 sheet.


The Global25s are identical.

Which ones are they?

Is this a bug in the Reich data?
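
A check like this can be automated. The following is a minimal Python sketch, assuming the datasheet is comma-separated text with the sample label in the first column and the coordinates in the remaining columns. The labels, the coordinate values and the helper name `find_duplicate_rows` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def find_duplicate_rows(csv_text):
    """Group sample labels whose coordinate columns are textually identical."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        label, coords = row[0], tuple(row[1:])
        groups[coords].append(label)
    # Keep only coordinate sets shared by more than one label.
    return [labels for labels in groups.values() if len(labels) > 1]

# Hypothetical three-column excerpt; real rows have 25 coordinates.
sample = """Label_A:I0562,0.1,0.02,-0.03
Label_B:I0562,0.1,0.02,-0.03
Label_C:I0001,0.09,0.01,-0.02
"""

for labels in find_duplicate_rows(sample):
    print(" = ".join(labels))
```

Because the comparison is on the text of the coordinates, it only flags exact copies, which is the situation described here.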

Open Genomes said...

@David, these are identical, Pazyryk_IA from Berel', Kazakhstan:

Supplementary Table 20:

Supplementary Table 20. Genomic capture samples.
Information on the six Scythian individuals for which genome-wide capture data was obtained and the number of SNPs overlapping the Human Origins array for the samples only shotgun sequencing was performed (IS2 and Ze6).
Harvard ID | Mainz ID | Site | Culture/Label | Date | # SNPs | Sex
I0562 | Be9 | Berel’, Kazakhstan | Pazyryk_IA | 4th–3rd c. BC | 549958 | F
I0563 | Be11 | Berel’, Kazakhstan | Pazyryk_IA | 4th–3rd c. BC | 420749 | M

Just keep these as Pazyryk_IA and it's fixed.

Open Genomes said...


JK2134, JK2911, Pre-Ptolemaic Egypt, 780-560 BCE;
JK2888, Ptolemaic Egypt, 97-2 BCE

JK2134 is not from the New Kingdom, but is an Egyptian Priest of Osiris from Abusir from the Late Period / start of the Persian Era. He is J1-Z2313 and very likely J-Y2919.

Can you please change JK2134 from Egypt_New_Kingdom to Egypt_Late_Period?

Davidski said...

@Open Genomes

I've now made those changes.

Open Genomes said...


Are the two individuals labeled Comb_Ceramic:Tamula1 and Comb_Ceramic_Estonia:Tamula1 duplicates?

Davidski said...

Yep, fixed.

Open Genomes said...


I5025/RISE567 (female, mtDNA U5b2c), labeled as a Czech Beaker, clusters very closely with modern and Medieval Slavs.

I5025/RISE567 clusters with Slavs on the Global25 Ward's distance-squared clustering tree

Is it possible that this isn't at all a Beaker skeleton, but a Medieval Czech?

Can you look at this and see what you get?
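
One quick way to cross-check a placement like this is to list a sample's nearest neighbours by plain Euclidean distance in the coordinate space. This is a simpler stand-in for the Ward's distance-squared clustering tree, not the same method. The sketch below is in Python; the labels, the three-dimensional coordinates and the helper name `nearest_neighbours` are made up for illustration.

```python
import math

def nearest_neighbours(target_label, samples, k=3):
    """Return the k labels closest to target_label by Euclidean distance."""
    target = samples[target_label]
    dists = [(math.dist(target, coords), label)
             for label, coords in samples.items()
             if label != target_label]
    return [label for _, label in sorted(dists)[:k]]

# Hypothetical coordinates; real Global25 rows have 25 columns.
samples = {
    "query":      [0.12, 0.13, 0.05],
    "medieval_a": [0.12, 0.13, 0.06],
    "medieval_b": [0.14, 0.13, 0.05],
    "beaker_a":   [0.11, 0.10, 0.02],
    "beaker_b":   [0.10, 0.09, 0.02],
}

print(nearest_neighbours("query", samples, k=2))
```

If the closest neighbours of a supposedly Bronze Age sample are all much later individuals, that is a hint to re-check its dating or label, though it does not rule out substructure within the original population.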

Davidski said...

This Czech Beaker looks Slavic in my PCA of Northern Europe too.

I'll mark this individual as an outlier, since there's no C14 date available and there's no way to be sure that this isn't just a reflection of substructure within the Czech Beaker population.

Open Genomes said...


RISE97 Sweden_LN 2025-1885 calBCE clusters with Beakers, and seems to be a very early example of the Central European Beaker folk, regardless of the culture.

RISE97 Sweden_LN clusters with Beakers on the Global25 Ward's distance-squared clustering tree

Could this be a clue about Beaker origins?

Open Genomes said...


vik_KAL006 Sweden_Viking_Age_Sigtuna is an outlier who appears to be an Estonian.

vik_KAL006 Sweden_Viking_Age_Sigtuna on the Global25 Ward's distance-squared clustering tree

Does vik_KAL006 come up as an Estonian for you?

Davidski said...

@Open Genomes

Does vik_KAL006 come up as an Estonian for you?

I don't know, I can't really check at the moment. But that Sigtuna sample set is supposed to be kind of diverse as far as Swedish ancestry is concerned, so it makes sense that at least one of the samples shows ancestry from the East Baltic.

Open Genomes said...


RISE431 CWC_Proto-Unetice_Poland (Y R1a1a1, mtDNA T2e) clusters with Germanic speakers.

RISE431 clusters with Germanic speakers on the Global25 Ward's distance-squared clustering tree

Is RISE431 potentially from a later period, or does he represent population structure in the Late Corded Ware, a representative of the group that later became the pre-Proto-Germanic speakers of Scandinavia?

Notice this cluster includes I2566 Beaker_Britain and I6480 Beaker Czech. These also seem out of place, and either these are later burials or true outliers.

Open Genomes said...


If N47 is labeled as CWC_Poland_o then N49 CWC_Poland should be labeled CWC_Poland_o as well. Both cluster closely with I6579 Poland_EBA:

N49 CWC_Poland clusters closely with N47 CWC_Poland_o on the Global25 Ward's distance-squared clustering tree

Open Genomes said...

@David, if you don't mind and find it useful, I'll continue to identify outliers or "archaeologically out of place" samples. I don't use "populations" or "population averages" for this reason, but I do use somewhat corresponding "periodization" (i.e. Medieval, Iron Age, Bronze Age, Chalcolithic, Neolithic, Mesolithic, Upper Paleolithic) which is somewhat arbitrary. I try to make sure that Late Neolithic individuals with Steppe ancestry outside the core Steppe region are categorized as "Bronze Age" or "Chalcolithic" so that nMonte with a Neolithic period cutoff doesn't cause these samples to arbitrarily "attract" later and modern individuals with a Steppe ancestral component.

If you're going to use "population averages", like with some f3-stats to compensate for low coverage, it's best to use samples that cluster very closely together. I've seen what appears to be some overlap in the f3-stats matrix that doesn't appear with Global25.
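
As a rough illustration of building such averages, here is a minimal Python sketch. It assumes labels of the form "Population:SampleID", as in Comb_Ceramic:Tamula1, and assumes that outliers carry an "_o" suffix on the population name, as in CWC_Poland_o. The coordinate values and the helper name `population_averages` are made up for illustration.

```python
from collections import defaultdict

def population_averages(rows, skip_outliers=True):
    """Average coordinates per population from (label, coords) pairs."""
    grouped = defaultdict(list)
    for label, coords in rows:
        population = label.split(":", 1)[0]
        # Assumption: populations ending in "_o" are flagged outliers.
        if skip_outliers and population.endswith("_o"):
            continue
        grouped[population].append(coords)
    # Column-wise mean over the members of each population.
    return {pop: [sum(col) / len(col) for col in zip(*members)]
            for pop, members in grouped.items()}

# Hypothetical two-dimensional rows; real rows have 25 coordinates.
rows = [
    ("PopA:s1", [0.1, 0.2]),
    ("PopA:s2", [0.3, 0.4]),
    ("PopA_o:s3", [0.9, 0.9]),
    ("PopB:s4", [0.5, 0.6]),
]

for pop, mean in population_averages(rows).items():
    print(pop, [round(x, 4) for x in mean])
```

Leaving the flagged outliers out keeps one divergent sample from pulling the average away from the samples that do cluster together.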

Open Genomes said...


L-I0789 England_IA and I2609 England_CA_EBA both cluster with an Anglo-Saxon, Germans, and Dutch. L-I0789 is almost certainly an Angle, Saxon, or Frisian, and it seems likely that I2609 is also a North Sea Germanic tribesman, because we see no such ancestry among the people of the continent in the Chalcolithic or Early Bronze Age.

L-I0789 England_IA and I2609 England_CA_EBA cluster with an Anglo-Saxon, Germans, and Dutch on the Global25 Ward's distance-squared clustering tree