Wednesday, December 7, 2016

Population history of Sardinia from 3,514 whole genomes

Just in at bioRxiv:

Abstract: The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of traits and diseases. The history of the Sardinian population has also been the focus of much research, and in recent ancient DNA (aDNA) studies, Sardinia has provided unique insight into the peopling of Europe and the spread of agriculture. In this study, we analyze whole-genome sequences of 3,514 Sardinians to address hypotheses regarding the founding of Sardinia and its relation to the peopling of Europe, including examining fine-scale substructure, population size history, and signals of admixture. We find the population of the mountainous Gennargentu region shows elevated genetic isolation with higher levels of ancestry associated with mainland Neolithic farmers and depleted ancestry associated with more recent Bronze Age Steppe migrations on the mainland. Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations. Further, allele sharing with pre-Neolithic and Neolithic mainland populations is larger on the X chromosome compared to the autosome, providing evidence for a sex-biased demographic history in Sardinia. These results give new insight to the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.

Charleston et al., Population history of the Sardinian people inferred from whole-genome sequencing, bioRxiv, Posted December 7, 2016, doi:


Grey

Ancient interior ancestry, preserved by the coastal malaria swamps.

Romulus

If the steppe people did bring Indo-European, then it would be the Sardinians speaking Basque.

jeanlohizun

Interesting quotes from the study:

Abstract: "Notably, the Gennargentu region also has elevated levels of pre-Neolithic hunter-gatherer ancestry and increased affinity to Basque populations."

In the actual paper:

"[...] To correct for this when measuring similarity to other mainland populations, we used “shared drift” outgroup-f3 statistics (Raghavan et al. 2014) which are robust to population-specific drift. Using this metric, we find the Basque are the most similar to Sardinia, even more so than neighboring mainland Italian populations such as Tuscany and Bergamo (Figure S6A, S6B). This relationship is corroborated by identity-by-descent (“IBD”) tract length sharing, where among mainland European populations, French Basque showed the highest median length of shared segments (1.525 cM) with Arzana (Figure S7). We also tested the affinity between Sardinians and Basque with the D-statistics of the form D(Outgroup, Sardinia; Bergamo or Tuscan, Basque). In this formulation, significant allele sharing between Sardinia and Basque, relative to sharing between Sardinia and Italian populations, will result in positive values for the D-statistic. We find that Sardinia consistently showed increased sharing with the Basque populations compared to mainland Italians (|Z| > 4; Figure S6C), and the result was stronger when using the Arzana than Cagliari sample (D_ARZ = 0.008 and 0.0096, D_CAG = 0.0072 and 0.0087 for French Basque and Spanish Basque, respectively). In contrast, sharing with other Spanish samples in our dataset was generally weaker and not significant (|Z| < 3.5; Figure S6C), suggesting the shared drift with the Basque is not mediated through Spanish ancestry."

This relationship is not mediated through the Spanish.
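For readers unfamiliar with the two statistics quoted above, here is a minimal Python sketch of their simple allele-frequency forms. It assumes per-SNP allele frequencies are already available for each population; the toy frequencies are made up and are not data from the paper, and the function names are hypothetical. Real analyses (for example with ADMIXTOOLS) also correct for finite sample sizes and derive Z-scores such as the |Z| > 4 above from a block jackknife; both steps are omitted here.

```python
from typing import Sequence


def outgroup_f3(p_out: Sequence[float], p_a: Sequence[float], p_b: Sequence[float]) -> float:
    """Outgroup-f3(O; A, B): mean over SNPs of (pO - pA) * (pO - pB).

    Larger values mean more drift shared by A and B since their split from O.
    No finite-sample-size correction is applied in this sketch.
    """
    if not (len(p_out) == len(p_a) == len(p_b)) or not p_out:
        raise ValueError("frequency lists must be non-empty and of equal length")
    total = sum((o - a) * (o - b) for o, a, b in zip(p_out, p_a, p_b))
    return total / len(p_out)


def d_statistic(p_w: Sequence[float], p_x: Sequence[float],
                p_y: Sequence[float], p_z: Sequence[float]) -> float:
    """D(W, X; Y, Z) from allele frequencies, normalised to lie in [-1, 1].

    With W = outgroup, X = Sardinia, Y = an Italian population and Z = Basque,
    a positive value means X shares more alleles with Z than with Y.
    """
    if not (len(p_w) == len(p_x) == len(p_y) == len(p_z)):
        raise ValueError("frequency lists must be of equal length")
    num = 0.0
    den = 0.0
    for w, x, y, z in zip(p_w, p_x, p_y, p_z):
        num += (w - x) * (y - z)
        den += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


if __name__ == "__main__":
    # Toy frequencies at four SNPs (invented for illustration only).
    outgroup = [0.1, 0.2, 0.9, 0.5]
    sardinia = [0.6, 0.7, 0.3, 0.5]
    italian = [0.2, 0.3, 0.8, 0.5]
    basque = [0.5, 0.8, 0.2, 0.5]
    print("f3(O; Sardinia, Basque) =", outgroup_f3(outgroup, sardinia, basque))
    print("f3(O; Sardinia, Italian) =", outgroup_f3(outgroup, sardinia, italian))
    print("D(O, Sardinia; Italian, Basque) =", d_statistic(outgroup, sardinia, italian, basque))
```

In this toy data Sardinia shares more drift with the "Basque" column than with the "Italian" one, so the outgroup-f3 is larger for that pair and the D-statistic comes out positive, matching the sign convention described in the quote.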

Dude ManBro

@ Romulus

Sardinians spoke Nuragic until Roman occupation, so while it was not Euskara, the language of the Basque people, they did speak a non-IE language until historical times.

Annie Mouse

By hunter-gatherer they mean Loschbour. El Mirón and Villabruna would have been more interesting.

By farmer they mean LBK-EN.

They also used Yamnaya as representative of the Bronze Age.

As the Sardinians are quintessential southern Europeans, it is not all that surprising that they were low in Yamnaya and Loschbour ancestry.

As it turns out, they have a juicy Sardinian Mesolithic/Neolithic (7k BCE) full skeleton that they unearthed back in 2011. I'd like to see the DNA from that.

Richard Rocca

Years before the first ancient Y-DNA samples came out, I had proposed a link between I2a1-M26 in Sardinians and Basques as a related pre-R1b population that spoke a Basque-like language. The linguistic link was confirmed in the 2012 book "Iberia e Sardegna - Legami linguistici, archeologici e genetici dal Mesolitico all'Età del Bronzo" by Eduardo Blasco Ferrer. In it, he reconstructs Paleo-Sardinian and its shared parent language with Basque, which he calls "Pre-Proto-Basque". Now this paper correlates the more archaic areas of Sardinia, which is where I2a1-M26 is highest, with increased affinity to Basques and ancient WHG and EEF ancestry. It is not difficult to project that the areas with higher steppe ancestry are where U152 is higher (in the North and West). In peninsular Italy, only 4 of 800 males (not a typo) were I2a1-M26, but obviously U152 is very high in Northern Italy and Tuscany.

Roy King

As the authors mention, the CEPH samples are from near Gennargentu. They have 10/16 = 62.5% haplogroup I-M26 and only 1/16 M269. That really suggests two things:
1) that I-M26 may carry the Basque-PaleoSardinian language and may be a relic hunter-forager language.
2) M269 might reflect Steppe input to the Southern Mediterranean.

Roy King

I should add that the other R1b sample in CEPH/Gennargentu is V88 and could track the Mediterranean Cardial migration, like at Els Trocs in Spain.

Grey

IIRC (from an earlier paper), the mountain interior has a lot of Y-DNA I.

If the mountain interior was mixed HG/farmer, then you wonder which of the two the Y-DNA I came from.

AWood

Does anyone know (or recall) the HG breakdown of the mountain interior?

Samuel Andrews

@Annie Mouse,
"As the Sardinians are quintessential southern Europeans, it is not all that surprising that they were low in Yamnaya and Loschbour ancestry."

Sardinians aren't quintessential Southern Europeans. If anything, Southern Italians are. Sardinians were isolated from the Steppe and Middle Eastern migrations, which make up the majority of all Southern Europeans' ancestry, except the Basques'. EEF and EEF-related ancestry is higher in Southern Europe than in Northern Europe, but it isn't their defining feature. What really distinguishes Southern Europeans from Northern Europeans is less Steppe admixture and more recent Middle Eastern ancestry, not Neolithic European EEF. Both of these features peak in Southern Italy.

Matt

Not on the main topic of Sardinians per se, but their ADMIXTURE run shows an interesting split at high K. After the Basque component emerges as distinct (rather than being modelled as a Lithuanian+Sardinian combination), Orcadian, GBR and Norwegian samples pick up the Basque component (around 1/4), while it is largely absent in Czech and Hungarian samples, which tend to pick up Sardinian- or Cypriot-specific components instead:

capra internetensis

Re I2-M26, tl;dr: I wonder whether it actually arrived in the Chalcolithic.

In Francalacci's massive sample (n=1200) 39% of Sardinian men had I2-M26. Almost all of that was I2-L160, which is according to Y-Full around 5700 (4800-6600) years old; this is most common in Sardinia but also found elsewhere. But in fact three quarters of the I2-M26 is in the PF4188 subclade, and half of it is PF4295 under that, which seem specific to Sardinia. There is no TMRCA estimate from Y-Full for these clades due to low coverage, but the upstream clade PF4189 is estimated to be 5400 (4300-6500) years old, and each of the downstream clades should have several centuries to its name. Which (to finally reach the point) is consistent with an expansion of I2-M26 in Sardinia anywhere from the 4th to 2nd M BC, but perhaps most likely in the early-mid 3rd M.

After in-depth research on Sardinian prehistory (ok, glancing at Wikipedia), I see that a culture with copper daggers, statue-menhirs, and Rinaldone-like pottery arrives at this time, a few centuries before Bell Beaker. We have 3 Remedello samples from different times, which are all I2-Y3992(xL160) under M26. Then there is Bell Beaker itself, which could also have had I2a1a.

Rob

@ Capra
Yes I'd tend to agree
Which BB was I2a?

Shaikorth


It could be something Novembre warned about:

"For example, if applied to a geographic continuum, the method will infer source populations that are vaguely spatial but have no real interpretation as source populations in an admixed sample."

This paper includes an ADMIXTURE run where a South Italian/Cypriot component forms and replaces Sardinian everywhere in Europe beyond Iberia and Italy:

FrankN

Unfortunately, the authors have ignored that during the MN, Sardinia was the hub of an obsidian trade network that reached Tuscany, the lower Rhone and Catalonia (salt mines there possibly providing the counter-merchandise). Obsidian-rich Monte Arci, SW of Oristano, provided most of the exported material. The establishment of that network most likely reflects Aegean or Levantine influences, where similar networks already existed since the EN. Possible impacts in terms of MN immigration from the Eastern Mediterranean, as well as a genetic connection to Remedello and Iberia MN instead of just to Stuttgart, would definitely have deserved exploration.

Moreover, recent archeometallurgical results from Italy indicate a much later start of copper production in N. Italy than originally anticipated. Possibly for good reason: they had a thriving export business in high-quality processed stone (jadeite axes to Brittany, Remedello daggers to Bavaria, greenstone ear- and armrings from Trentino to Slovenia, etc.), so why bother with the cheap metal imitation when you have the "real stuff"? [Note in this context that the Iceman's copper axe came from the Salzburg area, and he was killed close to a major greenstone deposit.]
As it looks now, copper metallurgy initially bypassed N. Italy and entered via Liguria, Sardinia and Sicily. From the CA on, Sardinia became a major mining location, not only for copper but also for silver, as indicated, among other things, by the name Gennargentu ("silver carrier").

The early CA sees two cultures coexisting on the island: the Arzachena culture in the NE shows parallels to Catalonia and Provence, while the Ozieri culture has strong links to the Aegean and especially Crete. It is the latter that is generally credited with establishing silver and copper mining; however, chronology makes an entry from Liguria more likely. In any case, we have to anticipate another wave of East Med immigration. If the silver/copper people circumvented the Alps, they most likely originated in the Balkans (an MN/CA (post-)Vinca culture spread into Austria is archeologically indicated), and could have enhanced an "EEF plus I2a1" genetic profile (cf. I2a1 in Starcevo).

The Monte Claro culture, appearing by the mid-3rd mBC, marks a clear break from (Epi-)Ozieri and was most certainly intrusive. It is described as a bridge between the Piano Conte (Sicily) and Fontbuisse (Languedoc) cultures. It concentrates in the (EEF-dominated) SW, leaving few traces in the NW.

During the MBA, Sardinia was a major copper exporter; archeometallurgical analyses have shown it to be an important supplier to Sweden, alongside NW Iberia and Tirol. This implies regular contact along the transport chain, either towards the passes across the Western Alps, or the Lower Rhone, or the Baleares/Iberia. This is another issue the study doesn't explore.

@ Capra: yFull TMRCA estimates for I2a1 are once more “engineered”. The calculated PF4189 age is 5.7 ka, but it was lowered to 5.4 ka to bring it in line with upstream Z105. Conversely, the calculated 5.5 ka for L160 was raised to 5.7 ka to account for a higher age of downstream Y3991. Then again, the L160* age calculation included under upstream Z2049 yields 8.1 ka. To me, for Sardinian L160 that points to:
a) Neolithic arrival, probably picked up on the Balkans;
b) Star-like expansion by the mid 4th mBC, coinciding with the uptake of silver and copper mining/ processing,
c) Withdrawal towards the Gennargentu after the Monte Claro invasion.

Matt

@ Shaikorth, yes, totally, I think we should be wary of taking the ADMIXTURE literally and modeling the NW European groups as mixtures of a Basque+Baltic population while at the same time modeling the East Central European groups with the Baltic population plus lower levels of East Mediterranean groups.

My thinking was more that it seemed pretty cool that in a dedicated West Eurasia-focused panel like this, the (geographically plausible) very slight variations in allele similarity and relatedness to the East Mediterranean and Italy vs Iberia are captured within the components. Quite fine structure. It often seems that this doesn't happen in globalised runs, while in intra-European runs the sample panel is too restricted for it to show up. (Likewise, it is pretty neat that it is able to bin the Norwegian, Orcadian, GBR and most of the French ancestries into a single geographically interpretable cluster at its very highest K!)

At the lowest K (K=3), it also seems they found a Basque+NW European+Baltic+Finnish cluster, and then the Hungarian and Czech samples showed low-level admixture (very low for Czech, more noticeable for Hungarian) between this and a generalised Caucasus modal cluster.

Shaikorth


Finestructure and PCA often form some kind of Northwest European cluster, as do STRUCTURE-based algorithms (DNAland's correlates with Rathlin chunk sharing pretty well).

Basque-Sardinian vs East Med (Druze peak) type sharing repeats here:

However I wonder if that's just a Caucasus/East Med issue caused by their own European-like ancestry and drift. When that South Italian/Cypriot based Mediterranean component formed, it replaced Sardinian in NW Europe as well.

This is another run where Basque and Sardinian are both widespread in Europe, looks like it depends heavily on sampling and K.

Gioiello

@ Roy King

"I should add that the other R1b sample in CEPH/Gennargentu is V88 and could track the Mediterranean Cardial migration, like at Els Trocs in Spain."

Ridiculous. All the oldest R-V88 (R-V88*, R-V88-M18, R-V88-M35) are in Sardinia, Italy and Western Europe. No R1b in Middle Eastern aDNA.

Gioiello

R-V88-V35 of course.

xibler


I agree with most of what you said, but the point about L160...

a) Neolithic arrival, probably picked up on the Balkans;

My question is: why the Balkans? If I were a betting man, I'd put my money on the lower Rhone.

Wasn't Bichon CTS595, the upstream granddaddy of them all?
He was sort of in the neighborhood.

capra internetensis

The African admixture is interesting. It is minor but detectable (~0-4%) in the southern and western provinces, but not in the more isolated and mountainous north/central east of the island. The central admixture dates for most populations are ~1700-2100 years ago, so basically the classical Roman era, again bringing to mind the statement of al-Idrisi that the Sardinians were originally barbarized African Romans. However, the admixture date for Oristano is more like 3000-3500 years ago, prior even to Carthaginian settlement. I suppose ALDER is probably conflating a number of different events. The best reference populations were not North African but Sub-Saharan, which suggests that major genetic change has occurred in North Africa since the period in question.
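Why a single date can blur several events is easiest to see in the model that ALDER-style dating fits. The Python sketch below is a simplified illustration, not ALDER itself: it assumes a weighted LD curve of the form a(d) = A·exp(−n·d), with d in Morgans and n the number of generations since one admixture pulse, and recovers n by a least-squares fit on log a(d). ALDER also fits an affine offset and reports jackknife standard errors; the function names, the synthetic curve and the 29-year generation time are all assumptions for illustration.

```python
import math
from typing import Sequence


def fit_generations(distances_morgans: Sequence[float], weighted_ld: Sequence[float]) -> float:
    """Estimate n from a(d) = A * exp(-n * d) by least squares on log a(d).

    Simplified single-pulse model: no affine term and no jackknife.
    Every weighted_ld value must be positive so the logarithm is defined.
    """
    if len(distances_morgans) != len(weighted_ld) or len(distances_morgans) < 2:
        raise ValueError("need at least two matching points")
    xs = list(distances_morgans)
    ys = [math.log(a) for a in weighted_ld]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the slope of log a(d) against d is -n


def generations_to_years(generations: float, years_per_generation: float = 29.0) -> float:
    """Convert generations to years; 29 years per generation is an assumed value."""
    return generations * years_per_generation


if __name__ == "__main__":
    # Synthetic curve: one pulse 70 generations ago, sampled from 0.5 cM to 30 cM.
    d = [0.005 * i for i in range(1, 61)]
    ld = [0.01 * math.exp(-70 * x) for x in d]
    n = fit_generations(d, ld)
    print(f"{n:.1f} generations, about {generations_to_years(n):.0f} years")
```

Because only one exponential is fitted, a curve produced by two pulses (say, one Roman-era and one older) would be summarised by a single intermediate rate, which is one way ALDER could end up conflating events.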


Of course the TMRCA error bars can comfortably encompass the 4th M BC if that is more plausible. I was looking at the Italian Copper Age link because of the relatively close paternal relationship with Remedello.

I2a1 is not very informative. I2a1 and even I2a1a are very old and widespread, dating back to Bichon, as xibler says. The Hungarian I2a1s which have been analyzed in more detail (KO1 and NE7) belong to the more common I2a1a2. I1303 from Chalcolithic Iberia has I2a1a1, but is quite late. Remedello has a common ancestor with Sardinians at the I2a1a1a-Y3992 level, which is much younger.


I don't know of any Bell Beaker I2s, I was just considering the possibility. Sounds like we won't have to wait long to find out more. Still, with every mystery solved two more arise, it seems.

Roy King

I never said that V88 originated with Middle Eastern Neolithic PPNB farmers. Methinks your paranoia got the best of you! I said that V88 may track the Cardial migrations as opposed to the LBK migrations, that its presence in Sardinia may conform to the Neolithic autosomal correspondences with perhaps Els Trocs, and that I-M26 might reflect European HG in Sardinia. In fact, I-M26 may have migrated to Central Sardinia later, with the appearance of megaliths in the area, from France or Spain.

Gioiello

@ Roy King

"I never said that V88 originated with Middle Eastern Neolithic PPNB farmers. Methinks your paranoia got the best of you! I said that V88 may track the Cardial migrations as opposed to the LBK migrations, that its presence in Sardinia may conform to the Neolithic autosomal correspondences with perhaps Els Trocs, and that I-M26 might reflect European HG in Sardinia. In fact, I-M26 may have migrated to Central Sardinia later, with the appearance of megaliths in the area, from France or Spain."
You didn't say that, but saying that it came with the Cardials does presuppose that it came from the Middle East, because:
1) that is the Levantinists' theory;
2) FTDNA, which is the ideological and economic interface of that theory, has always supported it, from the infamous hg R1b migration tree by Vincent Vizachero, which only recently disappeared from that site in favour of the more scientific tree by Sergey Malyshev (smal), and all the people linked to you and the Levantinist ideology try to support it;
3) "I said that V88 may track the Cardial migrations as opposed to the LBK migrations": in the Levantinist ideology both of these migrations started from the Middle East, or Anatolia, without acknowledging that Anatolia (above all northern Anatolia) was genetically linked to Europe and had nothing to do with Natufians and Iranians, except that it had received migrations from the Villabruna population: see what I said to your compatriot Sam about mt H32 found in Natufian and Iranian aDNA (its origin is in Western Europe, very likely in Italy);
4) I'd like to see you call me "paranoid" face to face, or to see any of your ancestors say that to mine.

AWood

^You should get a permanent ban for this post.

Matt

@ Shaikorth: "This is another run where Basque and Sardinian are both widespread in Europe, looks like it depends heavily on sampling and K."

It seems to me like it may depend on how "specific" to Basque or Sardinian the components which form are. In the runs where they are replaced in all other Europeans by other components they're probably very specific, in the ones where they are widespread, they are quite generalised. This run must have the particular balance of specificity, not "too much" or "too little", to place the Basque where it does (mostly in the West European populations, and not others).

Also, what's your opinion on the formation of a Chuvash component which contributes at around 33-40% in some Turkish samples? Does that reflect a real transfer do you think, or mostly independent Turkic contributions?

FrankN

@xibler: "why the Balkans?"
a) The current distribution of I2a1 suggests LGM refugium around the Northern Balkans (N. Adriatic plain, flooded some 8kya?);
b) They lie upstream of Sardinia (Sicily) along the maritime Neolithic (Cardial) expansion route. Cardial EEF collecting I2a1 there could have provided critical mass for neolithic expansion on Sardinia.

I wasn't aware of Bichon being CTS595. AncestralJourneys lists him as I2 only. But in any case, I2a1(a) should have been all over the place by the late Mesolithic (note, e.g. Motala[9] I2a1a1a*), so I would in no way exclude entrance from S. France. We need (more) aDNA from Sardinia, S.France, Italy and the Balkans to sort out specific migration paths and times.

@Capra – re TMRCA calculation: Have you seen the new Reich/Mathieson paper pointing at regional differences in mutation rates (emphasis mine):
We find at least two distinct signatures of variation. One, consistent with a previously reported signature is characterized by an increased rate of TCC>TTC mutations in people from Western Eurasia and South Asia, likely related to differences in the rate, or efficiency of repair, of damage due to deamination of methylated guanine. We describe the geographic extent of this signature and show that it is detectable in the genomes of ancient [Loschbour, Stuttgart], but not archaic [UI, Denisova] humans. The second signature is private to certain Native American populations, and is (..) a result of the fact that highly mutable CpG sites are more likely to undergo multiple independent mutations across human populations, and the spectrum of such mutations is highly sensitive to recent demography. Both of these effects dramatically affect the spectrum of rare variants across human populations, and should be taken into account when using mutational clocks to make inference about demography.
Note also p. 7 on differences between African and non-African mutation rates, the latter being on average some 5% higher.
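How a rate difference feeds into clock-based dates like the YFull TMRCAs discussed above is proportional arithmetic. The Python sketch below assumes a naive Y-SNP clock (mean SNP count per lineage below a node multiplied by a fixed number of years per SNP). This is not YFull's actual procedure, which also adjusts for coverage and sample ages, and the 150 years/SNP figure and the SNP counts are hypothetical. Whether the ~5% autosomal difference from the Reich/Mathieson paper carries over to the Y chromosome is itself an assumption. If the true rate is about 5% higher than the calibration rate, the same mutations accumulate in about 1/1.05 of the time, so 5,700 years would shrink to roughly 5,430.

```python
from typing import Sequence


def naive_tmrca(snps_per_lineage: Sequence[int], years_per_snp: float) -> float:
    """Naive clock: mean SNP count below a node times years per SNP.

    years_per_snp is a hypothetical calibration, not YFull's published value.
    """
    if not snps_per_lineage:
        raise ValueError("need at least one lineage")
    mean_snps = sum(snps_per_lineage) / len(snps_per_lineage)
    return mean_snps * years_per_snp


def rescale_for_rate(age_years: float, rate_ratio: float) -> float:
    """Rescale an age when the true rate is rate_ratio times the calibration rate.

    A faster true rate means the observed mutations took proportionally less time.
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return age_years / rate_ratio


if __name__ == "__main__":
    # Three hypothetical lineages carrying 36, 38 and 40 SNPs below a node.
    age = naive_tmrca([36, 38, 40], years_per_snp=150.0)
    print(round(age), round(rescale_for_rate(age, 1.05)))
```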

Shaikorth

Maybe, although a more general South European component hasn't formed so we don't know if it would take Basque's (and Sardinia/Levant's) place.

About the Turks and Chuvash, it looks like that's just shared Turkic ancestry. GLOBETROTTER and the Broushaki 2016 TVD suggest Turkmens, Uzbeks and Nogays are better representatives of the direct source.

Chris Davies

Re: Basque/Sardinian genetic affinity.

The HLA haplotype A30-Cw5-B18-DR3-DQ2 is instructive. This is the #1 HLA haplotype in Sardinia. It appears to have formed in northern Africa, migrated to Sardinia early on, reached high frequency in Sardinia due to founder effect / drift, and then entered mainland Europe with second highest European frequency in Basques. The full haplotype, or variations on it, can be found in Maghrebi Berbers, Senegalese Mandenka, Ghanaians, Chadic speakers in north Cameroon, Sudanese, and Kenyan Luo [although Africa is badly under-sampled].

Italy Sardinia Pop.3 - 12.50% [highest-frequency class I HLA haplotype in Sardinia]
Spain Gipuzkoa Basque - 6.10% [second-highest frequency class I HLA haplotype in Basques]
Portugal South - 4.10%
Spain Murcia - 3.50%
Spain Majorca & Minorca - 2.20%
France Corsica Island - 2.00%
Spain Catalonia Girona - 1.70%
Spanish expats/migrants in Germany - 1.57%
Portugal Beja - 1.50%
Italy Pop. 5 - 1.26%
Portugal Faro - 1.20%
Portugal North - 1.10%
Italian expats/migrants in Germany - 0.95%
Portuguese expats/migrants in Germany - 0.81%
Romanian expats/migrants in Germany - 0.41%
Albanian Pop.2 - 0.12%
Greek expats/migrants in Germany - 0.11%
Russia Moscow Pop.2 - 0.10%
Germany Pop.7 - 0.10%
Poland DKMS - 0.09%

xibler


Thank you. I'm not 100% sure about Bichon CTS595, but I'm pretty sure I read it on here somewhere... truthiness.

As for the Balkan refuge: maybe so, but it seems hard to draw a line (where do we?).

Those guys must've been quite familiar with the length and breadth of the Po/Adriatic basin from one end to the other. I'm not so sure that the Gulf of Genoa, and farther afield the Rhone Basin and the Gulf of Lion, would have been so walled off by the Maritime Alps. And there's some kind of evidence that somebody was messing around in Sardinia long before EEF arrived. They knew where the goods (obsidian) were.

And besides, it stands to reason that both sides of the Italian Peninsula were familiar with each other, as there's a sizeable "Villabruna" component even in far-off El Mirón, back near the LGM.

Just sayin'

capra internetensis


Thanks, yeah, I saw it. Scozzari et al. (2014) had already found a slower rate of mutation in African Y haplogroups based on different branch lengths.

Bichon having I2a1a2 is from Genetiker. You can see his Y SNP calls here:

The genome is quite good quality so the result of pre-CTS595 is solid.

FrankN

@capra: "Bichon having I2a1a2 is from Genetiker."

Thx. I looked at his calls. Positive for L1286/Y4213 (which would be I2a1a3 acc. to current nomenclature, formed 14.9 kya acc. to yFull), negative for downstream L1287.
So, he isn't a direct ancestor to the Sardinian I2a1a1a1(L160), but nevertheless demonstrates early spread of I2a1a (CTS595).

Dom

My earliest known patrilineal ancestor is from the 1600s in the southwest of France, in the vallée d’Ossau of the Pyrenees mountains, in Sévignacq-Meyracq in Béarn.
I tested positive for I-Z106 (same as I-Z98), five levels down from I-L160. Based on FTDNA results, Z106 has been found in several Basque people from Vizcaya, and also in Dorset in South West England.
Z106 is not found in Francalacci's Sardinian studies. I-Z99 is the common ancestor with Sardinia, with a TMRCA of 3,500 years.
I don’t know what to make of this.