Linguistics and genetics combine to propose a new hybrid theory regarding the origin of the Indo-European languages.
For over two centuries, the question of where the Indo-European languages originated has been a hotbed of contention. Two prevailing theories have recently taken center stage in this discussion: the ‘Steppe’ hypothesis, postulating that these languages originated around 6000 years ago in the Pontic-Caspian Steppe, and the ‘Anatolian’ or ‘farming’ hypothesis, which posits an earlier origin connected to early agriculture around 9000 years ago.
Previous phylogenetic investigations into the Indo-European languages have yielded contradictory results regarding the age of this language family. This discordance can be attributed to a mix of inaccuracies and inconsistencies in the datasets used in these studies, as well as constraints in the way phylogenetic methodologies have examined ancient languages.
To solve these problems, researchers from the Department of Linguistic and Cultural Evolution at the Max Planck Institute for Evolutionary Anthropology assembled an international team of over 80 language specialists to construct a new dataset of core vocabulary from 161 Indo-European languages, including 52 ancient or historical languages. This more comprehensive and balanced sampling, combined with rigorous protocols for coding lexical data, rectified the problems in the datasets used by previous studies.
Indo-European estimated to be around 8100 years old
The team used recently developed ancestry-enabled Bayesian phylogenetic analysis to test whether ancient written languages, such as Classical Latin and Vedic Sanskrit, were the direct ancestors of modern Romance and Indic languages, respectively. Russell Gray, Head of the Department of Linguistic and Cultural Evolution and senior author of the study, emphasized the care they had taken to ensure that their inferences were robust.
“Our chronology is robust across a wide range of alternative phylogenetic models and sensitivity analyses,” he stated. These analyses estimate the Indo-European family to be approximately 8100 years old, with five main branches already split off by around 7000 years ago.
These results are not entirely consistent with either the Steppe or the farming hypotheses. The first author of the study, Paul Heggarty, observed that “Recent ancient DNA data suggest that the Anatolian branch of Indo-European did not emerge from the Steppe, but from further south, in or near the northern arc of the Fertile Crescent — as the earliest source of the Indo-European family. Our language family tree topology, and our lineage split dates, point to other early branches that may also have spread directly from there, not through the Steppe.”
New insights from genetics and linguistics
The authors of the study, therefore, proposed a new hybrid hypothesis for the origin of the Indo-European languages, with an ultimate homeland south of the Caucasus and a subsequent branch northwards onto the Steppe, as a secondary homeland for some branches of Indo-European entering Europe with the later Yamnaya and Corded Ware-associated expansions. “Ancient DNA and language phylogenetics thus combine to suggest that the resolution to the 200-year-old Indo-European enigma lies in a hybrid of the farming and Steppe hypotheses”, remarked Gray.
Wolfgang Haak, a Group Leader in the Department of Archaeogenetics at the Max Planck Institute for Evolutionary Anthropology, summarizes the implications of the new study by stating, “Aside from a refined time estimate for the overall language tree, the tree topology and branching order are most critical for the alignment with key archaeological events and shifting ancestry patterns seen in the ancient human genome data. This is a huge step forward from the mutually exclusive, previous scenarios, towards a more plausible model that integrates archaeological, anthropological, and genetic findings.”
Reference: “Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages” by Paul Heggarty, Cormac Anderson, Matthew Scarborough, Benedict King, Remco Bouckaert, Lechosław Jocz, Martin Joachim Kümmel, Thomas Jügel, Britta Irslinger, Roland Pooth, Henrik Liljegren, Richard F. Strand, Geoffrey Haig, Martin Macák, Ronald I. Kim, Erik Anonby, Tijmen Pronk, Oleg Belyaev, Tonya Kim Dewey-Findell, Matthew Boutilier, Cassandra Freiberg, Robert Tegethoff, Matilde Serangeli, Nikos Liosis, Krzysztof Stroński, Kim Schulte, Ganesh Kumar Gupta, Wolfgang Haak, Johannes Krause, Quentin D. Atkinson, Simon J. Greenhill, Denise Kühnert and Russell D. Gray, 28 July 2023, Science.