The Pitfalls of Evolutionary Genomics

Lead Image: Recent research analyzes mathematical models created to deduce conclusions about how evolution works at the level of populations of organisms.

A study examines the benefits and drawbacks of evolutionary genomics.

Claudius Ptolemy, a second-century astronomer and mathematician from Alexandria, had a lofty goal. In the Almagest, a magisterial treatise, he attempted to explain the motions of the stars and planets. Ptolemy devised a sophisticated mathematical model of the universe that seemed to replicate the movements of the celestial bodies he observed.

Jeffrey Jensen is a researcher in the Biodesign Center for Mechanisms of Evolution at Arizona State University and a professor in the School of Life Sciences with the Center for Evolution & Medicine. Credit: The Biodesign Institute at Arizona State University

Unfortunately, his cosmic scheme had a catastrophic flaw at its heart. In keeping with the preconceptions of his day, Ptolemy began with the presumption that the Earth was the center of the cosmos. Although the Ptolemaic universe, with its intricate “epicycles” devised to explain the motions of the planets and stars, has long since been consigned to the history books, its conclusions persisted as scientific dogma for more than 1,200 years.

Models in evolutionary biology are no less vulnerable to flawed theoretical assumptions. Such flaws can yield impressive models that nonetheless fail to capture how nature actually produces the bewildering variety of living species on Earth.

A recent study examines mathematical models used to draw conclusions about how evolution works at the level of populations of organisms. It concludes that these models must be built with great caution: avoiding unwarranted starting assumptions, weighing the quality of existing knowledge, and remaining open to alternative explanations.

Failing to follow rigorous procedures when constructing null models, the baseline expectations against which evolutionary hypotheses are tested, can produce theories that appear to fit some aspects of DNA sequencing data yet fail to accurately explain the underlying evolutionary processes, which are frequently extremely complex and multifaceted.

Such theoretical frameworks may offer compelling but ultimately flawed pictures of how evolution acts on populations over time, whether those populations are bacteria, shoals of fish, or human societies migrating during prehistory.

In the new study, Jeffrey Jensen, a researcher in the Biodesign Center for Mechanisms of Evolution at Arizona State University and professor in the School of Life Sciences with the Center for Evolution & Medicine, leads an international group of prominent researchers in offering guidance for future research. Together, they describe a range of criteria for improving the accuracy of models used to make statistical inferences in population genomics, a scientific discipline concerned with large-scale comparisons of DNA sequences within and across populations and species.

“One of our key messages is the importance of considering the contributions of evolutionary processes certain to be in constant operation (such as purifying selection and genetic drift), before simply relying on hypothesized or rare evolutionary processes as the primary drivers of observed population variation (such as positive selection),” Jensen emphasized.

The study was recently published in the journal PLoS Biology.

A field comes of age

Population genetics, the forerunner of population genomics, arose from early efforts to reconcile Charles Darwin’s notion of evolution by means of natural selection with the first inklings of the mechanisms of inheritance, uncovered by the Augustinian monk Gregor Mendel.

Susanne Pfeifer is a researcher in the Biodesign Center for Mechanisms of Evolution and an assistant professor at the Center for Evolution & Medicine. Credit: The Biodesign Institute at Arizona State University

This synthesis culminated in the 1920s and early 1930s, largely thanks to the mathematical work of Ronald Fisher, J.B.S. Haldane, and Sewall Wright, who were the first to explore how natural selection, together with other evolutionary forces, would modify the genetic composition of Mendelian populations over time.

Today, studies in population genomics apply genomic technologies on a large scale to explore the genetic composition of biological populations and how factors including natural selection and genetic drift change that composition over time.

To accomplish this, population geneticists develop mathematical models that quantify how these evolutionary processes shape gene frequencies, use this theory to design statistical inference approaches for estimating the forces that produce observed patterns of genetic variation in real populations, and test their conclusions against accumulated data.
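To make the modeling step concrete, here is a minimal sketch of the classic Wright-Fisher model, a standard textbook model and not necessarily one of the specific models examined in the study. It assumes a diploid population of constant size N (so 2N gene copies), non-overlapping generations, and genic (haploid-style) selection of strength `s` favoring one allele; the function name `wright_fisher` and its parameters are illustrative choices.

```python
import numpy as np


def wright_fisher(n_individuals: int, p0: float, s: float, generations: int,
                  seed: int = 0) -> np.ndarray:
    """Simulate the frequency of one allele under selection and genetic drift.

    Assumes a diploid Wright-Fisher population of constant size N with 2N
    gene copies, non-overlapping generations, and genic selection of
    strength s favoring the focal allele.
    """
    rng = np.random.default_rng(seed)
    two_n = 2 * n_individuals
    freqs = np.empty(generations + 1)
    freqs[0] = p = p0
    for t in range(1, generations + 1):
        # Deterministic step: selection shifts the expected allele frequency.
        p_sel = p * (1 + s) / (1 + s * p)
        # Stochastic step: genetic drift from binomially sampling 2N copies.
        p = rng.binomial(two_n, p_sel) / two_n
        freqs[t] = p
    return freqs


trajectory = wright_fisher(n_individuals=100, p0=0.1, s=0.01, generations=200)
```

Separating the deterministic selection step from the random sampling step mirrors the distinction drawn throughout the article: with a small `n_individuals`, trajectories are typically erratic and a favored allele can still be lost, whereas large populations tend to track the deterministic expectation more closely.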

The spice of life

The study of genomic variation focuses on DNA sequence differences among individuals and populations. Some of these variants are critically important for biological function, including mutations responsible for genetic disease, while others have no detectable biological effects.

Such variation in the human genome can take several forms. One common form is the single nucleotide polymorphism, or SNP, a position in the genome at which a single DNA letter differs among individuals. Larger-scale variation, involving the simultaneous alteration of hundreds or even thousands of base pairs, is also possible. Again, some such alterations may affect disease risk and survival, while many others have no effect.
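As a rough illustration of how SNPs are identified, the sketch below scans a toy alignment of equal-length DNA sequences for segregating (variable) sites and computes Watterson's estimator, θ_W = S / a_n, where S is the number of segregating sites and a_n = 1 + 1/2 + ... + 1/(n - 1) for n sequences. This is one classic summary statistic used in population-genetic inference. The sketch assumes the sequences are already aligned and ignores gaps and missing data; the sequences and function names are hypothetical.

```python
from collections import Counter


def segregating_sites(alignment: list[str]) -> list[int]:
    """Return 0-based positions where the aligned sequences differ."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({seq[i] for seq in alignment}) > 1]


def minor_allele_frequency(alignment: list[str], position: int) -> float:
    """Frequency of the rarest base at a site (intended for biallelic SNPs)."""
    counts = Counter(seq[position] for seq in alignment)
    return min(counts.values()) / len(alignment)


def watterson_theta(alignment: list[str]) -> float:
    """Watterson's estimator: segregating sites divided by a_n."""
    n = len(alignment)
    a_n = sum(1 / i for i in range(1, n))
    return len(segregating_sites(alignment)) / a_n


alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]  # hypothetical toy data
print(segregating_sites(alignment))          # [2, 4]
print(minor_allele_frequency(alignment, 2))  # 0.25
print(watterson_theta(alignment))            # 12/11, about 1.09
```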

Natural selection operates when different variants segregating in a population differ in fitness. By designing and studying mathematical models of the resulting change in gene frequencies and applying those models to empirical data, population geneticists seek to understand the contributing evolutionary processes in a rigorous, quantitative way. For this reason, population genetics is often regarded as the theoretical cornerstone of modern Darwinian evolution.
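The kind of gene-frequency model described above can be written down in a few lines. As a minimal worked example, assume an idealized, infinitely large haploid population carrying two variants, A and a, with relative fitnesses 1 + s and 1, and let p be the frequency of A. One generation of selection gives:

```latex
p' = \frac{p(1+s)}{p(1+s) + (1-p)} = \frac{p(1+s)}{1+sp},
\qquad
\Delta p = p' - p = \frac{s\,p(1-p)}{1+sp},
\qquad
\frac{p'}{1-p'} = (1+s)\,\frac{p}{1-p}.
```

Because the odds p/(1 - p) are multiplied by (1 + s) every generation, an advantage of s = 0.01 would take about 924 generations to carry A from 1% to 99% frequency in this idealized setting. Real populations layer genetic drift and other processes on top of this deterministic core.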

Adrift through the genome

Although the importance of natural selection to the evolutionary process is undeniable, positive selection, which increases the frequency of beneficial variants and is the potential driver of adaptation, is certain to be comparatively rare, even relative to other forms of natural selection. For example, purifying selection, the removal of deleterious variants from the population, is a constantly acting and far more pervasive form of selection.
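The constant action of purifying selection can be illustrated with a simple mutation-selection balance model. This sketch assumes a haploid population, a deleterious allele with relative fitness 1 - s, and recurrent mutation to that allele at rate u per generation; these assumptions and the function name are illustrative. Selection keeps removing deleterious copies while mutation keeps supplying new ones, so the allele settles at a low equilibrium frequency of u/s rather than disappearing.

```python
def mutation_selection_balance(u: float, s: float, generations: int,
                               q0: float = 0.0) -> float:
    """Iterate purifying selection plus recurrent mutation (haploid model).

    Assumes the deleterious allele has fitness 1 - s relative to 1 for the
    wild type, and that wild-type copies mutate to the deleterious allele at
    rate u per generation. Returns the final deleterious-allele frequency.
    """
    q = q0
    for _ in range(generations):
        q = q * (1 - s) / (1 - s * q)  # purifying selection removes copies
        q = q + u * (1 - q)            # new mutations replenish them
    return q


# Hypothetical rates: approaches u / s = 0.001.
print(mutation_selection_balance(u=1e-5, s=0.01, generations=5000))
```

With selection applied before mutation in each generation, the fixed point of this recursion is exactly q = u/s (for u < s), so the printed value converges to 0.001.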

In addition, several non-selective evolutionary processes are of great importance. For example, genetic drift describes the random fluctuations in variant frequencies that arise because each generation inherits only a sample of the previous generation’s genes. In large populations, natural selection can act more efficiently in purging deleterious variation and potentially fixing beneficial variation, whereas in smaller populations genetic drift becomes increasingly dominant.
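The dependence of drift on population size can be quantified with a standard textbook result: in an idealized diploid population of constant size N, with no mutation, selection, or migration, expected heterozygosity decays as H_t = H_0 (1 - 1/(2N))^t. The sketch below, whose function names are hypothetical, evaluates this formula and the corresponding half-life of genetic variation.

```python
import math


def expected_heterozygosity(h0: float, n_individuals: int, generations: int) -> float:
    """Expected heterozygosity after pure genetic drift in an ideal diploid
    population of constant size N (no mutation, selection, or migration)."""
    return h0 * (1 - 1 / (2 * n_individuals)) ** generations


def heterozygosity_half_life(n_individuals: int) -> float:
    """Generations until expected heterozygosity halves:
    ln 2 / -ln(1 - 1/(2N)), which is approximately 2N ln 2."""
    return math.log(2) / -math.log1p(-1 / (2 * n_individuals))


# A small population loses variation to drift far faster than a large one.
print(expected_heterozygosity(0.5, 50, 100))     # about 0.183
print(expected_heterozygosity(0.5, 5_000, 100))  # about 0.495
print(heterozygosity_half_life(50))              # about 69 generations
```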

The distinction can be seen in dramatic form when comparing prokaryotes such as bacteria with eukaryotes, including humans. In bacteria, vast population sizes tend to make selection more efficient. In many eukaryotes, by contrast, smaller population sizes reduce the efficiency of selection, making their genomes more permissive of changes that are not strongly deleterious.
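This contrast can be made quantitative with Kimura’s diffusion approximation for the probability that a single new mutation eventually fixes, u = (1 - e^(-2s)) / (1 - e^(-4Ns)). The sketch assumes a diploid population whose effective size equals N and additive selection in which each copy of the allele adds s to fitness; the function name and the example values of N and s are hypothetical.

```python
import math


def new_mutant_fixation_probability(n: int, s: float) -> float:
    """Kimura's approximation for the fixation probability of one new mutant
    copy (initial frequency 1/(2N)) in a diploid population of effective
    size N with additive selection coefficient s.

    Uses expm1 for numerical accuracy; very strongly deleterious cases
    (4 * N * |s| above roughly 700) would overflow and are not handled here.
    """
    p = 1 / (2 * n)
    if s == 0:
        return p  # neutral limit: fixation probability equals initial frequency
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)


# Compare a weakly beneficial mutation (s = 0.001) with a neutral one.
for n in (100, 100_000):
    ratio = new_mutant_fixation_probability(n, 0.001) * 2 * n
    print(f"N = {n}: {ratio:.2f} times the neutral fixation probability")
```

With these hypothetical values, the same mutation is only about 1.2 times as likely as a neutral one to fix when N = 100, but roughly 400 times as likely when N = 100,000. Selection becomes effective once 4Ns is well above 1.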

According to the Neutral Theory of Molecular Evolution, a guiding principle of evolutionary theory proposed by the population geneticist Motoo Kimura more than 50 years ago, most evolutionary changes at the molecular level in real populations are governed not by natural selection but by genetic drift. The study emphasizes that evolutionary biologists too often miss this critical point. As co-author Michael Lynch, director of ASU’s Biodesign Center for Mechanisms of Evolution, cogently observes, “natural selection is just one of several evolutionary mechanisms, and the failure to realize this is probably the most significant impediment to a fruitful integration of evolutionary theory with molecular, cellular, and developmental biology.”
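A short, standard derivation shows why, under the neutral theory, the long-term rate of molecular evolution is set by mutation and drift rather than by selection. Assume a diploid population of N individuals in which selectively neutral mutations arise at rate μ per gene copy per generation. Each generation brings 2Nμ new mutations, and under drift alone each one fixes with probability equal to its initial frequency, 1/(2N):

```latex
k = \underbrace{2N\mu}_{\text{new neutral mutations per generation}}
    \times
    \underbrace{\frac{1}{2N}}_{\text{fixation probability of each}}
  = \mu
```

Population size cancels, so the neutral substitution rate k equals the mutation rate μ, one of Kimura’s central predictions.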

The new consensus study further stresses that failing to consider these other evolutionary mechanisms that are certain to be operating, including genetic drift, and to incorporate them into models of population genomics is likely to lead researchers astray. The common overreliance on purely adaptive models to explain genomic variation has produced a raft of interpretations of dubious value, the authors assert.

The study presents a detailed flow chart to help guide the development of more accurate models for drawing evolutionary inferences from genomic data. The relevant biological parameters vary among species and include not only evolutionary variables such as population size, mutation and recombination rates, and population structure and history, but also the structure of the genome itself and life-history traits such as mating behavior. All of these factors play a vital role in shaping observed molecular variation and evolution.
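To see how just two of these parameters feed into observable variation, the sketch below combines two standard neutral-theory results: equilibrium diversity θ = 4Neμ per site, and the expected number of segregating sites E[S] = θ L a_n for a sample of n sequences spanning L base pairs, with a_n = 1 + 1/2 + ... + 1/(n - 1). It assumes a constant-size, randomly mating diploid population under the infinite-sites model with no selection and no population structure, an idealized baseline that ignores several of the factors listed above. The function name and parameter values are hypothetical.

```python
def expected_segregating_sites(ne: float, mu: float, length_bp: int,
                               n_samples: int) -> float:
    """Expected number of segregating sites under the standard neutral model.

    Assumes a constant-size, randomly mating diploid population, the
    infinite-sites mutation model, and no selection or population structure.
    """
    theta_per_site = 4 * ne * mu                    # theta = 4 * Ne * mu
    a_n = sum(1 / i for i in range(1, n_samples))  # harmonic number H_(n-1)
    return theta_per_site * length_bp * a_n


# Hypothetical parameters: doubling Ne doubles the expected variation.
print(expected_segregating_sites(ne=10_000, mu=1e-8, length_bp=10_000, n_samples=4))  # about 7.33
print(expected_segregating_sites(ne=20_000, mu=1e-8, length_bp=10_000, n_samples=4))  # about 14.67
```

Because the expectation scales directly with both Ne and μ, a misestimated population size or mutation rate propagates straight into the baseline against which other processes are judged.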

“While these many considerations may sound daunting for some researchers, it is important to note that many excellent research groups at ASU and around the world are actively improving our understanding of these underlying evolutionary parameters, providing constantly improving inference, for example, of mutation and recombination rates,” added co-author Susanne Pfeifer, an assistant professor in the Center for Evolution & Medicine and the Biodesign Center for Mechanisms of Evolution.

Where theoretical models in population genomics once proliferated alongside relatively scant genomic data, today an avalanche of data, enabled by rapid, low-cost DNA sequencing of organisms across the tree of life, has dramatically changed the field. Careful and judicious use of this gold mine of genomic data will help researchers build more rigorous models to unlock evolution’s many remaining mysteries.

Reference: “Recommendations for improving statistical inference in population genomics” by Parul Johri, Charles F. Aquadro, Mark Beaumont, Brian Charlesworth, Laurent Excoffier, Adam Eyre-Walker, Peter D. Keightley, Michael Lynch, Gil McVean, Bret A. Payseur, Susanne P. Pfeifer, Wolfgang Stephan and Jeffrey D. Jensen, 31 May 2022, PLoS Biology.
DOI: 10.1371/journal.pbio.3001669