How Many Aliens Are in the Milky Way? Astronomers Turn to Statistics for Answers

The tenets of Thomas Bayes, an 18th-century statistician and minister, underpin the latest estimates of the prevalence of extraterrestrial life

In the 12th episode of Cosmos, which aired on December 14, 1980, the program’s co-creator and host Carl Sagan introduced television viewers to astronomer Frank Drake’s eponymous equation. Using it, he calculated the potential number of advanced civilizations in the Milky Way that could contact us using the extraterrestrial equivalent of our modern radio-communications technology. Sagan’s estimate ranged from “a pitiful few” to millions. “If civilizations do not always destroy themselves shortly after discovering radio astronomy, then the sky may be softly humming with messages from the stars,” Sagan intoned in his inimitable way.

Sagan was pessimistic about civilizations being able to survive their own technological “adolescence”—the transitional period when a culture’s development of, say, nuclear power, bioengineering or a myriad of other powerful capabilities could easily lead to self-annihilation. In essentially all other ways, he was an optimist about the prospects for pangalactic life and intelligence. But the scientific basis for his beliefs was shaky at best. Sagan and others suspected the emergence of life on clement worlds must be a cosmic inevitability, because geologic evidence suggested it arose shockingly quickly on Earth: in excess of four billion years ago, practically as soon as our planet had sufficiently cooled from its fiery formation. And if, just as on our world, life on other planets emerged quickly and evolved to become ever more complex over time, perhaps intelligence and technology, too, could be common throughout the universe.

In recent years, however, some skeptical astronomers have tried to put more empirical heft behind such pronouncements using a sophisticated form of analysis called Bayesian statistics. They have focused on two great unknowns: the odds of life arising on Earth-like planets from abiotic conditions—a process called abiogenesis—and, from there, the odds of intelligence emerging. Even with such estimates in hand, astronomers disagree about what they mean for life elsewhere in the cosmos. That lack of consensus is because even the best Bayesian analysis can only do so much when hard evidence for extraterrestrial life and intelligence is thin on the ground.




The Drake equation, which the astronomer introduced in 1961, calculates the number of civilizations in our galaxy that can transmit—or receive—interstellar messages via radio waves. It relies on multiplying a number of factors, each of which quantifies some aspect of our knowledge about our galaxy, planets, life and intelligence. These factors include ƒp, the fraction of stars with extrasolar planets; ne, the number of habitable planets in an extrasolar system; ƒl, the fraction of habitable planets on which life emerges; and so on.
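As a rough illustration of how that multiplication works, the sketch below codes the standard seven-factor form of the equation in Python. The function and parameter names are hypothetical, and every input value is a placeholder chosen only to show the arithmetic, not an estimate from Drake, Sagan or anyone quoted here.

```python
def drake_equation(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Return N, the number of communicating civilizations in the galaxy.

    r_star   -- average rate of star formation (stars per year)
    f_p      -- fraction of stars with planets
    n_e      -- habitable planets per planetary system
    f_l      -- fraction of habitable planets on which life emerges
    f_i      -- fraction of those on which intelligence emerges
    f_c      -- fraction of those that develop detectable communication
    lifetime -- years such a civilization keeps transmitting
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime


# Placeholder inputs (assumptions for illustration only).
optimistic = drake_equation(1.0, 1.0, 0.5, 1.0, 0.5, 0.5, 1_000_000)
pessimistic = drake_equation(1.0, 1.0, 0.5, 0.001, 0.001, 0.5, 1_000)
print(optimistic, pessimistic)
```

Because the factors are multiplied, a single very small term, such as the fraction of planets where life starts, is enough to pull the whole result toward zero no matter how generous the others are.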

“At the time Drake wrote [the equation] down—or even 25 years ago—almost any of those factors could have been the ones that make life very rare,” says Ed Turner, an astrophysicist at Princeton University. Now we know that worlds around stars are the norm, and that those similar to Earth in the most basic terms of size, mass and insolation are common as well. In short, there appears to be no shortage of galactic real estate that life could occupy. Yet “one of the biggest uncertainties in the whole chain of factors is the probability that life would ever get started—that you would make that leap from chemistry to life, even given suitable conditions,” Turner says.

Ignoring this uncertainty can lead astronomers to make rather bold claims. For example, last month Tom Westby and Christopher Conselice, both at the University of Nottingham in England, made headlines when they calculated that there should be at least 36 intelligent civilizations in our galaxy capable of communicating with us. The estimate was based on an assumption that intelligent life emerges on other habitable Earth-like planets about 4.5 billion to 5.5 billion years after their formation.

“That's just a very specific and strong assumption,” says astronomer David Kipping of Columbia University. “I don't see any evidence that that's a safe bet to be making.”

Answering questions about the likelihood of abiogenesis and the emergence of intelligence is difficult because scientists just have a single piece of information: life on Earth. “We don't even really have one full data point,” Kipping says. “We don't know when life emerged, for instance, on the Earth. Even that is subject to uncertainty.”

Yet another problem with making assumptions based on what we locally observe is so-called selection bias. Imagine buying lottery tickets and hitting the jackpot on your 100th attempt. Reasonably, you might then assign a 1 percent probability to winning the lottery. This incorrect conclusion is, of course, a selection bias that arises if you poll only the winners and none of the failures (that is, the tens of millions of people who purchased tickets but never won the lottery). When it comes to calculating the odds of abiogenesis, “we don’t have access to the failures,” Kipping says. “So this is why we’re in a very challenging position when it comes to this problem.”
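The lottery example can be mimicked with a small simulation. The Python sketch below is a toy under stated assumptions: the true chance of winning per ticket, the 100-ticket limit and the number of players are all invented for illustration. It compares the estimate you would get by polling only the winners with the one you would get by counting every ticket sold.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_P = 1e-4        # assumed true chance that a single ticket wins
MAX_TICKETS = 100    # assumed: each player gives up after this many tickets
N_PLAYERS = 200_000  # assumed number of players

winner_estimates = []
total_tickets = 0
total_wins = 0
for _ in range(N_PLAYERS):
    # Sample the ticket number of this player's first win
    # (geometric distribution, drawn by inverting its cumulative form).
    u = random.random()
    first_win = math.floor(math.log(1.0 - u) / math.log(1.0 - TRUE_P)) + 1
    if first_win <= MAX_TICKETS:
        total_tickets += first_win
        total_wins += 1
        # A winner's naive guess: "I won once in first_win tries."
        winner_estimates.append(1.0 / first_win)
    else:
        total_tickets += MAX_TICKETS  # this player never won

winners_only = sum(winner_estimates) / len(winner_estimates)
everyone = total_wins / total_tickets
print(winners_only, everyone)
```

Every winner in this setup necessarily reports odds of 1 percent or better, because nobody who needed more than 100 tickets is in the sample. Only the count that includes the losers lands near the true value.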

Enter Bayesian analysis. The technique uses Bayes’s theorem, named after Thomas Bayes, an 18th-century English statistician and minister. To calculate the odds of some event occurring, such as abiogenesis, astronomers first come up with a plausible probability distribution for it (a best guess, if you will). For example, one can assume that abiogenesis is as likely between 100 million and 200 million years after Earth formed as it is between 200 million and 300 million years after that time or during any other 100-million-year chunk of our planet’s history. Such assumptions are called Bayesian priors, and they are made explicit. Then the statisticians collect data or evidence. Finally, they combine the prior and the evidence to calculate what is called a posterior probability. In the case of abiogenesis, that probability would be the odds of the emergence of life on an Earth-like planet, given our prior assumptions and evidence. The posterior is not a single number but rather a probability distribution that quantifies any uncertainty. It may show, for instance, that abiogenesis becomes more or less likely with time rather than having the uniform probability distribution suggested by the prior.
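The prior-evidence-posterior recipe can be sketched in a few lines of Python. The model here is an assumption made for illustration, not the one used by the researchers in this article: abiogenesis is treated as a random process with an unknown rate, the prior is flat over a grid of candidate rates, and the "evidence" is a hypothetical observation that life appeared within half a billion years.

```python
import math

# Assumed toy model: abiogenesis is a Poisson process with unknown rate
# `lam` (expected events per billion years on a habitable planet).
T_OBS = 0.5  # hypothetical evidence: life appeared within 0.5 billion years

rates = [0.01 * k for k in range(1, 1001)]   # grid of rates: 0.01 ... 10.0
prior = [1.0 / len(rates)] * len(rates)      # flat prior over the grid

# Likelihood of the evidence for each rate: P(at least one event by T_OBS).
likelihood = [1.0 - math.exp(-lam * T_OBS) for lam in rates]

# Bayes's theorem: the posterior is proportional to prior times likelihood.
unnormalized = [p * like for p, like in zip(prior, likelihood)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

prior_mean = sum(lam * p for lam, p in zip(rates, prior))
posterior_mean = sum(lam * p for lam, p in zip(rates, posterior))
print(round(prior_mean, 2), round(posterior_mean, 2))
```

The output of the update is the whole `posterior` list, one probability per candidate rate, rather than a single number. In this toy setup the evidence shifts weight toward higher rates, but no rate on the grid is ever assigned zero probability.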

In 2012 Turner and his colleague David Spiegel, then at the Institute for Advanced Study in Princeton, N.J., were the first to rigorously apply Bayesian analysis to abiogenesis. In their approach, life on an Earth-like planet around a sunlike star does not emerge until some minimum number of years, tmin, after that world’s formation. If life does not arise before some maximum time, tmax, then, as its star ages (and eventually dies), conditions on the planet become too hostile for abiogenesis to ever occur. Turner and Spiegel’s intent was to calculate the probability of abiogenesis between tmin and tmax.

The researchers worked with a few different prior distributions for this probability. They also assumed that intelligence took some fixed amount of time to appear after abiogenesis.

Given such assumptions, the geophysical and paleontological evidence of life’s genesis on Earth and what evolutionary theory says about the emergence of intelligent life, Turner and Spiegel were able to calculate different posterior probability distributions for abiogenesis. Although the evidence that life appeared early on Earth may indeed suggest abiogenesis is fairly easy, the posteriors did not place any lower bound on the probability. The calculation “doesn’t rule out very low probabilities, which is really sort of common sense with statistics of one,” Turner says. Despite life’s rapid emergence on Earth, abiogenesis could nonetheless be an extremely rare process.
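A toy calculation shows why a single data point leaves so much room for rarity. The Python sketch below assumes a simple random-process model with an unknown rate and a hypothetical observation that life appeared within half a billion years. It then compares two priors, one flat in the rate and one flat in the logarithm of the rate. These choices, the grid range and the threshold are illustrative assumptions and are not taken from the 2012 analysis.

```python
import math

T_OBS = 0.5  # hypothetical evidence: life appeared within 0.5 billion years

# Log-spaced grid of candidate rates, 0.001 to 1,000 events per billion years.
N = 6001
rates = [10 ** (-3 + 6 * k / (N - 1)) for k in range(N)]


def posterior_prob_below(threshold, prior_weights):
    """Posterior probability that the rate is below `threshold`."""
    post = [w * (1.0 - math.exp(-lam * T_OBS))
            for lam, w in zip(rates, prior_weights)]
    total = sum(post)
    return sum(p for lam, p in zip(rates, post) if lam < threshold) / total


# Prior A: flat in the rate itself. On a log-spaced grid each point stands
# for an interval whose width grows with the rate, so its weight is the rate.
flat_in_rate = rates
# Prior B: flat in the logarithm of the rate (equal weight per grid point).
flat_in_log = [1.0] * N

p_low_flat = posterior_prob_below(1.0, flat_in_rate)
p_low_log = posterior_prob_below(1.0, flat_in_log)
print(p_low_flat, p_low_log)
```

With the same evidence, the two priors give very different posterior weight to slow abiogenesis (here, fewer than one expected event per billion years). One early appearance of life cannot, by itself, settle which prior is right.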

Turner and Spiegel’s effort was the “first really serious Bayesian attack on this problem,” Kipping says. “I think what was appealing is that they broke this default, naive interpretation of the early emergence of life.”

Even so, Kipping thought the researchers’ work was not without its weaknesses, and he has now sought to correct it with a more elaborate Bayesian analysis of his own. For instance, Kipping questions the assumption that intelligence emerged at some fixed time after abiogenesis. This prior, he says, could be another instance of selection bias—a notion influenced by the evolutionary pathway by which our own intelligence emerged. “In the spirit of encoding all of your ignorance, why not just admit that you don’t know that number either?” Kipping says. “If you’re trying to infer how long it takes life to emerge, then why not just also do intelligence at the same time?”

That suggestion is exactly what Kipping attempted, estimating both the probability of abiogenesis and the emergence of intelligence. For a prior, he chose something called the Jeffreys prior, which was designed by another English statistician and astronomer, Harold Jeffreys. It is said to be maximally uninformative. Because the Jeffreys prior doesn’t bake in massive assumptions, it places more weight on the evidence. Turner and Spiegel had also tried to find an uninformative prior. “If you want to know what the data is telling you and not what you thought about it previously, then you want an uninformative prior,” Turner says. In their 2012 analysis, the researchers employed three priors, one of which was the least informative, but they fell short of using the Jeffreys prior, despite being aware of it.

In Kipping’s calculation, that prior focused attention on what he calls the “four corners” of the parameter space: life is common, and intelligence is common; life is common, and intelligence is rare; life is rare, and intelligence is common; and life is rare, and intelligence is rare. All four corners were equally likely before the Bayesian analysis began.

Turner agrees that using the Jeffreys prior is a significant advance. “It’s the best way that we have, really, to just ask what the data is trying to tell you,” he says.

Combining the Jeffreys prior with the sparse evidence of the emergence of life and intelligence on Earth, Kipping obtained a posterior probability distribution, which allowed him to calculate new odds for the four corners. He found, for instance, that the “life is common, and intelligence is rare” scenario is nine times more likely than both life and intelligence being rare. And even if intelligence is not rare, the life-is-common scenario has a minimum odds ratio of 9 to 1. Those odds are not the kind that one would bet the house on, Kipping says. “You could easily lose the bet.”
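To see why 9-to-1 odds fall short of a sure thing, it helps to convert them to a probability. The short Python sketch below does that under a simplifying assumption: the two scenarios being compared are treated as the only possibilities, each given equal weight beforehand. The function name is hypothetical.

```python
def odds_to_probability(odds_for, odds_against=1.0):
    """Convert odds of `odds_for` to `odds_against` into a probability."""
    return odds_for / (odds_for + odds_against)


p_common = odds_to_probability(9, 1)  # chance the favored scenario holds
p_rare = 1.0 - p_common               # chance the bet is lost
print(p_common, p_rare)
```

Under that assumption the favored scenario gets about a 90 percent probability, which leaves roughly one chance in 10 of losing the bet.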

Still, that calculation is “a positive sign that life should be out there,” he says. “It is, at least, a suggestive hint that life is not a difficult process.”

Not all Bayesian statisticians would agree. Turner, for one, interprets the results differently. Yes, Kipping’s analysis suggests that life’s apparent early arrival on Earth favors a model in which abiogenesis is common, with a specific odds ratio of 9:1. But this calculation does not mean that model is nine times more likely to be true than the one that says abiogenesis is rare, Turner says, adding that Kipping’s interpretation is “a little bit overly optimistic.”

According to Turner, who applauds Kipping’s work, even the most sophisticated Bayesian analysis will still leave room for the rarity of both life and intelligence in the universe. “What we know about life on Earth doesn’t rule out those possibilities,” he says.

And it is not just Bayesian statisticians who may have a beef with Kipping’s interpretation. Anyone interested in questions about the origin of life would be skeptical about claimed answers, given that any such analysis is beholden to geologic, geophysical, paleontological, archaeological and biological evidence for life on Earth, none of which is unequivocal about the time lines for abiogenesis and the appearance of intelligence.

“We still struggle to define what we mean by a living system,” says Caleb Scharf, an astronomer and astrobiologist at Columbia. “It is a slippery beast, in terms of scientific definition. That’s problematic for making a statement [about] when abiogenesis happens—or even statements about the evolution of intelligence.”

Even if we did have rigorous definitions, problems would persist. “We don’t know whether or not life started up, stopped, restarted. We also don’t know whether life can only be constructed one way or not,” Scharf says. When did Earth become hospitable to life? And when it did, were the first molecules of this “life” amino acids, RNAs or lipid membranes? And after life first came about, was it snuffed out by some cataclysmic event early in Earth’s history, only to restart in a potentially different manner? “There's an awful lot of uncertainty,” Scharf says.

All this sketchy evidence makes even Bayesian analysis difficult. But as a technique, it remains the best-suited method for handling more evidence, such as the discovery of signs of life existing on Mars in the past or within one of Jupiter’s ice-covered, ocean-bearing moons at the present.

“The moment we have another data point to play with, assuming that happens, [the Bayesian models] are the ways to best utilize that extra data. Suddenly, the uncertainties shrink dramatically,” Scharf says. “We don’t necessarily have to survey every star in our galaxy to figure out how likely it is for any given place to harbor life. One or two more data points, and suddenly, we know about, essentially, the universe in terms of its propensity for producing life or possibly intelligence. And that's rather powerful.”

Anil Ananthaswamy is author of The Edge of Physics (Houghton Mifflin Harcourt, 2010), The Man Who Wasn't There (Dutton, 2015), Through Two Doors at Once: The Elegant Experiment That Captures the Enigma of Our Quantum Reality (Dutton, 2018), and Why Machines Learn: The Elegant Math Behind AI (Dutton, 2024).

This article was originally published with the title “How Many Aliens Are in the Milky Way? Astronomers Turn to Statistics for Answers” in SA Space & Physics Vol. 3 No. 5
doi:10.1038/scientificamerican102020-7eYM2bpVUqs4vdcx9oHzOw