The New York Times has reported that the Internal Revenue Service gave one of its most rigorous types of audits to James B. Comey, the former F.B.I. director, and to Andrew G. McCabe, his former deputy.
This has prompted a lot of perfectly reasonable questions, most of them variants of: What are the odds? As the article noted, the chances that two high-ranking political enemies of President Donald J. Trump were audited by pure coincidence are minuscule.
But minuscule is not zero.
If we wanted to believe this was a coincidence, how improbable would we say it was? Here, we try to estimate that probability as seriously as we can.
First, the facts: Both men were chosen for audits under the National Research Program (N.R.P.), a tiny subset of all the audits the I.R.S. performs each year. These audits scrutinize a sample of returns to gather data on tax compliance.
According to the I.R.S., there were about 5,000 such audits in 2017, 4,000 in 2018, and 8,000 in 2019 — chosen from about 154 million individual tax returns each year. Mr. Comey’s audit was for his 2017 tax return; Mr. McCabe’s was for his 2019 return.
Many aspects of the N.R.P. complicate our calculations, including the sampling methodology of I.R.S. auditors and the different years of the audits themselves. We will return to these issues later. For now, we’ll assume all taxpayers have an equal chance of being audited and that both men were audited in 2017.
If this problem were to appear in a textbook about probability, it might read like this:
If there are 154 million marbles (the approximate number of tax returns filed each year) in a giant urn, and a small number of them are red (including those representing Mr. Comey and Mr. McCabe), what are the chances that you will draw two or more red marbles if you randomly draw a few thousand from the urn (the number of audits in that year)?
It may sound complicated, but it’s a relatively well-studied problem, something many math or stats majors would encounter in their college coursework. People have already derived equations to estimate these probabilities, with names like the hypergeometric distribution, which has applications like election auditing and card counting.
We can simply enter our estimates for the number of total marbles, the number of red marbles and the number of draws, and we’ll get a probability. If we believe there are just two red marbles — that is, if we limit the exercise to only Mr. McCabe and Mr. Comey — this equation yields a probability of roughly one in 950 million.
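For readers who want to check the arithmetic, here is a minimal Python sketch of that calculation. It assumes the simplified setup described above: 154 million equally likely returns, about 5,000 audits and exactly two red marbles. The function and constant names are ours, chosen for illustration.

```python
from fractions import Fraction
from math import comb

TOTAL_RETURNS = 154_000_000  # approximate individual returns filed per year
AUDITS_2017 = 5_000          # approximate N.R.P. audits in 2017

def prob_at_least_two(total, red, draws):
    """Chance of drawing two or more red marbles without replacement."""
    def exactly(k):
        # Hypergeometric probability of exactly k red marbles among the draws.
        return Fraction(comb(red, k) * comb(total - red, draws - k),
                        comb(total, draws))
    return 1 - exactly(0) - exactly(1)

# Assumption: only two red marbles (Mr. Comey and Mr. McCabe).
p = prob_at_least_two(TOTAL_RETURNS, red=2, draws=AUDITS_2017)
print(f"about 1 in {round(1 / p):,}")
```

Exact fractions are used instead of floating-point numbers because the answer is tiny compared with the quantities being subtracted. Under these assumptions the result comes out to odds of a little under one in 950 million.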
Those are considerably steeper odds than your chances of winning the Powerball. It’s also an almost meaningless result. At best, it’s the right answer to the wrong question.
Understanding why requires acknowledging an absurdity inherent in our exercise: To best estimate the likelihood of an improbable event, we must set aside the fact that we know that it already happened. (The probability that it happened is 100 percent.)
Jordan Ellenberg, a professor at the University of Wisconsin who has written books about math and reasoning, described it this way: “In some counterfactual universe, what is the probability that this thing, which has already happened in our universe, happens?”
It might seem odd, but the same issues come up even in probabilistic exercises as basic as flipping a coin.
If you flipped a coin 20 times, your specific sequence of heads and tails is extraordinarily rare, about one in a million, but it did happen. And some sequence of flips will always happen. It’s a surprising coincidence only if that’s the sequence you set out to get before flipping.
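The coin-flip figure is easy to verify. A short sketch, assuming a fair coin and independent flips:

```python
FLIPS = 20

# Each flip doubles the number of possible heads/tails sequences.
sequences = 2 ** FLIPS

# Every specific sequence of fair flips is equally likely.
p_specific = 1 / sequences

print(f"{sequences:,} possible sequences; each has probability {p_specific:.2e}")
```

With 20 flips there are 1,048,576 possible sequences, which is where "about one in a million" comes from.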
In the same way, it’s incorrect to narrow our search only to Mr. Comey and Mr. McCabe, because it’s likely we’d be examining these probabilities if we learned that two other notable political enemies of an administration were audited instead of these two men.
A better question is: What is the likelihood that two or more people like Mr. Comey and Mr. McCabe would be audited over this period?
Should this group of people include any two top F.B.I. officials? Any two top Department of Justice officials? It’s this framing — a subjective decision rather than a factual one — that most drives any probability estimate, more than any choice of statistical distribution or sampling weights.
Here is a chart of the probability our equation yields at different choices for the number of red marbles, ranging from two (Mr. Comey and Mr. McCabe and no one else) to 400 (a conservative estimate of the number of Americans Mr. Trump insulted by name on Twitter since beginning his run for the presidency).
The probability increases drastically with the choice of who should be considered a red marble alongside Mr. Comey and Mr. McCabe.
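To see how strongly the group size drives the answer, the same urn calculation can be repeated for several counts of red marbles. This is a sketch under the simplified assumptions used so far (one year, 5,000 audits, equal chances for every return). The group sizes between 2 and 400 are arbitrary illustrations, not figures from the I.R.S. or from the chart.

```python
from fractions import Fraction
from math import comb

TOTAL_RETURNS = 154_000_000  # approximate individual returns filed per year
AUDITS = 5_000               # approximate N.R.P. audits in 2017

def prob_at_least_two(total, red, draws):
    """Chance of drawing two or more red marbles without replacement."""
    def exactly(k):
        # Hypergeometric probability of exactly k red marbles among the draws.
        return Fraction(comb(red, k) * comb(total - red, draws - k),
                        comb(total, draws))
    return 1 - exactly(0) - exactly(1)

# Illustrative group sizes only; which size is "right" is a judgment call.
GROUP_SIZES = [2, 10, 50, 100, 400]
odds = {size: prob_at_least_two(TOTAL_RETURNS, size, AUDITS)
        for size in GROUP_SIZES}

for size, p in odds.items():
    print(f"group of {size:>3}: about 1 in {round(1 / p):,}")
```

For small groups the probability grows roughly with the square of the group size, so a group 10 times as large is on the order of 100 times as likely to produce two audits.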
The point is not to decide on a number but to recognize that our choice of group size is what drives our answer. Although some guesses are certainly better than others, many choices are defensible.
Addressing the details
Now let’s try to narrow this down to something a touch more realistic, and return to some of the things we ignored in our simple interpretation of this problem.
First, the two men were not audited for the same year. Widening our scope to cover the three-year span from 2017 to 2019 increases the resulting probabilities significantly. This is straightforward: If a person has a certain chance of being audited in a given year, more years mean more opportunities to be audited.
Second, we are interested only in the probability that at least two people are chosen. We will not consider the probability that the same person is picked twice; it seems unlikely given that the audits can stretch out over a year, according to Mr. Comey’s account. Note that we are looking at the probability of at least two people being selected, not precisely two, since it would also be significant if three or more individuals from a group were chosen.
Finally, the I.R.S. does not select people in truly random fashion. Instead, the agency tends to select some kinds of taxpayers, including high earners, more often than others. For the 2001 tax year, the N.R.P. sample included returns from people around the 90th percentile of income at about 1.7 times the rate one would expect were returns chosen independently of earnings. That rate spiked through the highest income ranks, so that people with income in the top 0.5 percent were more than 10 times as likely to be in the sample as those closer to the median income.
We can probably assume that any group of Mr. Trump’s enemies would earn more than a random sample of Americans. But we cannot realistically estimate the complete incomes of everyone in our group in every year. We also know that the I.R.S. has considered other factors in its sampling, such as the type of returns that taxpayers file, and that sampling methods can change year to year. This leaves us with little in the way of guidance for how to match the I.R.S.’s methods. As such, we will leave our estimates unweighted by income. As a back-of-the-envelope exercise, if you are worried about how income affects these results, you can double the resulting probability if you think the members of a group have very high earnings, and multiply it by 10 if you think they’re extraordinarily rich.
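The adjustments above can be combined in a rough sketch. It rests on assumptions of ours that may not match the method behind the published estimates: each year's audits are treated as independent, every return has an equal chance in a given year, and the people in a group are treated as independent of one another (a binomial approximation). The `income_multiplier` parameter is a hypothetical knob that applies the back-of-the-envelope scaling described above.

```python
from fractions import Fraction

TOTAL_RETURNS = 154_000_000  # approximate individual returns filed per year
AUDITS_BY_YEAR = {2017: 5_000, 2018: 4_000, 2019: 8_000}

def prob_audited_at_least_once(audits_by_year, total):
    """Chance that one given taxpayer is audited in at least one year."""
    p_never = Fraction(1)
    for audits in audits_by_year.values():
        # Assumption: each year's selection is independent of the others.
        p_never *= 1 - Fraction(audits, total)
    return 1 - p_never

def prob_two_or_more_people(group_size, q, income_multiplier=1):
    """Binomial approximation that at least two group members are audited."""
    p_none = (1 - q) ** group_size
    p_one = group_size * q * (1 - q) ** (group_size - 1)
    # Assumption: the multiplier is a crude scaling (2 for very high
    # earners, 10 for the extraordinarily rich), not a real sampling weight.
    return income_multiplier * (1 - p_none - p_one)

q = prob_audited_at_least_once(AUDITS_BY_YEAR, TOTAL_RETURNS)
for size in (2, 10, 50, 100, 400):  # illustrative group sizes
    p = prob_two_or_more_people(size, q)
    print(f"group of {size:>3}: about 1 in {round(1 / p):,}")
```

Because a person audited in any of the three years counts only once, the sketch tracks distinct people, in line with the decision not to count the same person picked twice. For a group of two, it gives odds on the order of one in 80 million, far less steep than the single-year figure.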
Putting them all together
Incorporating those choices, the table below provides some estimated probabilities depending on the group size being considered.
Alternatively, if our choices do not satisfy you, we created a simple calculator so you can compute your own probabilities:
So which estimate is “correct”?
Most realistic outputs of this equation could accurately be described as “very rare” or even “extraordinarily rare,” yet none is proof of wrongdoing.
“It’s a little like the irresistible force and the immovable object,” said Andrew Gelman, a professor of statistics and political science at Columbia University, when told in the abstract about this exercise. “On the one hand, you’re saying it’s completely random. On the other hand, you suspect it’s not.”
Mr. Gelman, like every other statistician who spoke with The Times about this problem, said the biggest hurdle was not any of the details but defining the question itself.
When we try to calculate the probability of a given event because we suspect it may not be random, we end up in the complicated position of trying to imagine how we would have predicted the likelihood of the event before it happened, said David Spiegelhalter. He heads the Winton Centre for Risk and Evidence Communication at the University of Cambridge, an organization dedicated to improving the way quantitative evidence is used in society.
The math is easy, he said, but formulating the question is tricky, bordering on “meaningless,” in large part because of how hard it is to pin down the group we care about.
“‘What’s the chance of this happening?’ is an easy statement to make,” he said. “It’s a familiar statement to make. But, actually, it’s a very difficult question to answer.”
Math has its limits. The point of trying to estimate a probability such as this one, Mr. Gelman said, is not to put too much stock in the numbers, but to let the result push you to find out more.
In this case, the best question is not one with an answer you can look up in a statistics textbook.
Instead, Mr. Gelman said, the question to pose is: “What’s going on?”
Matthew Cullen contributed reporting.