Judging the Prognosticators

Published in the Globe and Mail (Toronto), November 23, 2016

Dan Gardner and Philip E. Tetlock

It seems almost the entire pollster-and-pundit class got the U.S. presidential election wrong, leaving millions of people feeling shocked, even betrayed. Worse, they feel afraid. There’s so much uncertainty about a Trump administration and people are desperate for insight about what’s coming next, but those who should be able to tell them just failed miserably.

Who can we turn to for forecasts we can trust? That’s not easily answered. Judging even a single forecast is harder than it seems. And judging forecasters is exponentially harder.

Luck is the fundamental problem. The person who picks six lottery ticket numbers and wins is lucky, not skilled, but all too often we don’t even consider luck when someone predicts something like a presidential election. And calling a presidential election takes a lot less luck than winning a lottery. A flipped coin had a 50 per cent shot at picking Donald Trump.

So we can’t simply look at the result and assume that someone who said “Trump will win” showed real foresight. Unfortunately, many people are doing just that.

One poll has been widely praised for having predicted a Trump win – even though it did that on the basis of a grossly inflated estimate of Mr. Trump’s share of the popular vote. And then there’s Michael Moore.

In July, Mr. Moore wrote a short essay in which he argued that a Trump victory was certain. Read today, his reasons look sound. After the election, Mr. Moore was celebrated and the media sought him out to explain what would happen next. Left unmentioned is that after Mr. Moore wrote that piece in July, he wrote another, in August, during a time when Mr. Trump’s campaign was in chaos. Mr. Moore insisted Mr. Trump was “self-sabotaging” so he could quit before the election was held.

The media also ignored that on Oct. 9, when the Trump campaign was again in disarray and Hillary Clinton was again surging in the polls, Mr. Moore tweeted this: “Some note it was my post in July, the 5 reasons Trump would win, that lit a fire under millions 2 take this seriously & get busy. Ur welcome.” No matter what the outcome, Mr. Moore had evidence he had seen it coming.

In any event, what most people care about isn’t whether Michael Moore should take a victory lap. They want to know who can give them reliable insight into what is likely to happen next. So for the sake of argument, let’s say that Mr. Moore nailed the election. He is now predicting that Mr. Trump will resign or be impeached before the end of his term. Based on his success, can we take that to the bank? Judging by the way the media treated Mr. Moore’s prognostication, one would think so.

But imagine a baseball player who hit a home run the last time he was up. Now he’s at bat again. Would you bet money on whether he will hit another home run? Not without a lot more information. After all, this guy may be a terrible batter. His home run may have been the first he ever hit. Or maybe he’s Babe Ruth. Without knowing his track record – as expressed in performance statistics – you don’t know. So you won’t bet. What is Mr. Moore’s record? In 2012, he predicted Mitt Romney would win. So now we’ve got two at-bats, with one hit and one miss. That’s still nowhere near what we need to meaningfully judge.
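To see why two at-bats tell us so little, consider a rough sketch. It assumes, purely for illustration, a guesser with no skill at all who is right half the time on each independent either-or call; the function name `prob_at_least` and the 20-call example are hypothetical, not anyone's actual record.

```python
from math import comb

def prob_at_least(hits, calls, p=0.5):
    """Chance that a guesser who is right with probability p on each
    independent call gets at least `hits` correct out of `calls`."""
    return sum(
        comb(calls, k) * p**k * (1 - p) ** (calls - k)
        for k in range(hits, calls + 1)
    )

# One hit or better in two at-bats: a coin-flipper manages this 75% of the time.
print(prob_at_least(1, 2))

# Fifteen or more hits in twenty calls: luck alone does this about 2% of the time.
print(prob_at_least(15, 20))
```

Under those assumptions, a one-for-two record is what pure luck produces most of the time, while a long record of hits is much harder to explain away.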

To be confident in our conclusions, we need to judge lots of forecasts under lots of different circumstances. That’s particularly important for judging probabilistic forecasts like “there is a 70 per cent chance Hillary Clinton will win,” which is roughly what Nate Silver’s final forecast was. When Mr. Trump won, many people said Mr. Silver was wrong. That’s not true. A forecast of “70 per cent Clinton” is also a forecast of “30 per cent Trump.” If Mr. Trump wins, there’s no way to know, on the basis of that one forecast alone, whether it was accurate or not.

But with large numbers of forecasts, you can judge. If 70 per cent of the events that are forecast at 70 per cent probability actually happen, and 30 per cent of the 30-per-cent forecasts happen, the forecaster is perfectly calibrated. The further from this ideal he or she is, the worse the forecaster.
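The check described above can be sketched in a few lines. This is a minimal illustration, not an established scoring tool: the function names and the two forecasting records are invented for the example, and it assumes each forecast is recorded as a stated probability paired with whether the event happened.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (stated_probability, happened) pairs by stated probability.
    Returns {stated_probability: (number_of_forecasts, observed_frequency)}."""
    buckets = defaultdict(list)
    for prob, happened in forecasts:
        buckets[prob].append(1 if happened else 0)
    return {p: (len(v), sum(v) / len(v)) for p, v in sorted(buckets.items())}

def calibration_error(forecasts):
    """Average gap between stated probability and observed frequency,
    weighted by how many forecasts were made at each probability."""
    table = calibration_table(forecasts)
    total = sum(n for n, _ in table.values())
    return sum(n * abs(p - freq) for p, (n, freq) in table.items()) / total

# Hypothetical records: ten forecasts at 70 per cent, ten at 30 per cent.
well_calibrated = ([(0.7, True)] * 7 + [(0.7, False)] * 3
                   + [(0.3, True)] * 3 + [(0.3, False)] * 7)
coin_flipper = ([(0.7, True)] * 5 + [(0.7, False)] * 5
                + [(0.3, True)] * 5 + [(0.3, False)] * 5)

print(calibration_table(well_calibrated))
print(calibration_error(well_calibrated))  # zero gap: matches the ideal
print(calibration_error(coin_flipper))     # a gap of about 0.2
```

The first record matches the ideal exactly; the second comes from a forecaster whose events happen half the time no matter what probability was stated, and the gap shows it. With only a handful of forecasts in each group, the observed frequencies are too noisy to mean much, which is why large numbers matter.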

And to bring this all the way up to the scientific gold standard, there should be large numbers of forecasters making the same forecasts. That way there can be apples-to-apples comparisons of performance statistics and we can confidently separate the all-stars from the also-rans.

But outside the elite financial world – and even there it’s spotty – few forecasters have been put to the test this way. That’s particularly true of the pundits who dominate the media, where a single bull’s eye can turn an unknown into a highly paid Nostradamus.

This can change. We now have the tools needed to judge forecasters. Some major institutions – from intelligence agencies to Wall Street firms – have started to use them because they have realized it is ridiculous to make decisions on the basis of forecasts whose reliability is unknown. The media could do the same. And when a pundit goes on television, her forecasting stats could appear on-screen, just as a baseball player’s batting average appears when he steps up to home plate.

Maybe that sounds far-fetched. But delivering forecasts is a major part of what news organizations do and trust in that forecasting has been devastated. If the media want to restore trust, rigorous testing and independently calculated performance statistics would be a good way to do it.