Published in the Globe and Mail (Toronto), November 23, 2016
Dan Gardner and Philip E. Tetlock
It seems almost the entire pollster-and-pundit class got the U.S. presidential election wrong, leaving millions of people feeling shocked, even betrayed. Worse, they feel afraid. There is so much uncertainty about a Trump administration, and people are desperate for insight into what is coming next, but those who should be able to provide it just failed miserably.
Who can we turn to for forecasts we can trust? That’s not easily answered. Judging even a single forecast is harder than it seems. And judging forecasters is exponentially harder.
But imagine a baseball player who hit a home run the last time he was up. Now he’s at bat again. Would you bet money that he will hit another one? Not without a lot more information. After all, this guy may be a terrible batter. His home run may have been the first he ever hit. Or maybe he’s Babe Ruth. Without knowing his track record – as expressed in performance statistics – you don’t know. So you won’t bet.

Now apply that test to filmmaker Michael Moore, who did call Mr. Trump’s victory. What is his record? In 2012, he predicted Mitt Romney would win. So now we’ve got two at-bats, with one hit and one miss. That’s still nowhere near enough to judge him meaningfully.
To be confident in our conclusions, we need to judge lots of forecasts under lots of different circumstances. That’s particularly important for judging probabilistic forecasts like “there is a 70 per cent chance Hillary Clinton will win,” which is roughly what Nate Silver’s final forecast was. When Mr. Trump won, many people said Mr. Silver was wrong. That’s not true. A forecast of “70 per cent Clinton” is also a forecast of “30 per cent Trump.” On the basis of that one forecast alone, there is no way to know whether it was accurate or not.
But judge large numbers of forecasts and you can draw conclusions. If 70 per cent of the events forecast at 70-per-cent probability actually happen, and 30 per cent of the events forecast at 30-per-cent probability happen, the forecaster is perfectly calibrated. The further a forecaster strays from this ideal, the worse he or she is.
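To make that arithmetic concrete, here is a minimal sketch, in Python, of how such a calibration check could be run. The forecasts listed are hypothetical, invented purely for illustration; they are not drawn from any real forecaster or from the article.

```python
# Minimal calibration check: group forecasts by their stated probability,
# then compare each group's stated probability with the fraction of events
# in that group that actually happened. (Hypothetical data for illustration.)
from collections import defaultdict

# Each pair is (stated probability that the event happens, did it happen?)
forecasts = [
    (0.7, True), (0.7, True), (0.7, False), (0.7, True),
    (0.3, False), (0.3, True), (0.3, False), (0.3, False),
]

buckets = defaultdict(list)
for prob, happened in forecasts:
    buckets[prob].append(happened)

for prob in sorted(buckets):
    outcomes = buckets[prob]
    observed = sum(outcomes) / len(outcomes)  # fraction that actually happened
    print(f"Stated {prob:.0%}: happened {observed:.0%} of the time "
          f"({len(outcomes)} forecasts)")
```

A well-calibrated forecaster’s observed frequencies track the stated probabilities across many such groups; with only a handful of forecasts, as the baseball analogy suggests, the gaps tell you very little.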
And to bring this all the way up to the scientific gold standard, there should be large numbers of forecasters forecasting the same events. That way there can be apples-to-apples comparisons of performance statistics and we can confidently separate the all-stars from the also-rans.
But outside the elite financial world – and even there it’s spotty – few forecasters have been put to the test this way. That’s particularly true of the pundits who dominate the media, where a single bull’s eye can turn an unknown into a highly paid Nostradamus.
This can change. We now have the tools needed to judge forecasters. Some major institutions – from intelligence agencies to Wall Street firms – have started to use them because they have realized it is ridiculous to make decisions on the basis of forecasts whose reliability is unknown. The media could do the same. And when a pundit goes on television, her forecasting stats could appear on-screen, just as a baseball player’s batting average appears when he steps up to home plate.
Maybe that sounds far-fetched. But delivering forecasts is a major part of what news organizations do and trust in that forecasting has been devastated. If the media want to restore trust, rigorous testing and independently calculated performance statistics would be a good way to do it.