Why you can trust the census, but not polls

In 1936, a massive poll of 2.3 million Americans revealed that the forthcoming presidential election would be won in a landslide by Alf Landon. Alf, who? Right. There was indeed a landslide in 1936, but the winner was Franklin Delano Roosevelt.

Yes, this is ancient history. But bear with me. This famous poll is a standard illustration in Stats 101 classes. With the census controversy continuing to dominate the front pages of newspapers, we could all use a little Stats 101.

It’s hard to overstate just how mammoth that famous poll was, particularly in an era before computers. It was conducted by The Literary Digest, a popular magazine, whose first step was to gather an astonishing 10 million addresses from automobile registration lists and telephone books. Ballots were mailed to every one of those addresses. Roughly 2.3 million were returned. When the numbers were added up, it was Landon in a landslide. And with a survey that big, how could it be wrong?

And yet it was wrong. For two reasons.

The first problem was the sample. Not every American owned a car or a telephone. By drawing addresses from those sources, the magazine underrepresented poorer Americans, and poorer Americans tended to vote for Roosevelt. This is a straightforward example of “sample bias.”

The other problem was the response rate. It was 23 per cent. By the standards of such surveys, that’s high. But it still meant that only a minority of those who got a ballot returned it. If those responders had been chosen randomly, it might not have been a problem. But they weren’t randomly selected. They selected themselves. And as it turned out, they tended to be wealthier. And much more likely to vote for Alf Landon. This is “response bias,” or what statisticians more precisely call non-response bias.

Combine these two sources of bias and you get The Literary Digest’s disastrous poll. Which brings me back to the census.
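For readers who like to see the arithmetic, here is a minimal sketch in Python of how the two biases stack up. Every number in it is invented for illustration and none of it is 1936 data. The two groups, their sizes, their support for Roosevelt, their odds of appearing on a car or phone list (`in_frame`) and their odds of mailing the ballot back (`returns_ballot`) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population_share: float   # share of all voters (invented)
    roosevelt_support: float  # fraction of the group backing Roosevelt (invented)
    in_frame: float           # chance of being on a car or phone list (invented)
    returns_ballot: float     # chance of mailing the ballot back (invented)


# Hypothetical electorate: illustrative values only, not historical figures.
GROUPS = [
    Group("wealthier", 0.40, 0.35, 0.90, 0.30),
    Group("poorer", 0.60, 0.75, 0.40, 0.15),
]


def true_support(groups):
    """Roosevelt's support across the whole electorate."""
    return sum(g.population_share * g.roosevelt_support for g in groups)


def poll_estimate(groups):
    """Roosevelt's support among the ballots that actually come back."""
    # A group's weight in the returned pile depends on both biases:
    # whether it was on the mailing list and whether it bothered to reply.
    returned = [g.population_share * g.in_frame * g.returns_ballot for g in groups]
    backing = sum(r * g.roosevelt_support for r, g in zip(returned, groups))
    return backing / sum(returned)


if __name__ == "__main__":
    print(f"True support:  {true_support(GROUPS):.0%}")   # 59%
    print(f"Poll estimate: {poll_estimate(GROUPS):.0%}")  # 45%
```

In this made-up electorate Roosevelt wins comfortably, yet the returned ballots show him losing, because the wealthier group is overrepresented twice over.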

In order to get a good sample, the short-form census goes to every Canadian. No room for sample bias there. The long-form census goes to only 20 per cent of households but the threat of sample bias is minimized by randomly selecting who will get it.

That still leaves the risk of response bias. Dealing with that means pushing the response rate as high as possible.

In the past, Statistics Canada has done that by appealing to people’s sense of civic duty. And by pestering them with phone calls if they don’t return the form. And, at least in theory, threatening them with fines and jail time — although, in practice, fines are rare and no one has ever gone to jail for refusing to answer the census.

It did the job. The response rate for the 2006 short-form census was 97.2 per cent. For the long-form census, it was 93.7 per cent. Those figures are spectacularly high — and that is a big reason why the data produced by the census are, although not perfect, about as accurate as they can be.

But the government, in its wisdom, decided that this is an unconscionable abuse of the power of the state — unlike the dozens of other abuses of state power the government finds entirely conscionable — and so, without the slightest consultation, it ordered Statistics Canada to make the long form voluntary. Lots of research, including recent American experience, suggests this will result in a significant decline in the response rate, particularly among marginal groups.

Will that mean a decline in the accuracy of the data? Maybe. A declining response rate raises the likelihood of bias. But recent research suggests it doesn’t make it certain. The only way to know for sure would be to conduct a voluntary and a mandatory census at the same time. That ain’t going to happen. So the net effect of the switch to a voluntary survey will be to cast doubt over the data. Are they biased? In what ways are they biased? To what degree? We won’t know.

The government has conceded this point but it insists that the problem can be solved by sending the long-form census to more Canadians. Industry Minister Tony Clement even boasts about the millions of forms that will be sent out. But as every Stats 101 student knows, increasing the number of forms may increase the number of responses but it won’t increase the response rate. And so it won’t stop response bias — as the editor of The Literary Digest discovered shortly after he boasted about how big his survey was.
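The Stats 101 point can be checked with a small simulation. What follows is a rough sketch under invented assumptions, not a model of the actual census: a population split evenly into two hypothetical groups, "A" and "B", that differ both in how willing they are to answer a voluntary survey and in how often they have whatever trait is being measured.

```python
import random

# Hypothetical population: all figures are invented for illustration.
SHARE_A = 0.5                          # half the population is in group A
RESPONSE_RATE = {"A": 0.8, "B": 0.4}   # group A answers voluntary forms more often
HAS_TRAIT = {"A": 0.2, "B": 0.6}       # fraction of each group with the trait

# The true rate in the whole population: 0.5 * 0.2 + 0.5 * 0.6 = 0.4
TRUE_RATE = SHARE_A * HAS_TRAIT["A"] + (1 - SHARE_A) * HAS_TRAIT["B"]


def run_survey(forms_sent, rng):
    """Mail forms to random people; return (response rate, estimated trait rate)."""
    responses = 0
    with_trait = 0
    for _ in range(forms_sent):
        group = "A" if rng.random() < SHARE_A else "B"
        if rng.random() < RESPONSE_RATE[group]:      # did this person reply?
            responses += 1
            if rng.random() < HAS_TRAIT[group]:
                with_trait += 1
    if responses == 0:
        raise ValueError("no responses; send more forms")
    return responses / forms_sent, with_trait / responses


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so runs are repeatable
    print(f"True rate in the population: {TRUE_RATE:.1%}")
    for forms in (2_000, 200_000):
        rate, estimate = run_survey(forms, rng)
        print(f"{forms:>7} forms: response rate {rate:.1%}, estimate {estimate:.1%}")
```

Under these assumptions the response rate hovers around 60 per cent and the estimate around 33 per cent whether 2,000 forms go out or 200,000. Sending more forms makes the answer more stable. It does not move it any closer to the true 40 per cent.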

But hang on, the government’s defenders say. People quote data from voluntary surveys all the time. Every opinion poll that appears in the media is voluntary. How can that be squared with the insistence that the census must be mandatory to be as accurate as possible?

The short answer: It can’t.

“Opinion polls are generally trash,” Ron Melchers, a statistician at the University of Ottawa, says. Response rates on private surveys are low and falling. It’s not uncommon for four out of five people to refuse to answer. Or more. That creates huge opportunities for bias to creep in. “My aged mother will answer any survey she’s asked to answer. She’s isolated, she’s alone, she’s grumpy and irritable, and she hasn’t got all of her marbles. These are the people who are most likely to respond to a lot of these polling surveys,” says Melchers.

StatsCan conducts voluntary surveys but it uses the census data to “weight” the results, that is, to adjust them and correct for bias. Private pollsters generally don’t or can’t do that. And soon, thanks to the government’s decision on the census long form, neither will StatsCan.
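To give a sense of what weighting means, here is a minimal sketch of one simple version of it, sometimes called post-stratification. It is not StatsCan's actual method, which is surely more elaborate. The age groups, the census shares and the survey counts are all invented for illustration.

```python
# Population shares of the kind a census supplies (invented numbers).
CENSUS_SHARE = {"under_35": 0.30, "35_to_64": 0.50, "65_plus": 0.20}

# Hypothetical voluntary survey: (respondents, respondents answering "yes").
# Older people are heavily overrepresented among those who replied.
SURVEY = {
    "under_35": (100, 60),
    "35_to_64": (400, 160),
    "65_plus": (500, 100),
}


def raw_estimate(survey):
    """Share answering "yes" if every respondent is simply counted once."""
    total = sum(n for n, _ in survey.values())
    yes = sum(y for _, y in survey.values())
    return yes / total


def weighted_estimate(survey, census_share):
    """Share answering "yes" after each group is scaled to its census share."""
    return sum(census_share[group] * (yes / n) for group, (n, yes) in survey.items())


if __name__ == "__main__":
    print(f"Raw estimate:      {raw_estimate(SURVEY):.0%}")                     # 32%
    print(f"Weighted estimate: {weighted_estimate(SURVEY, CENSUS_SHARE):.0%}")  # 42%
```

The correction is only as good as the census shares behind it. If those shares are themselves in doubt, so is the weighted result.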

All of this means that getting an accurate picture of reality — as represented by numbers — can be a lot harder than most people realize. Two conclusions follow.

One, we should be much more skeptical about the numbers that are used so casually in public discourse. “Often people using numbers have no idea where those numbers come from,” Melchers says.

The other is that the census is unique. And uniquely valuable.

Or at least it was until Stephen Harper came along.