Polls, data and forecasts.

As the Duke of Wellington reportedly said: “All the business of war, and indeed all the business of life, is to endeavour to find out what you don’t know by what you do; that’s what I called ‘guessing what was at the other side of the hill’.”

Such guessing has now been completely wrong in two high-profile events (the US presidential election and Brexit). Does this mean it can’t be done, or just that we are doing it badly?

The Independent ran a story on 7 November headlined “Can Donald Trump win the election? Here’s the mathematical reason why it’s impossible for him to become President”, by an academic, Bryan Cranston of Swinburne University, Melbourne. He claims: “We can … ignore the polls and instead look at the results of the Electoral College from the last four elections for a good indication of what the result will be.”

The Independent ran an earlier story, on 18 October, also by Mr Cranston, headlined: “Want to know who will win the US election? Ignore the polls and follow the money”, which examines the war chests raised by each candidate (apparently $516 million for Clinton and $205 million for Trump, as of 21 September 2016) and concludes: “The only polls to watch are candidate balance sheets, and whether people are actually donating money. It is clear that the financial backers have spoken, and it does not bode at all well for Trump.”

But even if you ignored Mr Cranston’s advice and followed the polls, it didn’t help. The Huffington Post’s ‘Pollster’ service, for example, claimed shortly before the election to be tracking 376 polls from 43 pollsters, and said: “Our model of the polls suggests Clinton was very likely leading. (In >99% of simulations, Clinton led Trump.)”
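How does a model turn 376 polls into “>99% of simulations”? Pollster’s actual machinery isn’t described here, but the generic recipe behind such headline figures is easy to sketch: average the polls, assume the true margin is distributed around that average with some error, and count how often each candidate leads across many simulated draws. A minimal illustration, with made-up numbers (the 5-point lead and 2-point error are assumptions for the sketch, not Pollster’s parameters):

```python
import random

# Toy poll-aggregation simulation. All figures are invented for
# illustration; this is not Pollster's actual model or data.
random.seed(42)
poll_average_margin = 5.0  # assumed Clinton lead in the poll average (points)
assumed_error_sd = 2.0     # assumed standard deviation of polling error (points)

simulations = 100_000
clinton_leads = sum(
    random.gauss(poll_average_margin, assumed_error_sd) > 0
    for _ in range(simulations)
)
print(f"Clinton leads in {clinton_leads / simulations:.1%} of simulations")
# With these assumptions Clinton leads in roughly 99% of draws -- yet a
# single correlated 5-point polling miss flips the result outright.
```

The weak link is the error assumption: if polling errors are systematic and correlated rather than independent noise, the “>99%” reflects the model’s assumptions more than the electorate.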

According to Media Guido, the Guardian’s prediction was “Hillary Clinton Will Win” (Martin Kettle, Associate Editor); the Telegraph’s, “Hillary Clinton Will Win the Presidency” (Janet Daley); and the FT’s, “Trump Will Lose, Can the Republicans Recover?” (Roger Altman).

In a further article, Mr Cranston admits he got it wrong, pointing out that almost everyone else did too. He was right to ignore the polls, he argues, but he adds: “I arrived at my forecast based on a different type of modelling. I do not look at opinion polls; rather, I look at historical election results as a predictor of what might happen next … By looking at the last six presidential elections since 1992, we made what we thought were reasonable assumptions about how particular states might vote. By looking at which states had voted for the Democratic Party … By factoring in states that each party always won, added to those they won most of the time …” (Well, no, actually: one week you looked at how much money each candidate had raised, and the next at the mathematical balance in the Electoral College. And basing your prediction on extrapolating the past is notoriously subject to error.)

But Mr Cranston, like many pundits, also adds a further explanation: the voters got it wrong. “This election campaign has done more than elect Trump. It has shown that voters are no longer interested in facts or logic when it comes to exercising their democratic right … [I] didn’t take into account the true impact of America’s racial and ethnic divide.” Leaving emotion aside, the point surely is that the polls did not take this into account either. If these divisions exist, the polls should pick them up. Either the polls measured opinion wrongly, or public opinion is more fickle than we thought: say one thing to a pollster this week, do something else entirely next week. Either way, it was not a good week for Big Data.

Of course, the forecasters were all wrong. What does this show?

1. These days, it is possible to get staggering amounts of data. (376 polls, at a typical sample of around 1,000 respondents each, suggests to my naive statistical mind that at least 376,000 people were questioned, and the results collected and analysed, in a very short period of time.)

2. It is possible to manipulate data in many ways, but most of these ways are probably misleading: averages, projections, breaking the data down into subsets, e.g. by demography or geography. (A toy illustration follows this list.)

3. All forecasting is dangerous. An article on prediction in the Oxford Handbook of Philosophy, by Professor Nicholas Rescher, argues that most philosophers take a position between a Laplacian cosmos, in which everything is absolutely determined, and a chaotic cosmos, where everything is random and ‘all apparent patterns are at best transitory stabilities’. This midway position assumes that “the real world admits of rational prediction in many cases, but with many important exceptions, particularly relating to chance (stochastic) events … and to the spontaneous decisions that manifest the ‘free will’ of human beings” (p. 751).

4. There ought to be a website which takes highly publicised forecasts, checks them for accuracy when the event occurs, and rates the forecasters accordingly.
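To illustrate point 2 with invented numbers: the same set of state polls can support opposite headlines depending on whether you average the vote shares or aggregate winner-take-all by state, which is roughly what the Electoral College does.

```python
# Invented state polls: identical data, two aggregations, opposite headlines.
polls = {
    # state: (clinton_share, trump_share, electoral_votes)
    "State A": (0.65, 0.35, 10),
    "State B": (0.49, 0.51, 20),
    "State C": (0.49, 0.51, 20),
}

# Aggregation 1: average the vote shares -> "Clinton ahead, 54% to 46%".
avg_clinton = sum(c for c, _, _ in polls.values()) / len(polls)
print(f"Average Clinton share: {avg_clinton:.0%}")

# Aggregation 2: winner-take-all by state -> "Trump wins 40 electoral votes to 10".
clinton_ev = sum(ev for c, t, ev in polls.values() if c > t)
trump_ev = sum(ev for c, t, ev in polls.values() if t > c)
print(f"Electoral votes: Clinton {clinton_ev}, Trump {trump_ev}")
```

Neither aggregation is dishonest; they simply answer different questions, and only one of them is the question the Electoral College asks.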

This area has already been addressed in a study commissioned by IARPA, which led to the publication of Philip Tetlock’s book, Superforecasting. Tetlock found that some people achieved consistently higher accuracy than others, and that this gap grew as they gained practice. The ‘superforecasters’ were not professionals, nor (in most cases) specialists, nor particularly extraordinary in any way.

Tetlock used the Brier score, proposed by Glenn Brier in 1950 and since developed, e.g. by Hernández-Orallo et al. for use with machine classifiers. Hernández-Orallo et al. clearly see the use of their work as being to “play a role in the improvement of classifiers, especially in terms of calibration”.
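The Brier score itself is simple to state: it is the mean squared difference between the probabilities a forecaster assigned and what actually happened (1 if the event occurred, 0 if not), so 0 is perfect and confident wrongness is punished hardest. A minimal sketch of the common binary form (Brier’s original multi-category formulation, the one Tetlock uses, runs from 0 to 2 rather than 0 to 1):

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities (0..1)
    and observed outcomes (1 = event happened, 0 = it did not)."""
    return sum((f - o) ** 2 for f, o in zip(forecast_probs, outcomes)) / len(forecast_probs)

# A pundit who gave Clinton a 99% chance, scored against the actual outcome:
print(brier_score([0.99], [0]))  # 0.9801 -- near the worst possible 1.0
print(brier_score([0.60], [0]))  # 0.36   -- hedging is punished far less
```

The squaring punishes confident errors disproportionately, which is why the score rewards well-calibrated probabilities rather than bold claims.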

Sadly, newspaper pundits and polls are not retrospectively calibrated; but they may have a considerable role in forming opinions and influencing strategy (see this posting on the Clinton campaign’s use of poll data). Tetlock’s data also suggested that ‘the more famous an expert was, the less accurate he was’ (p. 72), which gives one an added reason to be suspicious of newspaper pundits, even if they are academics.

Tetlock’s book develops a set of ‘commandments’ for training the ability to forecast. He also includes a theoretical discussion, too long to summarise, quoting Kahneman, Taleb, and Isaiah Berlin. None of his commandments, incidentally, involve getting bigger data.

As Rescher says, “the key role of prediction in human affairs inheres in our stake in the future.” We need “some degree of cognitive control over the future” if we are to exist. So much hangs on the outcome of an election – for countries, for markets, for individuals, for politicians themselves and their hangers-on – that there will always be a demand for the Bryan Cranstons of this world. The more definite and authoritative they sound, the better, until they are proved wrong, that is.
