The rules of modelling

Thanks to Financial Cryptography for pointing out the Economist's coverage of a research paper by Thomas Griffiths and Joshua Tenenbaum about the way people think ahead. They argue that humans have internalised Bayesian methods, which allow us to make predictions when we have insufficient evidence. Seems to me this is critical to the nature of modelling.

Bayes invented a statistical method for drawing inferences from limited datasets. The Economist says that his ideas “were eventually overwhelmed by those of the ‘frequentist’ school” – by which it means drawing inferences from very large statistical sample populations. Frequentism doesn't care how the answer was reached: it just presents the facts. But of course it only works if the facts are there in sufficient quantity to be counted.
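
For what it's worth, the theorem itself is easy to state (my notation, not the Economist's): the belief in a hypothesis H after seeing evidence E is the prior belief reweighted by how well H explains E,

$$ p(H \mid E) = \frac{p(E \mid H)\,p(H)}{p(E)} $$

The frequentist objection is to the prior p(H), which has to come from somewhere before the data arrive.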

I'm surprised the Economist thinks Bayes has been overwhelmed – financial engineers make a lot of use of Bayesian statistics. You obviously can't use frequentist statistics to predict where the stockmarket will be in a week's time, because that problem extends over time. You can use frequentist statistics to analyse how people will vote in an election, on the assumption that what they say today is how they will vote next week – although we can all think of examples where frequentists predicted elections wrongly. (And to be fair, they would not talk about predictions without qualifying them with confidence levels.)

Griffiths and Tenenbaum (G&T) performed an experiment in which they invited humans to make predictions based on limited information (eg “If your friend read you her favourite line of poetry, and told you it was line 5 of a poem, what would you predict for the total length of the poem?” or “If you heard a member of the House of Representatives had served for 15 years, what would you predict his total term in the House would be?”).
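
To make this concrete, here is a minimal sketch in Python – mine, not code from the paper – of the kind of “optimal Bayesian prediction” G&T describe: observe a value t, assume it fell uniformly somewhere within an unknown total, weight each candidate total by a prior, and read off the posterior median. The power-law prior in the example is an assumption for illustration, not an empirical prior fitted to real poems.

```python
import numpy as np

def bayes_predict(t, prior_pdf, t_max=1000, n=100_000):
    """Posterior-median prediction of an unknown total after observing t,
    assuming t was drawn uniformly from [0, t_total], so the likelihood
    p(t | t_total) is 1/t_total for every t_total >= t."""
    totals = np.linspace(t, t_max, n)          # candidate totals, all >= t
    post = prior_pdf(totals) / totals          # prior x likelihood (unnormalised)
    cdf = np.cumsum(post / post.sum())         # normalise on the grid
    return totals[np.searchsorted(cdf, 0.5)]   # posterior median

# "Line 5 of a poem": predict the poem's total length under an assumed
# power-law prior p(t_total) ∝ t_total^(-1.5).
print(bayes_predict(5, lambda x: x ** -1.5))   # ≈ 8 lines
```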

Clearly these are impossible questions, but they are no worse than the ones which face, say, people writing an emergency simulation: “Given that Hurricane Katrina devastated New Orleans, how should we react to major flooding in London?” They are also analogous to the questions facing insurers underwriting unusual risks (eg product recall insurance, where the frequency of product recalls is not enough to give a frequentist prediction with any adequate level of confidence).

G&T found that “People’s judgments for Life Spans, Movie Runtimes, Movie Grosses, Poems, and Representatives were indistinguishable from optimal Bayesian predictions based on the empirical prior distributions…”

The paper is highly technical but draws a clear conclusion: “The finding of optimal statistical inference in an important class of cognitive judgments resonates with a number of recent suggestions that Bayesian statistics may provide a general framework for analyzing human inductive inferences. Bayesian models require making the assumptions of a learner explicit. By exploring the implications of different assumptions, it becomes possible to explain many of the interesting and apparently inexplicable aspects of human reasoning … The ability to combine accurate background knowledge about the world with rational statistical updating is critical in many aspects of higher-level cognition.”

Note the point about making the assumptions of a learner explicit. This is the essence of modelling. G&T assert that people unconsciously select one of a family of distributions (eg normal, Poisson) on which to base their extrapolations.
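
That choice of distribution is exactly where the assumptions bite. As a sketch (a standard derivation, not taken from the paper, using the same uniform-sampling likelihood as the code above): if the assumed prior is a power law, the posterior median collapses to a simple multiplicative rule,

$$ p(t_{\text{total}}) \propto t_{\text{total}}^{-\gamma} \;\Rightarrow\; p(t_{\text{total}} \mid t) \propto t_{\text{total}}^{-(\gamma+1)} \;\Rightarrow\; t^{*} = 2^{1/\gamma}\, t $$

so with γ = 1 the best guess is simply double what you have seen so far, whereas a normal prior pulls every prediction towards the population mean. A different assumed distribution gives a structurally different extrapolation.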

G&T also add that “These results suggest that people’s predictions can also be used as a method for identifying the prior beliefs that inform them.” In other words, if you look at what we think will happen, you may be able to work backwards to a set of modelling assumptions which may be quite realistic.
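
A toy version of that reverse inference, under the same power-law assumption as above (the numbers are invented for illustration, not G&T's data):

```python
import numpy as np

# Hypothetical data: value observed, and the median human prediction of
# the total -- invented for illustration, not G&T's empirical results.
observed = np.array([5.0, 10.0, 20.0, 40.0])
predicted = np.array([9.0, 17.0, 33.0, 70.0])

# Under a power-law prior p(t_total) ∝ t_total^(-gamma), the optimal
# posterior-median rule is t* = 2**(1/gamma) * t (derivation above).
# Working backwards: find the gamma whose rule best fits the predictions.
gammas = np.linspace(0.2, 5.0, 1000)
errors = [np.sum((2 ** (1 / g) * observed - predicted) ** 2) for g in gammas]
print(f"implied prior exponent: gamma ≈ {gammas[np.argmin(errors)]:.2f}")
```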

The Economist finishes its article: “some people suspect that the parsimony of Bayesian reasoning leads occasionally to it going spectacularly awry, with whatever process it is that forms the priors getting further and further off-track rather than converging on the correct distribution. That might explain the emergence of superstitious behaviour, with an accidental correlation or two being misinterpreted by the brain as causal. A frequentist way of doing things would reduce the risk of that happening. But by the time the frequentist had enough data to draw a conclusion, he might already be dead.”

Of course the other problem is that the question being asked might not make any sense. What is the difference between:

– If your friend read you her favourite line of poetry, and told you it was line 5 of a poem, what would you predict for the total length of the poem?

and

– If your friend read you her favourite line of poetry, and told you that she first read the poem on a Tuesday, what would you predict for the total length of the poem?

In other words, are the modelling assumptions actually relevant?
