Thanks again to Developing Intelligence for a link to a paper about the way our brains process information, which argues that it may not be optimal to calculate all possible correlations between many input signals, and that the brain may therefore rely on assumptions instead. Is this also a constraint on the use we can make of massive databases?
The article, Neural correlations, population coding and computation by Bruno B. Averbeck, Peter E. Latham and Alexandre Pouget, is highly technical and I'm not qualified to understand it in full. It looks at the way “…the brain encodes information in population activity, and how it combines and manipulates that activity as it carries out computations”, arguing that:
“As in any good democracy, individual neurons count for little; it is population activity that matters. For example, as with control of eye and arm movements, visual discrimination in the primary visual cortex (V1) is much more accurate than would be predicted from the responses of single neurons. This is, of course, not surprising. As single neurons are not very informative, to obtain accurate information about sensory or motor variables some sort of population averaging must be performed. ”
It has some interesting comments about how the binding problem might be solved at the neuronal level.
But the article is about correlations. Neurons respond with a certain amount of noise. The question is whether this noise is random, or whether it contains an extra level of signalling. At the moment we do not know, partly because the amount of computation involved is so great that we can't simulate it. To do it, we would need not only to average out the values returned by each neuron, but also to see whether there was any pattern in those values. However, there is a possibility that the brain doesn't do this either: “…measuring correlations is hard, and requires large amounts of data… when choosing a strategy, there is a trade-off between performance and how much time and data one is willing to spend measuring correlations.”
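The asymmetry between averaging and correlating can be made concrete with a toy sketch (all numbers here – neuron counts, noise levels – are invented for illustration, not taken from the paper). Averaging a population of N noisy neurons costs one pass over N values, but estimating every pairwise correlation means N(N−1)/2 separate estimates, each of which needs many trials of data to be reliable:

```python
import random

random.seed(0)
N = 50    # hypothetical number of neurons in the population
T = 200   # hypothetical number of trials recorded

# Simulated responses: each "neuron" returns a noisy value on each trial.
data = [[random.gauss(10, 2) for _ in range(N)] for _ in range(T)]

# Population averaging: one cheap pass per trial over N values.
pop_means = [sum(trial) / N for trial in data]

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Measuring correlations: every pair of neurons needs its own estimate.
pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
cols = list(zip(*data))  # per-neuron series across trials
corrs = [corr(cols[i], cols[j]) for i, j in pairs]

print(len(pop_means))  # 200 trial averages
print(len(pairs))      # 1225 pairwise correlations to estimate
```

Even at a toy scale of 50 neurons there are 1,225 correlations to pin down, and the count grows quadratically – which is the trade-off the authors describe between performance and the time and data one is willing to spend.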
Whether the brain does it or not is beyond my own competence. But this does make me think about the very large databases that are being assembled for specific purposes – e.g. the UK Police vehicle movements database. I assume that this will be used to look for needles in haystacks – i.e. amidst all this data, find me all movements for vehicle x over the last six months. That's a simple (?) search engine job.
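The needle-in-a-haystack query really is the easy case – essentially a filter over records. A minimal sketch, assuming a hypothetical record layout of (plate, timestamp, camera location); the plates, dates and locations below are invented:

```python
from datetime import datetime, timedelta

# Hypothetical sightings: (plate, timestamp, camera_location)
sightings = [
    ("AB12CDE", datetime(2006, 3, 1, 8, 30), "M25 J10"),
    ("XY99ZZZ", datetime(2006, 3, 1, 9, 0), "A1 Peterborough"),
    ("AB12CDE", datetime(2006, 5, 14, 17, 45), "M6 Toll"),
]

def movements(plate, since):
    """All sightings of one plate after a cutoff date."""
    return [s for s in sightings if s[0] == plate and s[1] >= since]

# "All movements for vehicle x over the last six months":
six_months_ago = datetime(2006, 6, 1) - timedelta(days=182)
hits = movements("AB12CDE", six_months_ago)
print(len(hits))
```

The cost of this kind of lookup scales with the data you scan (or, with an index on the plate, far better than that) – which is exactly why it is the intended use. It is the all-pairs, pattern-finding questions of the previous paragraphs that blow up.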
But if you want to use this sort of mass information for anything else – e.g. drawing wider conclusions, or simulation – just how far, computationally, can you afford to go? Even, for instance, working out marketing trends from a mass of POS data – is it possible to do this with confidence? I'm reminded of the argument that you cannot measure the length of the coastline of Britain.
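That coastline argument is Richardson's observation, popularised by Mandelbrot: the measured length depends on the length of your ruler, growing without limit as the ruler shrinks. A sketch of the empirical relation L(ε) = F·ε^(1−D), where D ≈ 1.25 is the figure usually quoted for the west coast of Britain and F is a constant I have chosen purely for illustration:

```python
# Richardson's relation: measured length L grows as the ruler eps shrinks.
D = 1.25     # fractal dimension commonly quoted for Britain's west coast
F = 1000.0   # scale constant, chosen arbitrarily for illustration

for eps in (100, 10, 1):   # ruler length, km
    L = F * eps ** (1 - D)
    print(f"ruler {eps:>3} km -> measured length {L:.0f} km")
```

The point for mass databases is the same in spirit: the answer you get depends on the resolution at which you interrogate the data, and "measuring everything" is not an available option.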