Programming Collective Intelligence

I recently bought a really interesting book which opens up some new horizons for the ordinary programmer. We are living in an exciting period: an intellectual version of the great ages of exploration, when ordinary seamen from Devon could sail off in simple wooden boats to find or colonise whole new worlds.

Someone kindly gave me a book token for Christmas, so I took it to a large bookshop in central London. Normally I buy computer books from Amazon – it's almost always much cheaper. But it's still nice to browse. I went looking for Jeffrey Friedl's book on Regular Expressions, but it wasn't in stock.

So, in the absence of a long-tail environment, I was forced to limit my choice to what was actually there.

I toyed with the idea of something on MySQL5. (I really must get up to date on stored procedures. Plus this book has a section on GIS functions.)

I considered this one on XML and the DOM, which I should know more about.

Then I spotted Toby Segaran's book, “Programming Collective Intelligence”. It just sort of leapt off the shelf into my hand and has been there ever since.

This book explains in simple terms some of the more exotic algorithms used in sorting, classifying, identifying, clustering, etc. What's more, it puts into your hands the code to do it. (Sadly he's written all his examples in Python, but they should be reasonably easy to translate to PHP.) I've always been fascinated by Bayesian statistics, genetic algorithms, neural networks, and other such algorithms. They're all in here. Not just a simple, clear explanation, but actual code.

Segaran's book even includes two recipes for writing programmes to generate programmes, allowing them to grow by natural selection until the machine has built a programme that can solve a given problem.
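To give a flavour of the idea, here is a minimal sketch of my own in Python. It is not Segaran's code, and every name in it (random_tree, mutate, evolve and so on) is made up for illustration. It assumes programmes are small arithmetic expression trees over x, scores each one against a target function, keeps the best quarter of each generation, and breeds the rest by mutation only. Real genetic programming normally adds crossover between parents as well, which I have left out to keep the sketch short.

```python
import operator
import random

# Hypothetical building blocks: the only operations a "programme" may use.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, depth=3):
    """Build a random expression tree over x and small integer constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.randint(0, 5)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Run the programme represented by the tree on input x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def mutate(tree, rng, rate=0.2):
    """Walk down one random path and replace a subtree with a fresh one."""
    if rng.random() < rate or not isinstance(tree, tuple):
        return random_tree(rng, 2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, rate), right)
    return (op, left, mutate(right, rng, rate))


def error(tree, cases):
    """Total absolute error over the test cases; lower is fitter."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)


def evolve(target, generations=40, size=200, seed=0):
    """Evolve trees towards target; return the best tree and error history."""
    rng = random.Random(seed)
    cases = [(x, target(x)) for x in range(-5, 6)]
    population = [random_tree(rng) for _ in range(size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda t: error(t, cases))
        history.append(error(population[0], cases))
        survivors = population[: size // 4]  # selection: keep the fittest
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(size - len(survivors))]
        population = survivors + children
    population.sort(key=lambda t: error(t, cases))
    return population[0], history


if __name__ == "__main__":
    best, history = evolve(lambda x: x * x + 2 * x + 1)
    print("best programme:", best)
    print("error by generation:", history)
```

Because the survivors are carried over unchanged, the best error can never get worse from one generation to the next. Whether it actually reaches zero depends on the target, the seed and the number of generations.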

A year ago I blogged about Bart Kosko's prediction that “Closed-form statistics also produced Bayesian models as a type of equation-based expert system where the expert can inject his favorite probability curve on the problem at hand. These models have the adaptive benefit that successive data often washes away the expert's initial math guesses just as happens in an adaptive fuzzy system…. [systems like this give]… a statistical free lunch except for the extensive computation involved—but that grows a little less expensive each day.”
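The "washing away" is easy to see in a toy case. The sketch below is my own illustration, not Kosko's or Segaran's: it assumes a coin whose bias we want to estimate, and two hypothetical experts whose opening guesses are expressed as Beta priors (one expects about 80% heads, the other about 20%). With a Beta prior the posterior mean is simply (prior heads + observed heads) divided by (prior total + observed total), so no heavy computation is needed here.

```python
def posterior_mean(prior_heads, prior_tails, heads, tails):
    """Mean of the Beta posterior after observing some coin flips.

    The prior is Beta(prior_heads, prior_tails), i.e. the expert's
    opening guess expressed as imaginary earlier flips.
    """
    a = prior_heads + heads
    b = prior_tails + tails
    return a / (a + b)


def gap_between_experts(heads, tails):
    """How far apart two opposed experts remain after seeing the data."""
    optimist = posterior_mean(8, 2, heads, tails)   # guessed ~80% heads
    pessimist = posterior_mean(2, 8, heads, tails)  # guessed ~20% heads
    return abs(optimist - pessimist)


if __name__ == "__main__":
    # Assume the data come out 70% heads at every sample size.
    for n in (0, 10, 100, 1000):
        heads = round(0.7 * n)
        print(n, "flips: experts differ by",
              round(gap_between_experts(heads, n - heads), 4))
```

With no data the two experts disagree by 0.6. After 1,000 flips the gap is under 0.01, and both estimates sit very close to the observed 70%.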

To hold a simple version of the code for such things in one's hand is quite amazing. See Keats, On First Looking into Chapman's Homer:

Then felt I like some watcher of the skies
When a new planet swims into his ken;
Or like stout Cortez when with eagle eyes
He star’d at the Pacific — and all his men
Look’d at each other with a wild surmise —
Silent, upon a peak in Darien.

(Except, as PG Wodehouse pointed out, it was actually stout Balboa.)

As I've often pointed out on this blog, the computational power, memory, and datasets needed to do this sort of thing are now more and more available to anyone who wants them. Where will this lead us? This is FUN!
