How much information? For how long?

I've just found the UC Berkeley School of Information Management and Systems (SIMS) report that tries to quantify the information we generate. The results are fascinating: we are generating more and more information. Are there limits to this?

In 2002, “print, film, magnetic, and optical storage media produced about 5 exabytes of new information.” (1 exabyte = 1000 petabytes = 1,000,000 terabytes.) SIMS estimate the total number of words ever spoken by human beings at 5 exabytes. “We estimate that new stored information grew about 30% a year between 1999 and 2002”.

“Most of the total volume of new [electronic] information flows is derived from the volume of voice telephone traffic”. The internet comes second, but a long way behind. Of course this was before VoIP became widely available.

Of course, this explosion of information is partly to be expected, given the exponential growth in the number of humans generating it. (The suggestion that there are more humans alive now than in the entire previous history of the planet seems to be wrong – see this discussion – but those of us alive now (6 billion?) are certainly a large fraction of all who have ever lived (40-70 billion?).)

At the same time, huge amounts of information are being generated automatically. To return to a favourite theme, the British Police database of vehicle movements is just one example of a system that will generate massive amounts of data: 100 million reads a day, each recording at a minimum the vehicle number and the time and location it was spotted.

(That's 36,500 million reads per year. If each is 100 bytes, that's 3,650,000 million bytes, or about 3.65 terabytes, per year: around 18 terabytes over the five years they intend to keep the data. Then add in the mobile phone company databases, the corporate databases tracking supermarket purchases (loyalty cards), etc.)
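
For what it's worth, here is that back-of-the-envelope sum as a small Python sketch; the 100-million-reads-a-day rate and the 100-bytes-per-read figure are the assumptions made above, not official numbers:

    # Rough storage estimate for the vehicle-movement database (assumed figures)
    reads_per_day = 100_000_000   # assumed rate quoted above
    bytes_per_read = 100          # guess: number plate, timestamp, location
    days_per_year = 365
    retention_years = 5

    bytes_per_year = reads_per_day * bytes_per_read * days_per_year
    terabytes_per_year = bytes_per_year / 1e12                  # ~3.65 TB per year
    terabytes_retained = terabytes_per_year * retention_years   # ~18 TB over five years
    print(terabytes_per_year, terabytes_retained)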

Simulations are also largely to blame: realistic simulation requires enormous amounts of information, whether it is graphics generation, or (say) a 3D oil well reservoir seismic model. Or take the Earth Simulator.

Just a thought. At the moment we use information lavishly, like Americans used to use oil before the 1970s. Remember those beautiful 6-litre V8 engines? Information seems to be an endless resource: easy to acquire, virtually costless to replace or store. (See this earlier posting.) There seem to be no environmental implications to creating or keeping it. (Plenty more silicon in them thar hills…) Plus we have faster computers to search/re-purpose/munge the data for us.

But history does seem to show that, whenever we perceive an infinity of possibilities, some counterbalancing factor is revealed. Apart from the US and oil, look at the scramble for Africa in the 1880s–1890s, or the Spanish dependence on colonial gold in the 1500s. Or take Concorde and manned space exploration in the late 20th century: the limits to moon landings and supersonic flight were not technical; for other reasons, we simply didn't want to do them enough.

How many times, for instance, will the British Police access their 18 terabytes of data? Assuming ten routine queries a day, that's 3,650 queries per year. According to SIMS, the Library of Congress print collection is about ten terabytes, so 3,650 queries against 18 terabytes works out at roughly 2,000 queries per Library-of-Congress-sized collection – the equivalent of a mere 2,000 people using the Library of Congress each year. Put that way, it looks as wasteful as driving your gas guzzler down to the local shop instead of walking.
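
Putting rough numbers on that comparison (a sketch in the same spirit, using the ten-queries-a-day assumption and SIMS's ten-terabyte figure for the Library of Congress print collection):

    # How many 'Library of Congress readers' do 3,650 queries a year amount to?
    queries_per_year = 10 * 365   # assumed ten routine queries a day
    database_tb = 18              # five years of vehicle reads (from above)
    loc_print_tb = 10             # SIMS estimate for the LoC print collection

    loc_equivalents = database_tb / loc_print_tb   # 1.8 'Libraries of Congress'
    print(queries_per_year / loc_equivalents)      # ~2,028, i.e. roughly 2,000 a year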

The ultimate cost may not be in physical resources (energy, etc.) but in the demands this stuff makes on our attention. After all, if you collect it, do you generate an obligation to use it?

Meanwhile, the implications for simulation are that almost everything is possible and almost everything you need is getting cheaper and more accessible.
