Recent experiences with Murphy's Law during high-stress situations have proved oddly reassuring.
We've been working very hard on the Alternative Worlds package recently – hence the shortage of blog entries. After first exposing Alternative Worlds to the world a couple of months ago, we've built in some operations that streamline and simplify the user experience, whilst giving the site considerably more artificial intelligence. You can now write a simple table-top emergency response exercise with two mouse clicks. (Actually that's not quite true: you have to give it a name, and set the start and finish times as well.)
I did a few critical demos last week to potential clients, with more to come. In two of them, a (different) error occurred that I'd never seen before. Luckily both were fairly minor and didn't spoil the demo. We tried to reproduce them as soon as possible afterwards – but could not. In one case, I think I made a simple operating error. The other was different: all attempts to reproduce it failed, the program working properly each time.
Bear in mind that this is a well-tested site. It has a suite of self-tests. (See this posting.)
My favourite test is the end-to-end test routine, which looks at all the checklist links and all the menu links – in fact, every page any user might normally expect to generate. Each is remotely called over the internet as a logged-in user, and the results are then searched for a phrase you'd expect to see there. In other words, every page anyone should be able to see is tested.
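In case the shape of such a "web robot" isn't obvious, here is a minimal sketch using PHP's cURL extension. Every URL, form field, credential, and expected phrase here is an invented placeholder, not Alternative Worlds' actual code; the network-driving part only runs when explicitly asked to.

```php
<?php
// Sketch of an end-to-end web robot: log in once, then fetch each page as
// that user and search the result for a phrase a user should see there.

// Fetch a page (optionally POSTing form data), sharing one cookie jar so
// the robot stays logged in between requests.
function fetchPage(string $url, string $cookieJar, array $post = []): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);  // write cookies here
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar); // and send them back
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    if ($post !== []) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}

// The actual check: does the page contain the phrase a user should see?
function pageContains(string $html, string $phrase): bool
{
    return strpos($html, $phrase) !== false;
}

// Only drive the network when explicitly requested.
if (getenv('RUN_ETE_TESTS')) {
    $jar = tempnam(sys_get_temp_dir(), 'ete');
    fetchPage('https://example.com/login', $jar,
              ['username' => 'tester', 'password' => 'secret']);

    // Every page a user might generate, paired with an expected phrase.
    $pages = [
        'https://example.com/menu'      => 'Main menu',
        'https://example.com/checklist' => 'Checklist',
    ];
    foreach ($pages as $url => $phrase) {
        if (!pageContains(fetchPage($url, $jar), $phrase)) {
            echo "FAIL: $url did not contain '$phrase'\n";
        }
    }
    // Silence means every page passed.
}
```

The only clever part is the shared cookie jar: without it, each request would arrive logged out and every page test would fail at the login screen.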
The database is tested. The data update, create, delete, etc. operations are tested, using dummy data created for the tests and destroyed afterwards. Setup and security, access levels, menu generation, and so on are all tested. (See my book, chapters 8 and 9, for some of the testing code we use. I didn't go into ETE tests in the book because the code used is not a CodeIgniter feature. But writing web robots is pretty simple stuff if you use PHP's cURL package, or something similar.)
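The create–read–update–delete pattern with throwaway dummy data can be sketched like this. It is not the book's CodeIgniter test code: it uses an in-memory SQLite database via PDO so the dummy data vanishes with the test, and the table and column names are invented for illustration.

```php
<?php
// Sketch of CRUD testing with dummy data: create a row that exists only
// for this test, exercise each operation, then destroy it again.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE exercises (id INTEGER PRIMARY KEY, name TEXT)');

// Create: insert the dummy row.
$db->prepare('INSERT INTO exercises (name) VALUES (?)')
   ->execute(['dummy exercise']);
$id = (int) $db->lastInsertId();

// Read: the row we just created should come back unchanged.
$stmt = $db->prepare('SELECT name FROM exercises WHERE id = ?');
$stmt->execute([$id]);
assert($stmt->fetchColumn() === 'dummy exercise');

// Update: change it, then confirm the change took.
$db->prepare('UPDATE exercises SET name = ? WHERE id = ?')
   ->execute(['renamed exercise', $id]);
$stmt->execute([$id]);
assert($stmt->fetchColumn() === 'renamed exercise');

// Delete: destroy the dummy data and confirm it is gone.
$db->prepare('DELETE FROM exercises WHERE id = ?')->execute([$id]);
$stmt->execute([$id]);
assert($stmt->fetchColumn() === false); // no row left behind
```

The point of creating and destroying the data inside the test is that it can run against a live database at any time without polluting real records.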
It's not exactly test-driven development, but we certainly try to make unit tests follow very closely on the coding, just to make sure the code is doing what we think it's doing, and then to ETE test all the end results.
But however good it is, demos are always a little stressful. The internet doesn't always work just as it should, for reasons beyond your control.
To make demos worse, Murphy's Law says that the error that gets through will appear when it matters most. This may be partly due to nervousness on the part of the demonstrator. You make simple errors. (I think the operating error I made on the first occasion was to forget to press Enter, so of course the page didn't change as expected! The client's server was running quite slowly, so it wasn't immediately obvious that nothing was happening at all, as distinct from nothing happening quickly.)
Also, you are working on unfamiliar browsers and servers, and they may be set up differently to your development and test environments. In the book I mention a weird error that once happened to me. I'd written a class that extended a base CodeIgniter class. Shortly after starting to write it, I changed it from one sort of class to another, but I forgot to change the reference to the parent class – so it was in the models folder, and set up as a model, but still said it was extending the Controller class. Running on an XAMPP Lite Apache development server, it worked fine. Transferred to an ISP's Linux box, it failed at once. I have no idea why this should be.
Finally we tracked down the second Alternative Worlds error. It was due to incorrect data entry. I have never claimed, as readers of this blog will realise, to be a good or accurate typist. But when the system is developing its own artificial intelligence, a small error in the data can easily look like a programming failure: the action you expected doesn't occur. (As matter can transform into energy, so data transforms into actions.)
Incidentally, on the subject of testing, I've found you can spend almost as much time on the tests as on the code. My heart sank recently when we ran the ETE tests on Alternative Worlds and many of the checklist tests showed failures. It turned out that, while refactoring some of the code, we'd changed a key phrase: it now has an upper-case initial letter. Of course, that was the very phrase the tests were searching for. A simple regex tweak solved the problem. For this reason we always write tests to return a lot of information if they fail, and none at all if they pass.
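Both points above can be illustrated in a few lines. This is a hedged sketch, not the actual Alternative Worlds test code: the phrase and URL are invented, and the "regex tweak" is shown here as a case-insensitive match, one obvious way to stop a cosmetic capitalisation change from breaking the test.

```php
<?php
// Sketch: a phrase check made tolerant of capitalisation changes, inside a
// test helper that is verbose on failure and completely silent on success.

// The /i modifier accepts 'checklist' or 'Checklist' alike, so an
// upper-cased initial letter no longer fails the test.
function containsPhrase(string $html, string $phrase): bool
{
    return preg_match('/' . preg_quote($phrase, '/') . '/i', $html) === 1;
}

// Return nothing on a pass; return a detailed message on a failure.
function checkPage(string $url, string $html, string $phrase): string
{
    if (containsPhrase($html, $phrase)) {
        return ''; // silence: the test passed
    }
    return "FAIL: $url\n"
         . "  expected phrase: '$phrase'\n"
         . '  searched ' . strlen($html) . " bytes without finding it\n";
}
```

A test run then prints only the failures, so an empty report means everything passed – and a non-empty one already contains enough detail to start debugging.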
A complex software package is very like a living being: you need to watch it all the time. Hence all those tests, and a good automatic system for running them at set intervals. (Again, see the example I developed in the book.) You can never guarantee it will work perfectly. (See Microsoft for examples.) You can, however, try to close each loophole as soon as you become aware of it, and you can test regularly to try to spot problems before anyone else does.
I suppose the advantage of Murphy's Law is that it reproduces the conditions under which real users use a package: stressed because it is new to them, using systems it wasn't designed on, making simple mistakes with it, and feeding it the wrong data. Perhaps I need to find an algorithm to generate Murphy's Law, in order to prepare the ultimate test rig. First prove it works, then see what you have to take away – where you have to be unfair – before it fails?