Magical Thinking – Part 1: Cargo Cult Automation

Author: Richard Rostad, Promis Qualify

Cargo cults

During WWII, the US armed forces established a presence on Pacific islands where the local population had had little previous contact with foreigners. The soldiers built various military installations, and airplanes landed with supplies. The vast amounts of military equipment and supplies that both sides in the war airdropped (or airlifted to airstrips) to troops on these islands brought drastic changes to the islanders' way of life.

When the war ended, the military abandoned the airbases and stopped dropping cargo. In response, charismatic individuals among remote Melanesian populations founded cults that promised their followers deliveries of food, arms, Jeeps, etc. The cult leaders explained that the cargo would be gifts from their own ancestors or other sources, just as it had been for the outsider armies. To get cargo to fall by parachute, or to arrive by plane or ship again, the islanders imitated the practices they had seen the soldiers, sailors and airmen use. Cult behaviours usually involved mimicking the day-to-day activities and dress styles of US soldiers, such as performing parade-ground drills with wooden or salvaged rifles. The islanders carved headphones from wood and wore them while sitting in makeshift control towers. They waved landing signals while standing on the runways. They lit signal fires and torches to illuminate runways and lighthouses.

In a form of sympathetic magic, many built life-size replicas of airplanes out of straw and cut new military-style landing strips out of the jungle, hoping to attract more airplanes. The cult members thought that the foreigners had some special connection to the deities and ancestors of the natives, who were the only beings powerful enough to produce such riches.
And yet, despite the islanders doing everything right, the planes didn’t return and the cargo didn’t drop. The physicist Richard Feynman drew parallels to pseudosciences like parapsychology and homeopathy in his 1974 commencement speech at the California Institute of Technology: “They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.”
I believe there is similar magical thinking in testing, where the appearance of good testing is strong but the results are not. Here are a few examples.

Cargo cult automation

A common statement is something like this:
“We can improve system quality with automated tests.”
There are many problems with this statement. First of all, we’re probably not employing Weinberg-style whole-systems thinking and talking about automated document reviews to improve the requirements and design documents. By “system quality”, we really mean something much narrower, which for all practical purposes equals code:
“We can improve program code with automated tests.”
is therefore a much more accurate statement. But what about the word “automated”? Don’t we just mean programmed? Our statement now becomes:
“We can improve program code with programmed tests.”
Which simply means:
“We can improve program code by writing more code.”
And we’re not going to waste our programmers’ time on testing, are we? No. So we’re really saying:
“We can improve our code by having non-programmers write more code.”
Somehow, I feel that the first statement is far more convincing than the last one, while the last one is far closer to what we are actually doing. This does not mean that automation is not useful, only that you need to get everything right for it to work, not just its appearance. For instance, it can be useful to figure out why you automate. Is it to repeat tests you’ve already done, or to use the computer’s ability to do many things quickly and repetitively? These are fundamentally different purposes.
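To make the two purposes concrete, here is a minimal sketch. The function my_sort and both check functions are hypothetical, invented only for illustration, and Python's built-in sorted is assumed to be a trustworthy oracle. The first check replays a test a person has already designed; the second uses the computer's speed to try far more inputs than a person would.

```python
import random


def my_sort(items):
    # Hypothetical function under test: a simple insertion sort.
    result = []
    for item in items:
        i = len(result)
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result


def regression_check():
    # Purpose 1: replay a test that was already designed and run by a person,
    # with one known input and one known expected output.
    return my_sort([3, 1, 2]) == [1, 2, 3]


def high_volume_check(runs=1000, seed=42):
    # Purpose 2: let the computer try many random inputs quickly and compare
    # each result with an oracle we assume to be correct (built-in sorted).
    rng = random.Random(seed)
    for _ in range(runs):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if my_sort(data) != sorted(data):
            return data  # first failing input, kept for investigation
    return None  # no failing input found in this many runs


print(regression_check())
print(high_volume_check())
```

The two checks answer different questions: the first tells us whether something we already knew still holds, the second only finds problems if we have a reliable way to decide what the correct answer is.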
Another consideration is precision, accuracy and frequency. Unit tests typically have high precision, in that they point at the single line that is returning a wrong result. They also have high accuracy, in that (at least if we’re using TDD) they usually point at something that is actually undesirable. Since they are written and used during development, they are run with high frequency, usually on every build, which is probably several times a day.
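As an illustration of what that precision looks like, here is a small sketch. The function net_to_gross and its rounding rule are hypothetical, and the tests are written in pytest style (plain functions with bare assert), which is an assumption about tooling; they can also be called directly. Each test pins down one behaviour of one small function, so a failure names the function, the input and the expected value.

```python
def net_to_gross(net_cents, vat_percent):
    # Hypothetical function under test: add VAT to a net price given in
    # whole cents, rounding half up (valid for non-negative amounts).
    return (net_cents * (100 + vat_percent) + 50) // 100


def test_zero_rate_returns_net():
    assert net_to_gross(1000, 0) == 1000


def test_standard_rate():
    assert net_to_gross(1000, 25) == 1250


def test_rounds_half_up():
    # 2 cents plus 25 % is 2.5 cents, which should round up to 3.
    assert net_to_gross(2, 25) == 3
```

If test_rounds_half_up fails, there is only one expression it can be pointing at, which is the kind of precision a full system test rarely offers.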
At the other extreme, we have full system tests. Such tests usually have fairly low precision; they typically point to a large block of code that is not doing what we expected, and much investigation is needed before we can find out what is actually wrong. They also have low accuracy, in that we usually don’t know whether the problem is in the test, the test framework or the test data, or whether the test fails because of a change we actually want and the test is simply out of sync with the system.
Then there is cost.
Those of you who learned to program in the seventies or early eighties may remember an idea called literate programming. It was introduced by Donald Knuth, one of the great minds of computer science. The basic idea is that you write comments in natural language explaining what you are doing, why you are doing it, and which alternatives you have considered and discarded, and why. Then you add a few lines of compilable code that do what you have described, and move on to explaining the next bit. This was a fantastic idea because it forced the programmer to think before coding, which, in my personal opinion, should happen far more often.
Sadly, it was discarded because all the thinking and writing took time away from coding, even though the programmers who did take the time were more productive than those who did not.
TDD and unit tests have much the same effect: by writing the test first, the programmer is forced to think through explicitly what he wants a function to do before writing the actual code.
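A minimal sketch of that test-first step follows. The function average and its behaviour for an empty list are hypothetical choices made only for this illustration. The point is that the second test cannot be written until someone has decided what should happen with empty input, a decision that is easy to skip when writing the code directly.

```python
# Step 1: write the tests first. Writing them forces explicit decisions,
# such as what average() should do when given an empty list.
def test_average_of_several_values():
    assert average([2, 4, 9]) == 5


def test_average_of_empty_list_raises():
    try:
        average([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an empty list")


# Step 2: write just enough code to make the tests pass.
def average(values):
    # Hypothetical function; rejecting empty input was decided by the test.
    if not values:
        raise ValueError("average() of an empty list is undefined")
    return sum(values) / len(values)
```

Returning 0 or None for an empty list would have been equally possible designs; what matters is that the test makes the choice explicit before the code exists.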
Therefore, automated tests at the unit level have negative cost: the thinking they force saves more time than writing them takes.
Automating a complex, system-level test that has previously been performed by a thinking person does not have negative cost.