How Mean Are the Times Between Failures?
Or what put the shit in the machine?
Warning: This post is not for IT geeks. It is for normal people who would like to know why computers are so annoying. Hopefully it will also help users to empathize with these poor, neurotic simpletons without which (whom?) we would have to watch TV, walk in the countryside or, shock horror, talk to people.
Anyone who has worked intimately with digital computers knows they are innately perverse and stupid. By worked intimately I mean inside them: inside their guts and inside their minds.
That all computers have guts is probably not controversial. That they have minds probably is. We’ll do guts first. We speak of digital computers only. Analog and quantum are for historians and futurists respectively.
Guts AKA Hardware
Hardware is all the physical stuff that sits on your desk, lap, palm of your hand or whatever eccentric rig takes up space in your domain. This stuff, for all the complexity of its guts when viewed with the mark-1 eyeball, microscope or x-ray machine, is capable of very little. It can accept and store inputs, do simple arithmetic, store results and keep outputs for retrieval - that’s it and that’s all. Suspend disbelief for a few minutes while I convince you - or not.
Interfacing (that’s geek for talking to or dealing with) your computer may seem complex, involving keyboards, storage devices,[1] microphones, speakers, screens, printers or what have you. That makes not a whit of difference to the essential simplicity of what is going on. Nor does the exquisitely complex and high tech motherboard, AKA electronic brain, with its millions of bits of stuff and furlongs of gossamer-fine conductors, elevate the capabilities of the thing even slightly. Miraculously, the actual computer, the thing that does all the magic, does what it does with digits and only digits. Symbolic data like decimal numbers (yes, those are symbols to your digital buddy), alphabets, graphics, sounds, indeed all the stuff that humans process effortlessly using their senses, have first to be converted to digits, manipulated and re-converted at the output end.
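If you want to see how little there really is under the hood, here is a minimal sketch in Python (a luxury your digital buddy does not actually enjoy, but the point survives translation). It is purely illustrative: the text and the numbers all end up as digits.

```python
# Purely illustrative: to the machine, symbols are just numbers,
# and numbers are just binary digits.
text = "Hi!"
as_numbers = [ord(ch) for ch in text]              # each character becomes a number
as_bits = [format(n, "08b") for n in as_numbers]   # and each number is just bits

print(as_numbers)   # [72, 105, 33]
print(as_bits)      # ['01001000', '01101001', '00100001']

# Even a friendly decimal number is stored as binary digits underneath:
print(format(42, "b"))   # 101010
```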
If you are thinking “what about Boolean logic - computers do do that all the time, do they not?” True enough, but what does that mean? It means the ability to compare two numbers as in, are they equal, is one less than the other (which automatically makes the other number greater), or is one greater than the other, but I repeat myself. That can all be done in a single operation by subtracting one from the other. Modern computers take short cuts to do Boolean by comparing bits - are they both set AKA “AND”, is at least one of them set AKA “OR”, or are they both empty AKA “NOR” - but it amounts to the same thing. Don’t get me started on bits.
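If you would rather see it than take my word for it, here is a toy sketch of the same idea in Python - one subtraction settles equal/less/greater, and the “fancy” Boolean stuff is just bit fiddling. The function name is mine, not anything a real machine uses.

```python
# Toy sketch: comparison is really just subtraction.
def compare(a, b):
    """Decide equal / less / greater with a single subtraction."""
    diff = a - b
    if diff == 0:
        return "equal"
    return "less" if diff < 0 else "greater"

print(compare(3, 7))   # less
print(compare(7, 7))   # equal

# Bit-level Boolean on two one-bit values:
x, y = 1, 0
print(x & y)             # AND: 1 only if both bits are set -> 0
print(x | y)             # OR:  1 if at least one bit is set -> 1
print(int(not (x | y)))  # NOR: 1 only if both bits are empty -> 0
```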
So, the hardware is not just simple, but blindingly stupid as well. It’s a wonder that the poor things are useful at all. “But but but but” I sense someone thinking, “computers calculate pi to a gazillion decimals and process complex statistics. Engineers use them to design aircraft, bridges and submarines. Architects design great structures, molecular biologists model life processes, meteorologists do weather forecasting.”
I understand how you feel. It must be like discovering your very smart puppy has an IQ of 10. Nevertheless, computers do all that but but but but stuff the same way I do cooking - the hard way. Only very very fast. Blazingly, impossibly, unbelievably fast but still stupid.
Programming And Memory Before The Fall
At the dawn of the digital computer age - yes, I was there - programs were hardware. You read that correctly. Programs were hardware, meaning that computer programs were nothing but powered switches, wired together. The technology was mostly telephone technology, like relays and rotary switches, with a smattering of radio stuff like vacuum tubes.
To change the program, rewire the switches - simple but a PITA. Even plugboard programming did not help much. That put a crimp in computer utility, compounded by the scarcity and expense of storage.
Did I mention that memory and storage were also, wait for it, relays and rotary switches? Expensive, limited and used exclusively for holding parameters and storing intermediate results needed by later steps in the program. I won’t put readers through the history of all the Heath Robinson attempts to solve the memory and storage problem - vacuum diodes, magnetic cores, magnetic drums, magnetic tape anyone? Just be grateful that we now have gigabytes of RAM and terabytes of solid state storage - and it’s still not enough.
Charles Babbage might have cancelled Lady Lovelace for ten kilobytes of storage. You may imagine what Alan Turing might have done for a megabyte.
Software is Mind-blowing Ware
Turing and others, who just could not leave bad enough alone, came up with the idea that programs could be dynamic if put on punched paper tape - provided that someone could come up with a gadget to change the direction in which the tape was read, as dictated by instructions punched on the tape. If that boggles your mind, join the club. Thus was born the idea that programs might be structured dynamically, if the instructions could be stored and retrieved as needed. At roughly the same time, the need to store more numbers while solving problems was becoming desperate, thus spurring rapid development of large arrays of electronic and electromagnetic storage devices.
As a hangover from the days when programs were hardware, data and programs were, at first, treated as different beasts because program instructions were formatted to suit the hardware’s instruction set, while numbers were formatted to suit the needs of arithmetic, logic and symbols.
A brief digression to explain about instruction sets, if I may. Because hardware programs were so excruciatingly slow, computer engineers were under pressure to speed things up. They did what they could by getting the computer to do more with each cycle of its internal clock. Think of the computer clock as the thing’s heartbeat. The result was that engineers got into an arms race by designing ever more elaborate computer instructions. Given that this did not improve their ability to do more than simple arithmetic, what it amounted to was that the computer now tried to cram more than one arithmetic (or logical) operation into a single cycle of the clock. This would turn out to be a bad idea, for reasons that will have to wait until we have explained what happened next.
What happened next was that some evil genius or committee of demons, came up with the idea that programs could be data and data could be programs.
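If that sounds abstract, here is the smallest illustration I can manage, in Python rather than in the machine code of the day. The program sits in memory as perfectly ordinary data until something decides to run it.

```python
# A program stored as ordinary data (a string)...
program_as_data = "print(2 + 2)"

# ...until we ask the machine to treat that data as a program.
exec(program_as_data)   # prints 4
```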
The Fall From Grace
As a French computer engineer of my acquaintance put it, zat poot ze schitt in ze ma-sheene.
Suddenly it was theoretically possible to give computers minds of their own. Soon this would cause perverse to cohabit with stupid in all computers.
Theoretically possible, because the hardware manufacturers were locked in combat with each other and there was no such thing as a standard instruction set. So what if programs could be data? The data still had to be turned into something that the machine could “understand” or, more likely, “misunderstand”.
Guess what? Riding to the rescue came another band of demons with the idea of Standard Programming Languages which then needed compilers or interpreters. Naturally, standard meant not quite standard because of different computer architectures.
Having thus over-complicated the lives of poor, simple hardwares, yet another problem bit the geniuses in the arse. Inquiring minds of the time wanted to know, “what if I want to run a program written in Fortran at 08:00 in the morning and another written in C at 08:17?” The first answer was “easy, load and run your Fortran starting at 07:58, then load and run your C starting at 08:15”.
That was about as popular as a fart in church. After all, both programs would (sensibly) be precompiled and stored in all that lovely memory until needed. Why on earth not have a “supervisor program” load and run the darn things to a schedule as and when needed and why had the geniuses not thought of that in the first place? Well, actually, the bastards had been thinking about it.
Thus was born yet another abomination, the Operating System or OS. Operating systems are, you will be pleased to know, just more programs that clutter up memory. They are also written in programming languages and are also specific to each different hardware architecture. To make matters worse, some modern OSs have even partially reverted to hardware-as-software. Sorry; we are not going there right now, lest we be here all month.
Did I mention that almost all programming languages come in different dialects? In that case you are not surprised to know that not all compilers can understand all dialects, but they might compile them anyway, which can be hilarious entertainment for onlooking sadist-programmers.
To complete the nightmare, software comes in versions that need updating or upgrading without notice, to deal with problems you do not have until the update/upgrade turns up - a common source of stress known as Computer Update Stress Syndrome (CUSS). If your OCD compels you to install all new versions automatically, you will quickly learn the error of your ways - never be the first to upgrade to a new version.
Recall how computer designers got into an instruction set arms race in an attempt to make computers do more within each clock cycle? It turned out to be a bad idea because it overcomplicated computer circuitry and pissed off programmers who had to contend with a different instruction set for each computer architecture. Most programmers had by then entirely given up on programming in machine language, because of the handy higher level languages they could play with, but the ones who wrote compilers still had the problem of dealing with a plethora of elaborate and different system instruction sets. They dealt with it by stealthily ignoring the more elaborate instructions and compiling programs to use only the most simple ones, like + - x ÷. Manufacturers, blissfully unaware, continued to brag about their powerful instruction sets. Customers, just as unaware, were impressed, or not, depending on who offered the biggest discount to get in the door.
Eventually the industry noticed that the sheer cost of blindly adopting these innovations was causing pain, mostly in the wallet. After giving the compiler programmers a brisk bollocking for not saying anything about their workaround, the industry pretty quickly settled down to building drastically fewer different architectures based on something called RISC - Reduced Instruction Set Computers. Thus RISC made computers even stupider at the same time as they became less diverse.
Something like our universities.
Allow Me To Summarize All That Long Winded BS
- Digital computers are stupid and very fast. This means they often produce stupid results very quickly.
- To make all computers that are still in general use today usable for a broad range of problems, whether scientific, commercial or merely trivial, we need at least 350 programming languages. That’s my estimate. Wikipedia thinks I underestimate badly.
- Any given architecture may not be using each and every one of those languages, but must be able to do so on a corporate manager’s whim and my, do they have whims.
- That means we need a compiler/interpreter for each language and each hardware architecture and version thereof. Yes, I am sorry to say, hardware caught the version disease; they just call the versions models.
- To use all that semi-efficiently, we need an operating system (OS) for each hardware architecture. See models above.
- To keep lots of programs and data readily available, we need to cram them into lots of storage.
- Because programs can be data and data can be programs, both have to use the same storage cheek by jowl, whether they get on with each other or not.
The last bullet marks The Fall From Grace AKA when zat poot ze schitt in ze ma-sheene.
So why the insults?
Before The Fall
To answer, I first drag you back to the beginning, when programs were hardware. I failed to mention that back then, hardware was very unreliable. Even the so-called supercomputers of the era, despite having been upgraded with more reliable circuitry in the form of semiconductors, still had mean times between failures (MTBFs) measured in hours. The result was that if one wanted to run a weather forecasting model for example, the model had to run to completion under two constraints:
- The model should have produced a forecast a few hours or so before the weather turned up outside, and
- The computer better not crap out before the model runs to completion, especially not close to the end of the run.
In those days, a typical weather model on a typical supercomputer ran for up to 3 hours. Obviously an MTBF of 2.5 hours would be disastrous.
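For the arithmetically curious, here is a back-of-the-envelope sketch of just how disastrous, assuming failures arrive at random (the usual exponential model). It is an illustration, not a claim about any particular machine.

```python
import math

def chance_of_finishing(run_hours, mtbf_hours):
    """Probability the machine survives the whole run without failing,
    assuming randomly arriving failures (exponential model)."""
    return math.exp(-run_hours / mtbf_hours)

print(f"{chance_of_finishing(3.0, 2.5):.0%}")     # roughly 30% - hopeless
print(f"{chance_of_finishing(3.0, 10000.0):.0%}") # a modern box: essentially 100%
```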
However, in a relatively short time, reliability problems shifted from hardware to software. Today, the hardware fails rarely. Faults caused by bad components or manufacturing defects are usually caught before the equipment leaves the factory; after delivery, most modern hardware will run for years without failing. Even then, most failures are caused by stress due to overheating rather than age or wear and tear.
After The Fall
The fall made everything too easy for programmers. Abuse and misuse of all the wonderful toys that IT technology was serving up became the norm. Result - software proliferated, became complex, bulky (hence bloatware) and multi-layered. There is no longer a practical way to test a moderately complex program suite for every possible combination and permutation arising from different data, different interactions, different dependencies and different cases. It’s so bad that we are unable even to estimate the number of permutations and combinations involved. Computer bugs (geek for coding mistakes or logic errors) can persist for years, resulting in what appear to be random failures that disappear when the same program is re-run. This can happen because different program runs can force the associated programs to take totally different paths through their logic. Only one or two rarely used paths may be the culprit.
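To get a feel for why exhaustive testing is hopeless, here is a tiny sketch. The numbers are illustrative only: with just n independent yes/no decisions in the code, the number of distinct paths doubles with every decision.

```python
# Each independent yes/no decision doubles the number of possible paths.
for decisions in (10, 40, 80):
    paths = 2 ** decisions
    print(f"{decisions} decisions -> {paths:,} possible paths")

# 10 decisions -> 1,024 possible paths
# 40 decisions -> 1,099,511,627,776 possible paths
# 80 decisions -> 1,208,925,819,614,629,174,706,176 possible paths
```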
The industry is at the point where most of the mission critical code, without which our daily lives would be severely disrupted, cannot be understood by any single person. A great deal of it is written in programming languages that have fallen into almost complete disuse, so that if something breaks, the chances of finding a programmer who can fix it are slim. Better to recompile and hope for the best - if you can find a compiler that still runs on your kluge[2] of a setup.
Nowadays we just expect that hardware almost never breaks and that software crashes regularly. Only newbies call tech support - veterans just restart after a crash because it almost always works. We also know that tech support doesn’t, so only call them in extremis.
Now you know why it’s like that - the hardware is too stupid to know better and the software systems are perverse, schizoid and potentially psychopathic from the pressure of trying to be everywhere at once, in a vast labyrinth of impenetrable gobbledegook lurking in computer memory, all of it coded in 1s and 0s.
The single redeeming fact is that our computers are now so fast and have so much capacity, that we no longer need to worry much about MTBFs but we really should curb our tendency to curse them.
[1] This post does not differentiate between storage and memory. The difference only amounts to the need to swap between expensive rapid-access storage and inexpensive bulk storage. This is not a material difference for the purposes of the post.
[2] Kluge - An ill-conceived collection of ill-assorted parts assembled into a distressing thing.