Patriot Missile Failure
CSC385 – Spring 2009
On February 25, 1991, a Patriot missile battery failed to intercept an incoming Scud missile aimed at an army barracks in Dhahran, Saudi Arabia. When the Scud missile hit the barracks, 28 soldiers lost their lives and 99 were wounded. This was the largest single loss of life among coalition forces in that war.
The General Accounting Office (GAO) was called upon to investigate the incident. Its report identified round-off error as a key contributing cause.
The Human Side of the Incident
The 14th Quartermaster Detachment was an Army Reserve water purification unit from Greensburg, Pennsylvania, a town of 18,000. On February 19, 1991, 69 members of that unit arrived in Saudi Arabia. Six days later, in the Scud missile attack, 13 were killed and 43 wounded – an 81% casualty rate.
Once word of the attack reached Pennsylvania, the 99th Army Reserve
Command (ARCOM), parent unit of the 14th, began a 24 hour-a-day vigil at
the Greensburg Reserve Center to assist family members in their pain and
grief. The 99th ARCOM and the 1st Army set up a casualty assistance center
in town manned with chaplains, counselors, social workers and representatives
from several federal agencies. They also assisted family members with visits
to wounded soldiers at Walter Reed Army Medical Center. Local citizens
unselfishly volunteered to assist in these efforts.
Pennsylvania's governor declared a week of mourning and ordered flags
on all state buildings to be lowered to half mast.
A community memorial service was held on March 2, 1991. Over 1,500
citizens attended. [http://qmfound.com/14th_Quartermaster_Detachment.htm]
The point we make here is that errors in design and implementation of hardware and software systems can, and often do, carry a human cost. Good design is not just an academic issue – it is also a human issue.
Based on the GAO report and other analyses, we find that a number of flaws contributed to this disaster. According to an article by Shelley Toich, in its original configuration the Patriot was designed as an anti-aircraft, not an anti-missile, defense system. As such, it was expected to be a mobile system (not stationary) that would operate only for short periods of time. Although adjustments were made, they were insufficient to allow it to operate correctly under battlefield conditions. The GAO report describes the Patriot missile system's operation as follows:
After the Patriot's radar detects an airborne object that has the characteristics of a
Scud, the range gate--an electronic detection device within the radar system—
calculates an area in the air space where the system should next look for it. The
range gate filters out information about airborne objects outside its calculated area
and only processes the information needed for tracking, targeting, and intercepting
Scuds. Finding an object within the calculated range gate area confirms that it is a
Scud missile. [http://www.fas.org/spp/starwars/gao/im92026.htm]
Although the incoming missile was initially detected, on the subsequent attempts at confirmation it was not found to be within the predicted location cone. Therefore the incoming missile was not tracked and the Patriot missile was not fired.
The tracking mechanism failed because a small round-off error was combined with a long period of continuous operation. The round-off error arose when the number representing time was multiplied by 1/10, which cannot be expressed exactly in binary. In addition, the battery had been operating continuously for over 100 hours. According to the GAO report, “The effect of this inaccuracy on the range gate's calculation is directly proportional to the target's velocity and the length of time the system has been running.” Mathematician Douglas N. Arnold calculates that the predicted position of the incoming missile was off by more than half a kilometer. [http://www.ima.umn.edu/~arnold/disasters/patriot.html]
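The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Patriot's actual code: the 23 fractional bits kept for 1/10 and the Scud speed of roughly 1,676 meters per second are taken from Arnold's analysis rather than from this handout, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumptions (from Arnold's analysis, not from the Patriot source code):
FRACTION_BITS = 23          # binary digits of 1/10 kept after the point
SCUD_SPEED_M_PER_S = 1676   # approximate Scud speed in meters per second
TICKS_PER_SECOND = 10       # the clock counts tenths of a second


def truncated_tenth(bits=FRACTION_BITS):
    """Return 1/10 chopped (not rounded) to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours, bits=FRACTION_BITS):
    """Time error after `hours` of uptime: one chopping error per clock tick."""
    ticks = hours * 3600 * TICKS_PER_SECOND
    per_tick_error = Fraction(1, 10) - truncated_tenth(bits)
    return ticks * per_tick_error


def range_gate_shift_meters(hours, speed=SCUD_SPEED_M_PER_S, bits=FRACTION_BITS):
    """Distance the target travels during the accumulated time error."""
    return clock_drift_seconds(hours, bits) * speed


if __name__ == "__main__":
    for hours in (8, 20, 100):
        drift = float(clock_drift_seconds(hours))
        shift = float(range_gate_shift_meters(hours))
        print(f"{hours:>4} h uptime: clock off by {drift:.4f} s, "
              f"range gate shifted by about {shift:.0f} m")
```

Under these assumptions the per-tick error is tiny (under one ten-millionth of a second), yet after 100 hours the clock is off by about a third of a second, which at Scud speeds corresponds to more than half a kilometer. Exact rational arithmetic (Fraction) is used so that the sketch does not introduce round-off error of its own.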
For those who want to study this matter further, here is a set of links.
First, as is so often the case, the software component is encased within a system of hardware components which, in turn, is embedded within a much larger system of personnel and situational considerations. Because of this, many factors contributed to the final outcome. Second, as we dig deep for the root cause of the problem, we note the similarity to our analysis of the Ariane 5 rocket explosion. In both cases a pre-existing system was adapted to a new circumstance. With the Ariane 5 it seemed at first to be a fairly routine upgrade. But the new rocket design turned out to be so different as to require a deeper analysis of the entire system, which was not carried out. In particular, a new round of testing should have been specified.
In the case of the Patriot missile disaster, although re-design was again one of the precipitating factors, it was much more than an upgrade. It could be described as a re-purposing. A system designed for one purpose was retooled for a very different purpose, but this retooling was based on an incomplete analysis of the circumstances of deployment. In fact, one could say that the missing ingredient here was interaction analysis and design. A description of typical scenarios and interactions would have revealed the need for analysis and testing in the context of long-term, stationary deployment, as well as the unrealistic nature of the instructions to reboot periodically.
Both the Ariane 5 and the Patriot missile incidents reveal that hidden dangers lurk in any re-design situation. There is a tendency to underestimate the scope of the changes that are being made and to skimp on aspects of the original design process, especially analysis, specification and testing. In addition, at the heart of the software design part of the problem we find the Y2K Syndrome – placing a shortcut into the design which operates well within the original framework, but fails miserably within the new context. For the Ariane rocket it was the floating point to integer conversion. For the Patriot it was permitting the continual accumulation of round-off error.
Fractions in Binary
This leads directly to a consideration of round-off error and decimal to binary conversion. Although we are used to base 10 fractional measurements – 1/10, 1/100, etc. (Olympic events are measured in 1/1000 of a second!) – there is a fundamental incommensurability between base 10 decimals and base 2 “binimals”. For example, although in base 10 we have 1/10 = 0.1, in base 2 the same value is 1/1010₂ = 0.0001100110011…₂, an expansion that repeats forever and never terminates. Therefore we need to be aware of where and when these differences between bases appear and what effect they may have on our programs. In Fractional Base Conversion we begin to get a handle on this topic.
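As a preview, the binary expansion of 1/10 can be generated with the standard repeated-doubling method: double the fraction, record the integer part as the next binary digit, and continue with what remains. The following is a minimal sketch; the function name binary_fraction_digits is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def binary_fraction_digits(value, max_digits):
    """Return up to `max_digits` binary digits after the point for 0 <= value < 1.

    Stops early if the expansion terminates. Pass a Fraction (not a float)
    so that the input itself is exact.
    """
    remainder = Fraction(value)
    digits = []
    for _ in range(max_digits):
        if remainder == 0:
            break                 # expansion terminated exactly
        remainder *= 2            # shift one binary place to the left
        if remainder >= 1:
            digits.append("1")
            remainder -= 1        # keep only the fractional part
        else:
            digits.append("0")
    return "".join(digits)


if __name__ == "__main__":
    print("1/10 = 0." + binary_fraction_digits(Fraction(1, 10), 24) + "...")
    print("5/8  = 0." + binary_fraction_digits(Fraction(5, 8), 24))
```

A fraction such as 5/8 terminates after three binary digits (0.101) because its denominator is a power of 2. The fraction 1/10 never terminates: after the first digit the remainder cycles through the same values, producing the repeating pattern 0011, so any fixed number of bits must cut the expansion off somewhere.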