A Floating Point Error That Caused Damage Worth Half a Billion Dollars

If you have done even a little programming, you have come across the term floating point. Floating point errors are among the most neglected and potentially dangerous errors a programmer encounters.

Almost every programmer has run into a floating point error at least once. But how much damage can one do? Ask the European Space Agency, which lost over a decade of work and $500 million to a floating point bug.

The story of Ariane 5:

On 4 June 1996, the maiden flight of the Ariane 5 launcher ended in failure. Only about 40 seconds after initiation of the flight sequence, at an altitude of about 3700 m, the launcher veered off its flight path, broke up and exploded.

The failure of the Ariane 501 was caused by the complete loss of guidance and attitude information 37 seconds after start of the main engine ignition sequence (30 seconds after lift-off). This loss of information was due to specification and design errors in the software of the inertial reference system.

The internal SRI (inertial reference system) software exception occurred during execution of a data conversion from a 64-bit floating point value to a 16-bit signed integer value. The floating point number being converted had a value greater than what a 16-bit signed integer can represent.

So, what exactly happened?

A 64-bit floating point number relating to the horizontal velocity of the rocket with respect to the platform was converted to a 16-bit signed integer. The number was larger than 32,767, the largest value a 16-bit signed integer can hold, and thus the conversion failed.
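
To make the failure mode concrete, here is a minimal Python sketch. It is not the flight code, which was written in Ada. Python's float is a 64-bit floating point number, and the helper to_int16_checked is a hypothetical name for a conversion that raises an error instead of silently wrapping around. The input values are made up for illustration.

```python
INT16_MIN = -32768   # smallest 16-bit signed integer
INT16_MAX = 32767    # largest 16-bit signed integer


def to_int16_checked(value: float) -> int:
    """Round a 64-bit float to an integer, refusing values outside the int16 range."""
    result = round(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


if __name__ == "__main__":
    print(to_int16_checked(12345.6))      # fits in 16 bits
    try:
        to_int16_checked(40000.0)         # too large, so the conversion raises
    except OverflowError as exc:
        print("conversion failed:", exc)
```

The point of the sketch is only that a range-checked conversion has to fail somehow when the value does not fit. What the surrounding program does with that failure is what decides whether it is harmless or catastrophic.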

The software ended up triggering a system diagnostic that dumped its debugging data into an area of memory being used by the programs guiding the rocket’s motors. At the same time, control was switched to a backup computer that unfortunately had the same data.

This was misinterpreted as necessitating strong corrective action and the rocket’s motors swiveled to the limits of their mountings. Disaster ensued.

The software was written in Ada. The last statement in the excerpt below is the one that caused the tragedy:

-- BV: converted to a 32-bit integer first, then clamped to the 16-bit range
L_M_BV_32 := TBD.T_ENTIER_32S ((1.0/C_M_LSB_BV) * G_M_INFO_DERIVE(T_ALG.E_BV));

if L_M_BV_32 > 32767 then
    P_M_DERIVE(T_ALG.E_BV) := 16#7FFF#;
elsif L_M_BV_32 < -32768 then
    P_M_DERIVE(T_ALG.E_BV) := 16#8000#;
else
    P_M_DERIVE(T_ALG.E_BV) := UC_16S_EN_16NS(TDB.T_ENTIER_16S(L_M_BV_32));
end if;

-- BH (horizontal bias): converted straight to a 16-bit signed integer with no range check
P_M_DERIVE(T_ALG.E_BH) := 
  UC_16S_EN_16NS (TDB.T_ENTIER_16S ((1.0/C_M_LSB_BH) * G_M_INFO_DERIVE(T_ALG.E_BH)));
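
The difference between the two conversions in the excerpt can be sketched in Python. This is an illustration under stated assumptions, not a translation of the flight software: the function names and input values are hypothetical. The first helper mirrors the protected pattern (widen first, then clamp to the 16-bit range), and the second mirrors the unprotected one (convert directly and let an out-of-range value raise).

```python
import struct


def to_int16_saturating(value: float) -> int:
    """Protected pattern: clamp to the int16 range instead of failing."""
    # Python integers are unbounded; here they stand in for the 32-bit intermediate.
    wide = round(value)
    if wide > 32767:
        return 32767      # corresponds to 16#7FFF#
    if wide < -32768:
        return -32768     # corresponds to 16#8000# read as a signed 16-bit value
    return wide


def to_int16_unprotected(value: float) -> int:
    """Unprotected pattern: pack straight into 16 bits with no range check of our own."""
    # struct.pack raises struct.error when the integer does not fit in a signed short.
    packed = struct.pack("<h", round(value))
    return struct.unpack("<h", packed)[0]


if __name__ == "__main__":
    print(to_int16_saturating(40000.0))   # 32767
    try:
        to_int16_unprotected(40000.0)
    except struct.error as exc:
        print("unprotected conversion raised:", exc)
```

Both helpers agree on values that fit in 16 bits. They differ only on out-of-range input: the saturating version returns the nearest representable value, while the unprotected version raises an error that the caller has to handle.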