There is a saying commonly misattributed to Gene Kranz, the Apollo 13 flight director: failure is not an option. In a way that’s true. Failure isn’t an option. I would say it’s inevitable in any complicated system. Most of us work in one organisation or another. All of us rely on various organisations in our day-to-day lives. I work in the National Health Service, one of 1.5 million people. A complex system doing complex work.
In a recent musing I looked at how poor communication through PowerPoint had helped destroy the space shuttle Columbia in 2003. That, of course, was the second shuttle disaster. In this musing I’m going to look at the first.
This is the story of how NASA was arrogant: of unrealistic targets, of a disconnect between seniors and those on the shop floor, and of the misuse of statistics. It’s a story of the science of failure and how failure is inevitable. This is the story of the Challenger disaster.
“An accident rooted in history”
It’s January 28th 1986 at Cape Canaveral, Florida. 73 seconds after launching, the space shuttle Challenger explodes. All seven of its crew are lost. Over the tannoy a distraught audience hears the words, “obviously a major malfunction.” After the horror come the questions.
The Rogers Commission is formed to investigate the disaster. Amongst its members are astronaut Sally Ride, Air Force General Donald Kutyna, Neil Armstrong, the first man on the moon, and Professor Richard Feynman, the legendary quantum physicist, bongo enthusiast and educator.
The shuttle programme was designed to be as reusable as possible. Not only was the orbiter itself reused (this was Challenger’s tenth mission) but the two solid rocket boosters (SRBs) were also retrieved and re-serviced for each launch. The cause of the Challenger disaster was found to be a flaw in the right SRB. Each SRB was not one long section but several, connected at joints that were each sealed by two rubber O-rings (a primary and a secondary). The commission discovered longstanding concerns regarding the O-rings.
In January 1985, following a launch of the shuttle Discovery, soot was found between the O-rings, indicating that the primary ring hadn’t maintained a seal. That launch had been the coldest yet, at about 12 degrees Celsius. At that temperature the rubber contracted and became brittle, making it harder to maintain a seal. On other missions the primary ring was nearly completely eroded through. The flawed O-ring design had been known about since 1977, leading the commission to describe Challenger as “an accident rooted in history.”
The temperature forecast for Challenger’s launch, -1 degrees Celsius, would break the cold record set by Discovery. On the eve of the launch engineers from Morton Thiokol alerted NASA managers to the danger of O-ring failure. They advised waiting for a warmer launch day. NASA, however, pushed back and asked for proof of failure rather than proof of safety. An impossibility.
“My God Thiokol, when do you want me to launch? Next April?”
Lawrence Mulloy, SRB Manager at NASA
NASA pressed Morton Thiokol managers to overrule their engineers and approve the launch. On the morning of the 28th the forecast was proved right and the launch site was covered with ice. Reviewing launch footage, the Rogers Commission found that in the cold temperature O-rings on the right SRB had failed to maintain a seal. 0.678 seconds into the launch grey smoke was seen escaping the right SRB. On ignition the SRB casing expanded slightly and the rings should have moved with the casing to maintain the seal. However, at minus one degrees Celsius they were too brittle and failed to do so. This should have caused Challenger to explode on the launch pad but aluminium oxides from the rocket fuel filled the damaged joint and did the job of the O-rings by sealing the site. This temporary seal allowed Challenger to lift off.
This piece of good fortune might have allowed Challenger and its crew to survive. Sadly, 58.788 seconds into the launch Challenger hit a strong wind shear which dislodged the aluminium oxide. This allowed hot gas to escape and ignite. The right SRB burned through its joint to the external tank, coming loose and colliding with it. This caused a fireball which ignited the whole stack.
Challenger disintegrated and the crew cabin was sent into free fall before crashing into the sea. When the cabin was retrieved from the sea bed the personal safety equipment of three of the crew had been activated, suggesting they survived the explosion but not the crash into the sea. The horrible truth is that it is possible they were conscious for at least a part of the free fall. Two minutes and forty-five seconds.
So why the pushback from NASA? Why did they proceed when there were concerns about the safety of the O-rings? This is where we have to look at NASA as an organisation that arrogantly assumed it could guarantee safety. That arrogance began with its own unrealistic targets.
NASA’s unrealistic targets
NASA had been through decades of boom and bust. The sixties had begun with them lagging behind the Soviets in the space race and finished with the stars and stripes planted on the moon. Yet the political enthusiasm triggered by President Kennedy and the Apollo missions had dried up and with it the public’s enthusiasm also waned. The economic troubles of the seventies were now followed by the fiscal conservatism of President Reagan. The money had dried up. NASA managers looked to shape the space programme in a way to fit the new economic order.
First, space shuttles would be reusable. Second, NASA made bold promises to the government. Their space shuttles would be so reliable and easy to use there would be no need to spend money on any military space programme; instead give the money to NASA to launch spy satellites. In between any government mission the shuttles would be a source of income as the private sector paid to use them. In short, the shuttle would be a dependable bus service to space. NASA promised that they could complete sixty missions a year with two shuttles at any one time ready to launch. This promise meant the pressure was immediately on to perform.
Four shuttles were initially built: Atlantis, Challenger, Columbia and Discovery. The first shuttle to launch was Columbia on 12th April 1981, one of two missions that year. In 1985 nine shuttle missions were completed. This was a peak that NASA would never exceed. By 1986 the target of sixty flights a year was becoming a monkey on the back of NASA. Challenger’s mission, STS-51-L, had seen its launch date pushed back five times, due to bad weather and to the previous mission itself being delayed seven times. Delays in that previous mission were even more embarrassing as Congressman Bill Nelson was part of the crew. Expectation was mounting and not just from the government.
Partly in order to inspire public interest in the shuttle programme the ‘Teacher in Space Project’ had been created in 1984 to carry teachers into space as civilian members of future shuttle crews. From 11,000 completed applications one teacher, Christa McAuliffe from New Hampshire, was chosen to fly on Challenger as the first civilian in space. She would deliver two fifteen-minute lessons from space to be watched by school children in their classrooms. The project worked. There was widespread interest in the mission with the ‘first teacher in space’ becoming something of a celebrity. It also created more pressure. McAuliffe was due to deliver her lessons on Day 4 of the mission. Launching on 28th January meant Day 4 would be a Friday. Any further delays and Day 4 would fall on the weekend; there wouldn’t be any children in school to watch her lessons. Fatefully, the interest also meant 17% of Americans would watch Challenger’s launch on television.
NASA were never able to get anywhere close to their target of sixty missions a year. They were caught out by the amount of refurbishment needed after each shuttle flight to get the orbiter and solid rocket boosters ready to be used again. They were hamstrung from conception by an unrealistic target they should never have set. Their move to inspire public interest arguably increased the pressure to perform. But they had more problems, including a disconnect between senior staff and those on the shop floor.
During the Rogers Commission hearings NASA managers stated that the risk of a catastrophic accident (one that would cause loss of craft and life) befalling their shuttles was 1 in 100,000. Feynman found this figure ludicrous. A risk of 1 in 100,000 meant that NASA could expect to launch a shuttle every day for 274 years before they had a catastrophic accident. The figure of 1 in 100,000 was found to have been calculated as a necessity; the reliability had to be that high. It had been used to reassure both the government and astronauts. It had also helped encourage a civilian to agree to be part of the mission. Once that figure was agreed NASA managers had worked backwards to make sure that the safety figures for all the shuttle components combined to make an overall risk of 1 in 100,000. NASA engineers knew this to be the case and formed their own opinion of risk. Feynman spoke to them directly. They perceived the risk at somewhere between 1 in 50 and 1 in 200. Assuming NASA managed to launch sixty missions a year, that meant their engineers expected a catastrophic accident roughly between once a year and once every three years. As it turned out the Challenger disaster would occur on the 25th shuttle mission. There was a clear disconnect between the perceptions of managers and those with hands-on experience regarding the shuttle programme’s safety. But there were also fundamental errors when it came to calculating how safe the shuttle programme was.
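The arithmetic behind those numbers is easy to check. What follows is only a rough sketch in Python: the risk figures and the sixty-a-year target come from the text above, while the function and variable names are my own invention, and it assumes every launch carries the same independent risk.

```python
# Figures quoted in the text; the names are illustrative only.
MANAGEMENT_RISK = 1 / 100_000   # managers' risk of catastrophe per launch
ENGINEER_RISK_HIGH = 1 / 50     # engineers' pessimistic estimate
ENGINEER_RISK_LOW = 1 / 200     # engineers' optimistic estimate
TARGET_LAUNCHES_PER_YEAR = 60   # NASA's promised flight rate


def years_between_accidents(risk_per_launch, launches_per_year):
    """Average number of years between catastrophic accidents.

    Assumes each launch carries the same, independent risk.
    """
    expected_accidents_per_year = risk_per_launch * launches_per_year
    return 1 / expected_accidents_per_year


# Feynman's thought experiment: one launch every single day.
daily = years_between_accidents(MANAGEMENT_RISK, 365)
print(f"Managers, one launch a day: {daily:.0f} years")

# The engineers' view, at the promised sixty launches a year.
for risk in (ENGINEER_RISK_HIGH, ENGINEER_RISK_LOW):
    years = years_between_accidents(risk, TARGET_LAUNCHES_PER_YEAR)
    print(f"Engineers, 1 in {round(1 / risk)}: every {years:.1f} years")
```

On the managers' figure this gives about 274 years of daily launches. On the engineers' figures it gives an accident every 0.8 to 3.3 years, which rounds to the "once a year to once every three years" quoted above.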
One of those safety figures NASA included in their 1 in 100,000 figure involved the O-rings responsible for the disaster. NASA had given the O-rings a safety factor of 3. This was based on test results which showed that the O-rings could maintain a seal despite being burnt a third of the way through. Feynman again tore this argument apart. A safety factor of 3 actually means that something can withstand conditions three times those it’s actually designed for. He used the analogy of a bridge built to hold only 1000 pounds being able to hold a 3000 pound load as showing a safety factor of 3. If a 1000 pound truck drove over the bridge and it cracked a third of the way through then the bridge would be defective, even if it managed to still hold the truck. The O-rings shouldn’t have burnt through at all. Regardless of them still maintaining a seal, the test results actually showed that they were defective. Therefore the safety factor for the O-rings was not 3. It was zero. NASA misused the definitions and values of statistics to ‘sell’ the space shuttle as safer than it was. There was an assumption of total control. No American astronaut had ever been killed on a mission. Even when a mission went wrong, like Apollo 13, the astronauts were brought home safely. NASA were drunk on their reputation.
The Rogers Commission Report was published on 9th June 1986. Feynman was concerned that the report was too lenient to NASA and so insisted his own thoughts were published as Appendix F. The investigation into Challenger would be his final adventure; he was terminally ill with cancer during the hearing and died in 1988. Sally Ride would also be part of the team investigating the Columbia disaster, the only person to serve on both investigations. After she died in 2012 Kutyna revealed she had been the person discreetly pointing the commission in the direction of the faulty O-rings. The shuttle programme underwent a major redesign and it would be two years before there was another mission.
Sadly, the investigation following the Columbia disaster found that NASA had failed to learn the lessons of Challenger and that similar organisational dysfunction persisted. The programme was retired in 2011 after 30 years, 133 successful missions and 2 tragedies. Since then NASA has been using the Russian Soyuz rocket programme to get to space.
The science of failure
Failure isn’t an option. It’s inevitable. By its nature the shuttle programme was always experimental at best. It was wrong to pretend otherwise. Feynman would later compare NASA’s attitude to safety to a child believing that running across the road is safe because they didn’t get run over. In a system of over two million parts, the idea of complete control is a fallacy.
We may not all work in spaceflight but Challenger and then Columbia offer stark lessons in human factors we should all learn from. A system may seem perfect because its imperfection is yet to be found, or has been ignored or misunderstood.
The key lesson is this: We may think our systems are safe, but how will we really know?
“For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled.”
Professor Richard Feynman