How Challenger Exploded, and Other Mistakes Were Made
Torrents of big data can't stop mistakes—sometimes they make them more likely. The tragedy of Challenger, as with G.M. and others, was that some people had tried to stop it.
The crew of STS-51-L. Photo: NASA
Space Shuttle engineer Roger Boisjoly wasn't watching the launch on the morning of January 28, 1986. He was too nervous and angry from an argument he had had the day before with his managers and bosses. But he was now on the phone with his colleagues in Florida, listening anxiously as the countdown sequence began.
A few hours earlier, before the mission's seven astronauts climbed into the Shuttle cabin, the thermometer reading at the launchpad dipped to an unusual low for Cape Canaveral: 31 degrees Fahrenheit.
At some point, in the cold, one of the rubber rings that sealed a joint on the right solid rocket booster had grown brittle and useless.
Within seconds after lift-off, at 11:38 AM, aluminum oxides from the burned solid propellant temporarily sealed the damaged joint. But that temporary seal would only last for about 60 seconds. Later analysis revealed that hot gas began to leak out of the hole on the rocket booster.
A few seconds later, around 68 seconds after launch, Mission Control issued its usual commands to the Shuttle that it was "go at throttle up"; Commander Dick Scobee responded: "Roger, go at throttle up." Four seconds later, according to a cockpit voice recorder, the Shuttle's pilot, Michael J. Smith, said "Uh oh."
He may have been responding to the feeling of the left rocket booster suddenly accelerating sideways, or to cockpit indications of main engine performance, or to falling pressures in the external fuel tank. At 73 seconds, the aft dome of the liquid hydrogen tank, inside the giant orange external tank, failed. This produced a propulsive force that rammed the hydrogen tank into the liquid oxygen tank in the forward part of the external tank. Simultaneously, the right rocket booster came unattached and slammed into the tank. A complete structural failure ensued, and the liquid hydrogen (LH2) and liquid oxygen (LOX) ignited, creating a fireball that enveloped the whole stack.
Inside the intact crew compartment, the astronauts continued on an upward ballistic trajectory. At +75 seconds, the orbiter could be seen shooting out of the plume of gas and fire. From the break-up at 48,000 feet, or about 15 kilometers up, the crew compartment peaked at an altitude of 65,000 feet, or about 20 km, approximately 25 seconds after breakup. It struck the ocean surface about two minutes and forty-five seconds after breakup at a velocity of about 207 miles per hour, an impact with forces that approximated 200 G's.
The disaster would paralyze the space program, and set off years of investigations and studies and recriminations. But at that devastating, crushing moment, Boisjoly said later, "we all knew exactly what happened."
On a conference call the night before, Boisjoly and his colleagues at NASA contractor Morton Thiokol described the risk of low temperatures to NASA managers from their headquarters in Utah, and urged NASA to postpone the launch.
"It isn't what they wanted to hear," Allan McDonald, one of the Thiokol engineers on the call, told the producers of "Major Malfunction," a short documentary produced by Retro Report and the New York Times (it's embedded below).
"My God, Thiokol, when do you want me to launch — next April?" Larry Mulloy, a NASA manager, shot back.
All eyes were on the Shuttle. NASA was five years and twenty-four missions into the program, which, in spite of rising costs and complexities, was meant to make space travel more routine with the help of a reusable spacecraft. The launch of Challenger, carrying the first civilian astronaut, "teacher in space" Christa McAuliffe, would be broadcast to thousands of schools across the country.
But inside NASA, problems with the Shuttle had quietly piled up. The presidential commission's report on the accident later found that as early as 1977, NASA managers had known that the O-rings performed poorly at low temperatures, and that they wouldn't properly form a seal in the cold. In earlier launches, the engineers found that seals had been damaged, though not enough to cause catastrophe. Rather than redesigning the part, however, managers at NASA and Thiokol had filed the problem away as "an acceptable flight risk."
But the temperatures predicted for Challenger's launch raised new concerns among the engineers. The O-rings weren't tested for safety below temperatures of 53 degrees, they told NASA, and on the morning of January 28, temperatures were expected to fall a degree below freezing, the minimum temperature permitted for launch. They couldn't recommend a launch.
But NASA officials weren't impressed by their concerns. George Hardy, a NASA deputy director, said he was "appalled" by Thiokol's recommendation, Boisjoly remembered. He described the conference call to the Rogers Commission: "This was a meeting where the determination was to launch, and it was up to us to prove beyond a shadow of a doubt that it was not safe to do so," he said. "This is in total reverse to what the position usually is in a preflight conversation or a flight readiness review. It is usually exactly opposite that."
In the midst of the dispute, executives at Morton Thiokol asked for a five-minute break to discuss the matter in private. That turned into thirty minutes. It was 10:30 pm. Under pressure to launch and without having fully communicated their reservations to NASA, the Thiokol managers voted to reverse the engineers' recommendation. Challenger was a go.
"Final agreement," according to NASA's internal report, "is: (1) there is a substantial margin to erode the primary O-ring by a factor of three times the previous worst case, and (2) even if the primary O-ring does not seal, the secondary is in position and will." NASA asks that Thiokol put their recommendation in writing and send it by fax to NASA.
But in Florida, McDonald, who was directly in charge of the solid rocket booster program, refused to carry out his usual task: signing the "launch rationale" document that functioned like an engineer's green light. "McDonald is told it is not his concern and that his above concerns will be passed on in advisory capacity," according to the Rogers report.
"I did the smartest thing I ever did in my lifetime," McDonald tells Retro Report. "I refused to sign it. I just felt it was too much risk to take." His boss, Joe Kilminster, signed the document instead.
See something, say something (and do something)
Some have argued that while the risks surrounding Challenger's launch were known by certain people, they weren't communicated clearly enough and hence they were ignored. NASA managers might have understood the problem in numbers, but didn't comprehend the danger the problem posed to the lives of astronauts. It's hard to fear what you can't see. A simple science experiment might have helped: the rubber of the O-rings lost its resilience when chilled in ice water, as Richard Feynman dramatically demonstrated during the Rogers Commission hearings.
But prior to launch, the engineers couldn't provide a statistical analysis to back up their concerns. "I was asked to quantify my concerns," O-ring expert Roger Boisjoly told a NASA historian, "and I said I couldn't, I couldn't quantify it, I had not data to quantify it, but I did say I knew that it was away from goodness in the current data base."
There was another problem, the data designer Edward Tufte would later argue: the engineers' arguments about the O-ring problem were poorly designed. Building off observations made by NASA investigators, he contended that the display of data in their charts insufficiently described the risk, as in this crucial table the engineers drew for NASA managers the day before the launch, with predicted figures for Challenger at the bottom.
Tufte also points to another chart with historical data that the engineers showed to NASA prior to launch.
The design of that simple chart was fatally uncommunicative, Tufte says. "This display contains all the information necessary to diagnose the relationship between temperature and damage, if only we could only see it," he wrote. "In the 13 charts prepared for making the decision to launch," he concludes, "there is a scandalous discrepancy between the intellectual tasks at hand and the images created to serve those tasks."
By comparison, the Rogers Commission's visualization of the data, drawn later that year, illustrates the uncertainty surrounding the O-rings more vividly than the engineers' table does. The outlier temperature in the upper left corner, STS 51-C, was measured at the launch of Space Shuttle Discovery a year prior, which at the time was the coldest recorded during a shuttle launch, at only 53 °F. The Challenger would launch in far lower temperatures.
In Visual Explanations, Tufte expands upon this graph to make the risk even more vivid, adding a marker to indicate how cold the Challenger's O-rings were that morning:
To Tufte, Challenger was a reminder that a problem can't be addressed if it can't be seen. Of course, this is all assuming that when a problem can be seen, people are willing to see it.
There is a tendency in institutions of all kinds to avoid focusing on difficult problems until they explode into crisis and disaster. In a 2002 paper in Science and Engineering Ethics, the ethicist Wade Robison and Roger Boisjoly, the former Shuttle engineer, criticized Tufte's analysis for lacking rigor and glossing over aspects of the real-life data. But they also argued that Tufte placed too much emphasis and blame on the engineers and their poorly designed chart. Tufte's criticism, they argued, failed to take into account that many of the same engineers had raised acute concerns about the O-rings in the months before Challenger launched. But their concerns were largely shrugged off.
After reviewing Boisjoly's notes, Major Gen. Donald Kutyna, a member of the Rogers Commission, later compared NASA's acceptance of risk in the O-rings to an airline allowing one of its planes to keep flying despite evidence that one of its wings was about to fall off. At NASA, gathering data about risks, connecting the dots, and responding accordingly had quietly been subordinated to a goal as relatively insignificant as it was politically expedient: launching the Shuttle on time, and often, at a time when NASA's managers were struggling to prove the program's worth, not only to the American public but to the starry-eyed hawks in Ronald Reagan's White House and the Pentagon. Some, like Neil deGrasse Tyson, have pointed out that the Shuttle was more of a promotional and propaganda and military tool than a platform for science.
Their mistake wasn't as active as negligence or recklessness; the agency had become desensitized to a problem because the problem hadn't proved to be hazardous enough to provoke serious concern, and because there was no easy solution to it. The standard at NASA had slowly deviated, so that a second set of O-rings on the booster rockets, meant to provide redundancy in extreme cases, became part of standard use. A better solution wasn't impossible. One would be developed after Challenger, but by then the price of the upgrade would be far higher and more ghastly than NASA's managers could have imagined.
GM: Group thought and speaking up
Blindness to risk—and to a gradual sliding of standards—can be collective too, and that can make it even more pernicious. After Challenger, critics accused NASA of "go fever"—as in "go for launch"—or "groupthink." The term for this mindset originated in 1972 with Irving L. Janis, a Yale psychologist and a pioneer in the study of social dynamics. Janis sought to describe how, when groups are making difficult decisions, concurrence and authority can supersede rationality and intelligence. In group settings, he argued, it becomes easy to fall back upon overconfidence, tunnel vision, and conformity. Groupthink is "a mode of thinking that people engage in when they are deeply involved in a cohesive in-group," he wrote, "when the members' strivings for unanimity override their motivation to realistically appraise alternative courses of action."
As problems build up without crises, a lulling, false sense of security builds up too. And at critical moments, when institutions face tough questions, experts who sit lower down on the chain of command might be reluctant to hold up the decision-making process. Meanwhile, the longer the chain of command, the easier it becomes to deflect personal responsibility.
"As you go up the chain, you're generally asked harder and harder questions by people who have more and more control over your future,'' David Lochbaum, a nuclear engineer at the Union of Concerned Scientists told the Times in 2003. Though Boisjoly and others stated that their bosses were responsible for recommending to launch, no one would be blamed specifically by the commission that investigated Challenger. Instead, the Commission pointed to a systemic problem, criticizing NASA's "flawed" process of decision-making. As the old Washington phrase goes, mistakes were made.
A systemic inability to read risks properly lives in potentially every institution, but it becomes especially clear in those whose decisions can kill people. Announcing her report on an internal investigation earlier this month, GM's CEO Mary Barra described a company where individuals failed to act on information that indicated danger. Officials at the company knew about safety issues with the ignition switch in their cars as early as 2010, but failed to act, she said. Others within the company did too.
"Numerous individuals did not accept any responsibility to drive our organization to understand what was truly happening," she said. "The report highlights a company that operated in silos, with a number of individuals seemingly looking for reasons not to act, instead of finding ways to protect our customers."
As with the O-rings at NASA, the problem of cars stalling when moving at high speed was over the years filed away as an "acceptable risk," an issue of convenience rather than safety, and lost amidst the company's various silos. As at NASA, GM emphasized safety as critical, but safety concerns competed with cost control, an issue that one engineer said "permeates the fabric of the whole culture." Staff reductions, meanwhile, put more pressure on engineers. The net result, the report says, was a culture that discouraged stepping up, speaking out, admitting fault, and making redesigns.
As a result, Barra said, she wants employees who aren't having their concerns addressed to email her directly. "If you are aware of a potential problem affecting safety or quality and you don't speak up, you're part of the problem," Barra said. "And that is not acceptable. If you see a problem you don't believe is being handled correctly, elevate it to your supervisor. If you still don't believe it's being handled correctly, contact me directly."
But for potential whistleblowers at GM, even internal ones, experience holds an opposite lesson. Some at GM did speak up only to be shunned. While GM's investigation into itself found no evidence of a cover-up, a recent Businessweek investigation detailed the repeated, failed attempts of one internal whistleblower to fix the problem. Even when organizations know about risks, some will go out of their way to hide them and fight the people who expose them.
The Dept. of Veterans Affairs is another recent example. Evidence has piled up suggesting that various doctors and employees were dismissed apparently for complaining about disorganization, persistent delays, and attempts to conceal both at hospitals around the country. The main method of concealment, known as "phantom appointments," was meant to suggest that patients were being seen on time, in line with federal requirements. Kathy Leatherwood, a nurse at the Alaska V.A., told the Times that she was instructed to mark the patient as a "no show" or a cancellation and schedule a real appointment for later.
When Leatherwood went to one administrator, her response echoed that of Allan McDonald on the day before Challenger's launch: "It's my name that's going to be on that chart," she remembered telling the administrator. If she was unwilling to carry out the policy, "he would find someone who would, she said. When she continued objecting, he threatened to call security if she did not leave his office."
After Eric K. Shinseki left his post last month as head of the department, the interim head, Sloan Gibson, pointed to a culture of silence and intimidation at the V.A. "I understand that we've got a cultural issue there, and we're going to deal with that cultural issue," he said.
Prior to Challenger's launch, Thiokol engineers had gone through official NASA channels to air complaints and were ignored; after they took their protest public in front of the Rogers Commission, they were ostracized. In Truth, Lies and O-Rings, Allan McDonald wrote that "Roger and I already felt like lepers, but when we returned to Utah following the [Rogers Commission interview] our colleagues treated us as if we had just been arrested for child sexual abuse." Boisjoly was shunned by colleagues and taken off "space work" by his employer. "Managers isolated him in his position and 'made life a living hell on a day-to-day basis.'"
"When I realized what was happening, it absolutely destroyed me," Boisjoly told the AP in a 1988 telephone interview. "It destroyed my career, my life, everything else." Boisjoly would keep working at Thiokol for six months before taking a long-term disability leave, having been diagnosed with post-traumatic stress disorder. He later filed lawsuits against NASA and Thiokol but they never made it to the discovery process, and he came to view them as "an exercise in total futility." He passed away in 2012.
At NASA, "cultural issues" lingered even after Challenger. On February 1, 2003, Columbia burned up during its descent to Earth above the Western U.S. after gases entered the left wing through a hole that had formed during launch. The culprit: a piece of foam that fell off the external fuel tank, and which NASA managers had discounted.
By 2003, the Shuttle program, rolling along safely and mostly quietly, had nearly become a national afterthought. Inside NASA, the risk posed by a piece of foam was also an afterthought, a threat that, if not preposterous, was not worth considering. When one engineer proposed asking the Pentagon to inspect possible damage on the underside of the Shuttle using one of its spy satellites, the request was nixed by his higher-up.
Governing NASA's decision-making at the time was the awkward and macabre sense that if there were a problem with the Shuttle's wing, what could have possibly been done to fix it once it was in orbit? But in fact, just as a solution was later found for the O-ring problem, a later thought experiment imagined how NASA might have arranged for an emergency rescue or scrambled a space walk to make repairs. But without honestly acknowledging the risk, no solution was floated. Wayne Hale, a NASA manager, said later, "We are never ever going to say that there is nothing we can do."
Once again, as with Challenger, basic physics held the answers. But it wouldn't be until a high-school-science-lab kind of demonstration conducted a few months later that NASA managers saw their mistake: when fired at a mock-up of the Shuttle's wing, even a small piece of foam blew a hole straight through it, leaving them aghast. Had the risk been communicated better and sooner, a conversation like this one, held by NASA managers while the Shuttle was in orbit, might not have ended so easily and quickly, with the conclusion that the only inconvenience might be minor repairs after the Shuttle had landed.
MR. McCORMACK -- Well it could be down to the, we could lose an entire tile, I mean, and then the ramp into and out of that. It could be a significant area of tile damage down to the S.I.P. [strain isolation panel]. Perhaps it could be a significant piece missing but----
MS. HAM -- Would be a turnaround issue only?
MR. McCORMACK -- Right.
MS. HAM -- Right, O.K., same thing that you told me about the other day in my office, we've seen pieces of this size before, haven't we?
MR. LEINBACH -- Hey, Linda, we are missing part of that conversation.
MS. HAM -- Right . . . He was just reiterating, it was Calvin [Schomburg], that he does not believe that there is any uh burnthroughs so no safety of flight kind of issue, it's more of a turn around issue similar to what we have had on other flights. That's it? All right, any questions on that? O.K. . . .
Even in organizations where autonomy is considered paramount, stronger forces are at play. When a problem is ignored long enough, it can go from being an acceptable risk to a disaster in an instant. And if and when someone within the group sounds an alarm, that person is going up against organizational inertia, chains of command, and cultures that derive at least some of their strength from a sense of obedience.
We all make mistakes. It's very likely I've made a few in this article, which is why I've relied on others to let me know if I've made any and to offer suggestions. Paradoxically, however, this system can lead to complacency and a false sense of security: if I rely too much on others to check for my own mistakes, I might atrophy my own ability to recognize them. If an editor sees a problem but doesn't mention it, perhaps thinking that it's not important enough to mention or that I have already seen it, or if he or she is simply overwhelmed by a host of other little concerns, it's easy to see how an error can slip through a system meant to prevent them. This is only an article, not a Space Shuttle, but if it were, it's also easy to see how a tiny error can lead to catastrophe.
The trick is knowing which errors must be addressed and which can be accepted, and which are being accepted simply because we fail to see how dangerous they are. Hank Paulson, who presided over the 2008 financial crisis as Secretary of the Treasury, laments in a recent op-ed that "we're making the same mistake today with climate change" as we did with the financial markets: building up excesses without providing powerful solutions.
"The warning signs are clear and growing more urgent as the risks go unchecked," he wrote. "This is a crisis we can't afford to ignore. I feel as if I'm watching as we fly in slow motion on a collision course toward a giant mountain. We can see the crash coming, and yet we're sitting on our hands rather than altering course."
Like most crises, the financial calamity of 2008 resulted in some major corrections. But it also demonstrated a paradox of large complex systems, the kind that increasingly determine our daily lives. When risks balloon into crisis, these large systems can become too big to manage. But if we're unable to imagine that they could fail to begin with, simply because they're too important to society—think of the banks bailed out during Paulson's tenure—then we might overlook problems. "Too big to fail," in a sense, makes failure even harder to avoid. (Despite the protestations of Alan Greenspan and others that "if they're too big to fail, they're too big," a survey by the International Monetary Fund this year warned that the problem still exists.)
Among other flaws, the financial crisis exposed some of the new methods companies use to shield themselves from risk, to reduce the moral hazard involved. One of the interesting aspects of the so-called "sharing economy" umbrella is the way that some of its companies engineer not just new software but new methods for avoiding the liabilities that companies used to carry. Insurance companies flourish.
Despite its shortcomings, NASA's Space Shuttle left a positive legacy for spaceflight and for everyone. Part of that was the lesson that not making mistakes requires the brute force of computers, torrents of data, and an understanding of laws, both the physics and the government kind. It demands focus. But it also needs human doubt and dissent.
Of course, speaking up or speaking out carries its own risks: as the experiences of Roger Boisjoly and Allan McDonald and others showed, speaking up can be a mammoth and expensive venture, and you could end up looking like a Chicken Little, branded as disobedient, or become the target of a government investigation.
Or, worst of all perhaps, you could simply be ignored.