Based on a presentation given at ALPA Air Safety Week, 19 August 2004
In this study we reviewed the 19 major U.S. airline accidents between 1990 and 2000 in which the NTSB found crew errors to be among the probable causes. We used the NTSB reports as the raw material for our analysis. We worked our way step by step through the events of the accident flight, and at each step we asked this question: Why might any highly experienced airline crew in the situation of the accident crew and knowing only what the accident crew knew at that moment have responded in much the way the accident crew responded at that step of the flight?
Asking the question this way changes how we think about pilot error and how we think about the causes of accidents. When investigating and analyzing aviation accidents it is very easy to fall into what cognitive scientists call hindsight bias. Knowing the outcome of an accident, it is easy to identify things the crew could have done differently that would have averted the accident. But the accident crew does not know the outcome in advance; they are responding to the situation as they perceive it at the moment.
Crew Responses to Situations
Obviously, crew responses are shaped by the immediate demands of the situation and the tasks being performed, and by the training and experience of the crew members. Social and organizational influences include not just formal procedures, policies, and goals, but also implicit goals, and sometimes these are in conflict. They also include the actual norms for operating on the line, which may or may not be the same as what is in the Flight Operations Manual.
At the heart of all this are the characteristics and limitations of human cognitive processes. All of these factors interact and our analysis emphasizes the interaction of these factors.
A Truism about Accidents
One thing is immediately obvious about almost all human factors accidents we have looked at over the years: No one thing “caused” the accident; rather, accidents typically arise from the confluence of multiple events, actions taken or not taken by individuals, and the environment in which those individuals operate. Although this truism is recognized by the industry, we need to look more deeply at its implications.
Confluence of Factors in a CFIT Accident
This confluence is illustrated in this slide of factors in a CFIT accident in which an MD-83 struck trees 310 feet below the minimum descent altitude (MDA) on a non-precision approach in 1995. I wonder if all pilots are even aware that non-precision approaches in the U.S. can give as little as 250 feet of terrain clearance? Weather played a central role in this accident in several ways: barometric pressure was changing rapidly, there was a strong crosswind, and the tower window broke, causing the tower to close. The approach controller failed to update the altimeter setting before switching frequency and the crew did not request an update. At the time the airline used QFE altimetry. Workload was high, in part because of the crosswind and the PF’s use of Heading Select. The crew slightly mis-set the altimeters, causing a 70-foot error; this plus the rapid change in barometric pressure caused the aircraft to fly 170 feet lower than the altimeters indicated. The pilot flying (PF) used Altitude Hold to capture MDA; however, in turbulence Altitude Hold can allow the aircraft to sag as much as 130 feet below the target altitude. The pilot monitoring (PM) called out that they were passing through MDA, using somewhat non-standard phraseology, and the aircraft struck trees 310 feet below MDA. Fortunately, although the aircraft was heavily damaged, no one was severely injured. My point in going through this slide is not to talk about the details of this specific accident. None of the factors in this accident were all that extraordinary; it was the particular way that all of these factors happened to combine that produced the accident. It’s popular these days to talk about the “accident chain”, but clearly this is not a single chain; it is a confluence of many factors operating in parallel and converging. The occurrence and combination of all these factors in a single flight is to a large degree a matter of chance, and therein lies the great challenge.
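To make the arithmetic of this confluence concrete, the short sketch below adds up the two altitude errors quantified above and compares the total with the 250 feet of terrain clearance. The figures are the ones quoted in this talk; adding them linearly, and the names used in the code, are illustrative assumptions and not the NTSB's analysis.

```python
# Figures in feet, as quoted in the talk. Adding them linearly is an
# illustrative assumption, not the NTSB's analysis.
ERROR_SOURCES_FT = {
    "altimetry (mis-set altimeters plus falling pressure)": 170,
    "Altitude Hold sag in turbulence (worst case)": 130,
}
MIN_TERRAIN_CLEARANCE_FT = 250  # minimum clearance cited for U.S. non-precision approaches
REPORTED_BELOW_MDA_FT = 310     # height below MDA at which the aircraft struck trees


def total_below_mda(sources):
    """Sum the individual contributions to flying below MDA."""
    return sum(sources.values())


total_ft = total_below_mda(ERROR_SOURCES_FT)
margin_ft = MIN_TERRAIN_CLEARANCE_FT - total_ft

for name, feet in ERROR_SOURCES_FT.items():
    print(f"{name}: {feet} ft")
print(f"combined: {total_ft} ft below MDA")
print(f"margin against {MIN_TERRAIN_CLEARANCE_FT} ft clearance: {margin_ft} ft")
```

Neither contribution alone uses up the 250 feet of clearance, but together they can exceed it. The combined figure is also close to the 310 feet reported; the talk does not itemize the difference.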
Chance Combination of Contributing Factors
In our modern airspace system accidents are very rare events because the industry has developed ways of preventing single-point failures in equipment or human performance from causing accidents. The few accidents that slip through all the defenses occur because of the largely random combination of many factors. This random aspect makes it very difficult to develop countermeasures to prevent future accidents because the number of combinations and permutations of factors that might occur in a flight is virtually infinite. And it is very scary: none of us likes to feel we are operating in a system that we cannot completely control. Until we deal with this reality we will not make much progress in improving aviation safety. It is commonplace to propose that we teach pilots to avoid accidents by “breaking the accident chain”, but it is not at all clear how pilots can recognize, in the midst of a demanding flight, the many different ways multiple factors can combine by chance to lead to an accident.
6 Overlapping Clusters of Error Patterns
In our study we took the crew errors identified by the NTSB at face value: in hindsight it is clear some of the actions the crews took were not the right things to do. However, by the end of this talk I hope you will be asking questions about how we should think about these errors and what they represent. We found the errors in the 19 accidents to cluster roughly into six groups, with some overlap among the groups. These clusters are defined as much by the situations confronting the pilots as by the form of error.
Inadvertent Slips/Oversights in Practiced Tasks Under Normal Conditions
The first group consists of inadvertent slips and oversights while performing highly practiced tasks under normal conditions. Examples are overlooking a checklist item, remembering an altimeter setting incorrectly, and slightly misjudging the landing flare. These errors are very much the same as the errors pilots themselves report making to ASRS and ASAP programs. From a cognitive perspective occasional errors made by experts are an inevitable consequence of the way the brain is wired, the incomplete information available at the time to the expert, and the competing demands of tasks. No pilot is immune to these errors, so in many cases the difference between an accident flight and a routine non-accident flight is not the presence of errors but the happenstance combination of the errors with several other factors.
Inadvertent Slips/Oversights in Practiced Tasks Under Challenging Conditions
The probability of making these inadvertent slips and oversights goes up with workload, time pressure, fatigue, and stress. In some accidents we noted a snowball effect in which decisions or actions at one stage of the flight increased the crew’s vulnerability to making errors later. For example, a crew that continued a highly questionable approach in the vicinity of a thunderstorm put themselves in a high workload situation that may have contributed to their forgetting to arm the spoilers.
Inadequate Execution of Non-Normal Procedures Under Challenging Conditions
Four of these 19 accidents involved inadequate execution of non-normal procedures under challenging conditions. These included imperfect execution of procedures for recovering from a spiral dive, from a stall, and from windshear. These days pilots are trained to recover from these situations, but a recent study by Veridian Corporation revealed that even with upset training many pilots have trouble executing recovery procedures adequately. We suspect that one shortcoming of existing upset recovery training is that in the training pilots are expecting the upset and typically they know which upset they are about to encounter. But in the real world of surprise, confusion, and stress pilots may have trouble identifying the nature of the upset and selecting the correct recovery procedure.
Inadequate Response to Rare Situations for which Pilots are not Trained
The fourth cluster involved inadequate response to rare situations for which pilots are not trained. These situations included a false stickshaker activation just after rotation, an oversensitive autopilot that drove the aircraft toward the ground near Decision Height, anomalous indications from airspeed indicators that did not become apparent until the aircraft was past rotation speed, and an uncommanded autothrottle disconnect whose annunciation was not at all salient. Here too surprise, confusion, stress, and time pressure undoubtedly play a role. No data exist on what percentage of airline pilots would respond quickly and correctly to these situations, but we suspect that performance is unlikely to be reliable under these conditions.
Judgment and Decision-Making in Ambiguous Situations
The fifth cluster of errors involved judgment in ambiguous situations that hindsight proves wrong. An example of judgment in ambiguous situations is continuing an approach toward an airport in the vicinity of thunderstorms. No algorithm exists for crews to calculate exactly how far they may continue an approach in the vicinity of thunderstorms before it should be abandoned. Company guidance is typically expressed in rather general terms, and the crew must make this decision by integrating fragmentary and incomplete information from various sources, and playing it by ear. When an aircraft crashes while attempting an approach under these conditions, the crew is typically found to be at fault. Yet there are reasons to suspect that the decision-making of the accident crews was similar to that of crews who were more fortunate. A Lincoln Lab study of radar data at Dallas/Fort Worth revealed that when thunderstorms are near the approach path it is not all that uncommon for airliners to penetrate the cells. And in the investigation of windshear accidents it is not uncommon to find that another aircraft landed or took off a minute or two ahead of the accident aircraft without difficulty. Both crews had the same information and made the same decision, but rapidly fluctuating conditions allowed one to land without difficulty and caused the other to crash. In these ambiguous situations, instead of blaming the accident crew for poor judgment, maybe we should focus more on asking what the industry norms are for operating in these situations. Does the industry provide sufficient guidance for pilots to balance competing goals? Do we have explicit policies that sound conservative but implicitly tolerate or even encourage less conservative behavior as long as crews get by with it?
Deviation from Explicit Guidance or SOP
The last cluster involves deviation from explicit guidance or standard operating procedures. An example is attempting to land from an unstabilized approach resulting from a slam-dunk clearance. If the company has explicit stabilized approach criteria, these errors may seem simply to be willful violations. But even here the situation may not be as simple as it seems. Does the company publish and train the stabilized approach criteria as an absolute bottom line or merely as guidance? What are the norms for what pilots actually do in the company and in the industry? We have heard some pilots, even some check pilots, express the view that being unstabilized at 500 feet is not a problem as long as the flying pilot is correcting and gets the aircraft back within normal parameters by touchdown. What these pilots may not grasp is that correcting an unstabilized approach imposes so much workload that the flying pilot does not have enough mental capacity left over to reliably assess whether he is going to be able to get everything back to normal by touchdown.
Cross-Cutting Factors Contributing to Crew Errors
A range of cross-cutting factors in these accidents contributed to the vulnerability of pilots to making these sorts of errors.
Situations Requiring Rapid Response
To our great surprise, nearly 2/3 of these accidents involved situations in which the crew had only a matter of seconds to choose and execute the appropriate response. Examples include upset attitudes, false stickshaker activation just after rotation, anomalous airspeed indications at rotation, pilot induced oscillation during flare, and autopilot induced oscillation at decision height. We were surprised because most threatening situations encountered in airline operations allow the crew time to think through what to do, and in these situations it is important to avoid rushing. We conclude that these 19 accidents included a disproportionately high number of situations requiring very rapid response because, although these situations are quite rare, when they do occur it is extremely difficult for crews to overcome their surprise, assess the situation, and quickly execute the appropriate response. Human cognitive processes simply do not allow pilots to reliably assess novel situations quickly.
Challenges of Managing Concurrent Tasks
The challenge of managing multiple tasks concurrently showed up in the great majority of these accidents. In some cases workload was quite high in the final stages of the accident sequence. In other cases adequate time was available to perform all required tasks; however, the inherent difficulty of reliably switching attention back and forth among concurrent tasks may have hampered performance. More effective monitoring might have helped some of these accident crews prevent or detect many of the errors made; unfortunately, in many situations monitoring must itself be performed as a concurrent task, and it is subject to the same fragility: a pilot can become preoccupied with the task at hand and forget to switch attention to monitor other tasks.
Equipment Failures and Design Flaws
Equipment failures and design flaws appeared in about 2/3 of these accidents. In some cases a design flaw or equipment failure precipitated the chain of events leading to the accident—I already mentioned a false stickshaker warning that occurred right after rotation. In other accidents a flaw or failure undermined the efforts of the crew to manage their situation—for example in several accidents the stickshaker failed to activate when it should have, depriving the crew of critical information.
Although we cannot be sure of the extent, we suspect that stress played a role in many of these accidents by interfering with the crews’ cognitive processes. Stress hampers skilled performance by narrowing attention and reducing working memory capacity required to execute even highly practiced tasks. In particular, the combination of stress and surprise with requirements to respond rapidly and to manage several tasks concurrently, as occurred in several of these accidents, is a lethal setup.
Shortcomings in Training and/or Guidance
Shortcomings in training and/or guidance appeared in more than a third of these accidents. In some cases pilots were not provided adequate guidance about problems that some segments of the industry knew to exist. Three of the accidents involved upset attitudes; I’ve already suggested that we need to find ways to develop more realistic scenarios with which to present upset attitude recovery training. But beyond this, we have to find a way to deal with the fact that it is simply not possible to train for every possible situation, which raises the question of how best to provide generic training and procedures that will work in a broad range of unanticipated situations.
Plan Continuation Bias
Plan continuation bias may impede crews’ ability to recognize that they need to change their course of action. This is a powerful but unconscious cognitive bias to continue the original or habitual course of action. This bias may be especially strong during the approach phase, when only a few more steps are required to complete the original plan, and it may operate by preventing pilots from noticing subtle cues indicating that the original conditions have changed.
Social and organizational issues have a pervasive influence but have not been studied in depth. For example, little data is available to accident investigators on the extent to which the accident crews’ actions were typical or atypical in the situation they faced. Also, pilots may not be consciously aware of the influence of internalized competing goals, for example the trade-offs between on-time performance and conservative response to ambiguous situations. Little research has been conducted on this aspect.
Countermeasures? No Easy Solutions
We can start by recognizing that most accidents are systems accidents and by shifting our focus from blaming pilots for errors to identifying vulnerabilities of complex systems and developing countermeasures. Pilots, managers, designers of equipment, and designers of operating procedures should be well educated about human cognitive characteristics and limitations; the design of equipment, procedures, and training should be based on that knowledge. We can never completely eliminate pilot error, but we can reduce the frequency of errors and we can give pilots tools to catch errors and mitigate their consequences.
Crew decision-making is hampered by ambiguous, incomplete, and conflicting information, so we should provide crews with better information. Exactly where is the dangerous convective weather in real time? What is the precise nature of the aircraft system failure that is causing warning lights to illuminate from three different subsystems?
Programs such as LOSA, FOQA, ASAP, and NAOMS can help generate answers to these questions, but we have yet to fully exploit the potential of these programs.
We could beef up our training for upset recovery and for other situations requiring very rapid response. And we should develop specific techniques to train monitoring.
Finally, we should explicitly acknowledge that there are inherent trade-offs between the level of safety and system efficiency. These trade-offs show up in many ways, for example, in decisions about how much training to provide. Cost-benefit trade-offs are inherently policy issues. Those issues should be analyzed explicitly, and all stakeholders should participate. My point is that more can be done to protect safety by explicitly analyzing these trade-offs than by naively expecting crews to perform perfectly. That is why I called this talk The Limits of Expertise.