Based on a presentation given at ALPA Air Safety Week, 19 August 2004
Approach
In this study we reviewed the 19 major U.S. airline accidents between 1990 and 2000 in which the NTSB found crew errors to be among the probable causes. We used the NTSB reports as the raw material for our analysis. We worked our way step by step through the events of the accident flight, and at each step we asked this question: Why might any highly experienced airline crew, in the situation of the accident crew and knowing only what the accident crew knew at that moment, have responded in much the way the accident crew responded at that step of the flight?
Hindsight Bias
Asking the question this way changes how we think about pilot error and how we think about the causes of accidents. When investigating and analyzing aviation accidents it is very easy to fall into what cognitive scientists call hindsight bias. Knowing the outcome of an accident, it is easy to identify things the crew could have done differently that would have averted the accident. But the accident crew does not know the outcome in advance; they are responding to the situation as they perceive it at the moment.
Crew Responses to Situations
How crews respond to the situation as they understand it is a function of several factors. The most obvious are the immediate demands of the situation, the tasks being performed, and the training and experience of the crew members. Social and organizational influences include not just formal procedures, policies, and goals, but also implicit goals, and sometimes these are in conflict. The actual norms for operating on the line also matter, and these may or may not always be the same as what is in the Flight Operations Manual. At the heart of all this are the characteristics and limitations of human cognitive processes. All of these factors interact, and our analysis emphasizes that interaction.
A Truism about Accidents
One thing is immediately obvious about almost all human factors accidents we have looked at over the years: no one thing “caused” the accident. Rather, accidents typically arise from the confluence of multiple events, actions taken or not taken by individuals, and the environment in which those individuals operate. Although this truism is recognized by the industry, we need to look more deeply at its implications.
Confluence of Factors in a CFIT Accident
This confluence is illustrated in this slide of factors in a CFIT accident in which an MD-83 struck trees 310 feet below the minimum descent altitude (MDA) on a non-precision approach in 1995. I wonder whether all pilots are even aware that non-precision approaches in the U.S. can provide as little as 250 feet of terrain clearance. Weather played a central role in this accident in several ways: barometric pressure was changing rapidly, there was a strong crosswind, and the tower window broke, causing the tower to close. The approach controller failed to update the altimeter setting before switching the flight to another frequency, and the crew did not request an update. At the time the airline used QFE altimetry. Workload was high, in part because of the crosswind and the pilot flying's (PF's) use of Heading Select. The crew slightly mis-set the altimeters, causing a 70-foot error; this, plus the rapid change in barometric pressure, caused the aircraft to fly 170 feet lower than the altimeters indicated. The PF used Altitude Hold to capture the MDA; however, in turbulence Altitude Hold can allow the aircraft to sag as much as 130 feet below the target altitude. The pilot monitoring (PM) called out, in somewhat non-standard phraseology, that they were passing through the MDA, and the aircraft struck trees 310 feet below the MDA. Fortunately, although the aircraft was heavily damaged, no one was severely injured. My point in going through this slide is not to talk about the details of this specific accident. None of the factors in this accident was all that extraordinary; it was the particular way that all of these factors happened to combine that produced the accident. It is popular these days to talk about the “accident chain,” but clearly this is not a single chain; it is a confluence of many factors operating in parallel and converging. The occurrence and combination of all these factors in a single flight is to a large degree a matter of chance, and therein lies the great challenge.
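To make the arithmetic of that confluence concrete, here is a minimal sketch using only the figures quoted above. Treating the contributions as simply additive is an illustrative assumption, not the NTSB's reconstruction; the roughly 100-foot contribution attributed to the pressure change is just the difference between the 170-foot indication error and the 70-foot mis-set.

```python
# Illustrative stack-up of the altitude errors described above (values in feet).
# Assumption: the contributions combine additively in the worst case; this is a
# sketch for discussion, not the NTSB's analysis of the accident.

altimeter_misset  = 70        # crew slightly mis-set the altimeters
pressure_change   = 170 - 70  # remaining indication error from the rapidly changing pressure
altitude_hold_sag = 130       # how far Altitude Hold can sag below the target in turbulence

total_below_indicated = altimeter_misset + pressure_change + altitude_hold_sag
print(total_below_indicated)  # ~300 ft, consistent with striking trees 310 ft below the MDA

# With as little as 250 ft of terrain clearance below the MDA on a non-precision
# approach, an excursion of roughly 300 ft is enough to reach obstacles.
```

None of the three numbers is alarming on its own; it is their chance combination on a single approach that erases the terrain margin.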
Chance Combination of Contributing Factors
In our modern airspace system accidents are very rare events because the industry has developed ways of preventing single-point failures in equipment or human performance from causing accidents. The few accidents that slip through all the defenses occur because of the largely random combination of many factors. This random aspect makes it very difficult to develop countermeasures to prevent future accidents, because the number of combinations and permutations of factors that might occur in a flight is virtually infinite. And it is very scary: none of us likes to feel we are operating in a system that we cannot completely control. Until we deal with this reality we will not make much progress in improving aviation safety. It is commonplace to propose that we teach pilots to avoid accidents by “breaking the accident chain,” but it is not at all clear how pilots can recognize, in the midst of a demanding flight, the many different ways multiple factors can combine by chance to lead to an accident.
6 Overlapping Clusters of Error Patterns
In our study we took the crew errors identified by the NTSB at face value: in hindsight it is clear that some of the actions the crews took were not the right things to do. However, by the end of this talk I hope you will be asking questions about how we should think about these errors and what they represent. We found the errors in the 19 accidents to cluster roughly into six groups, with some overlap among the groups. These clusters are defined as much by the situations confronting the pilots as by the form of error.
Inadvertent Slips/Oversights in Practiced Tasks Under Normal Conditions
The first group consists of inadvertent slips and oversights while performing highly practiced tasks under normal conditions. Examples are overlooking a checklist item, remembering an altimeter setting incorrectly, and slightly misjudging the landing flare. These errors are very much the same as the errors pilots themselves report making to ASRS and ASAP programs. From a cognitive perspective occasional errors made by experts are an inevitable consequence of the way the brain is wired, the incomplete information available at the time to the expert, and the competing demands of tasks. No pilot is immune to these errors, so in many cases the difference between an accident flight and a routine non-accident flight is not the presence of errors but the happenstance combination of the errors with several other factors.
Inadvertent Slips/Oversights in Practiced Tasks Under Challenging Conditions
The probability of making these inadvertent slips and oversights goes up with workload, time pressure, fatigue, and stress. In some accidents we noted a snowball effect in which decisions or actions at one stage of the flight increased the crew’s vulnerability to making errors later. For example, a crew that continued a highly questionable approach in the vicinity of a thunderstorm put themselves in a high workload situation that may have contributed to their forgetting to arm the spoilers.
Inadequate Execution of Non-Normal Procedures Under Challenging Conditions
Four of these 19 accidents involved inadequate execution of non-normal procedures under challenging conditions. These included imperfect execution of procedures for recovering from a spiral dive, from a stall, and from windshear. These days pilots are trained to recover from these situations, but a recent study by Veridian Corporation revealed that even with upset training many pilots have trouble executing recovery procedures adequately. We suspect that one shortcoming of existing upset recovery training is that in training pilots are expecting the upset and typically know which upset they are about to encounter. But in the real world of surprise, confusion, and stress, pilots may have trouble identifying the nature of the upset and selecting the correct recovery procedure.
Inadequate Response to Rare Situations for which Pilots are not Trained
The fourth cluster involved inadequate response to rare situations for which pilots are not trained. These situations included a false stickshaker activation just after rotation, an oversensitive autopilot that drove the aircraft toward the ground near Decision Height, anomalous indications from airspeed indicators that did not become apparent until the aircraft was past rotation speed, and an uncommanded autothrottle disconnect whose annunciation was not at all salient. Here too surprise, confusion, stress, and time pressure undoubtedly play a role. No data exist on what percentage of airline pilots would respond quickly and correctly to these situations, but we suspect that performance is unlikely to be reliable under these conditions.
Judgment and Decision-Making in Ambiguous Situations
The fifth cluster of errors involved judgment in ambiguous situations that hindsight proved wrong. An example of judgment in an ambiguous situation is continuing an approach toward an airport in the vicinity of thunderstorms. No algorithm exists for crews to calculate exactly how far they may continue an approach in the vicinity of thunderstorms before it should be abandoned. Company guidance is generally expressed in rather general terms, and the crew must make this decision by integrating fragmentary and incomplete information from various sources and playing it by ear. When an aircraft crashes while attempting an approach under these conditions, the crew is typically found to be at fault. Yet there are reasons to suspect that the decision-making of the accident crews was similar to that of crews who were more fortunate. A Lincoln Lab study of radar data at Dallas-Fort Worth revealed that when thunderstorms are near the approach path it is not all that uncommon for airliners to penetrate the cells. And in the investigation of windshear accidents it is not uncommon to find that another aircraft landed or took off a minute or two ahead of the accident aircraft without difficulty. Both crews had the same information and made the same decision, but rapidly fluctuating conditions allowed one to land without difficulty and caused the other to crash. In these ambiguous situations, instead of blaming the accident crew for poor judgment, maybe we should focus more on asking what the industry norms are for operating in these situations. Does the industry provide sufficient guidance for pilots to balance competing goals? Do we have explicit policies that sound conservative but implicitly tolerate or even encourage less conservative behavior as long as crews get by with it?
Deviation from Explicit Guidance or SOP
The last cluster involves deviation from explicit guidance or standard operating procedures. An example is attempting to land from an unstabilized approach resulting from a slam-dunk clearance. If the company has explicit stabilized approach criteria, these errors may seem simply to be willful violations. But even here the situation may not be as simple as it seems. Does the company publish and train the stabilized approach criteria as an absolute bottom line or merely as guidance? What are the norms for what pilots actually do in the company and in the industry? We have heard some pilots, even some check pilots, express the view that being unstabilized at 500 feet is not a problem as long as the flying pilot is correcting and gets the aircraft back within normal parameters by touchdown. What these pilots may not grasp is that correcting an unstabilized approach imposes so much workload that the flying pilot does not have enough mental capacity left over to reliably assess whether he is going to be able to get everything back to normal by touchdown.
Cross-Cutting Factors Contributing to Crew Errors
A range of cross-cutting factors in these accidents contributed to the vulnerability of pilots to making these sorts of errors.
Situations Requiring Rapid Response
To our great surprise, nearly 2/3 of these accidents involved situations in which the crew had only a matter of seconds to choose and execute the appropriate response. Examples include upset attitudes, false stickshaker activation just after rotation, anomalous airspeed indications at rotation, pilot-induced oscillation during the flare, and autopilot-induced oscillation at decision height. We were surprised because most threatening situations encountered in airline operations allow the crew time to think through what to do, and in these situations it is important to avoid rushing. We conclude that these 19 accidents included a disproportionately high number of situations requiring very rapid response because, although these situations are quite rare, when they do occur it is extremely difficult for crews to overcome their surprise, assess the situation, and quickly execute the appropriate response. Human cognitive processes simply do not allow pilots to reliably assess novel situations quickly.
Challenges of Managing Concurrent Tasks
The challenge of managing multiple tasks concurrently showed up in the great majority of these accidents. In some cases workload was quite high in the final stages of the accident sequence. In other cases adequate time was available to perform all required tasks; however, the inherent difficulty of reliably switching attention back and forth among concurrent tasks may have hampered performance. More effective monitoring might have helped some of these accident crews prevent or detect many of the errors made; unfortunately, in many situations monitoring must itself be performed as a concurrent task, and it is subject to the same fragility: crews become preoccupied with the task at hand and forget to switch attention to monitor other tasks.
Equipment Failures and Design Flaws
Equipment failures and design flaws appeared in about 2/3 of these accidents. In some cases a design flaw or equipment failure precipitated the chain of events leading to the accident—I already mentioned a false stickshaker warning that occurred right after rotation. In other accidents a flaw or failure undermined the efforts of the crew to manage their situation—for example in several accidents the stickshaker failed to activate when it should have, depriving the crew of critical information.
Stress
Although we cannot be sure of the extent, we suspect that stress played a role in many of these accidents by interfering with the crews’ cognitive processes. Stress hampers skilled performance by narrowing attention and reducing working memory capacity required to execute even highly practiced tasks. In particular, the combination of stress and surprise with requirements to respond rapidly and to manage several tasks concurrently, as occurred in several of these accidents, is a lethal setup.
Shortcomings in Training and/or Guidance
Shortcomings in training and/or guidance appeared in more than a third of these accidents. In some cases pilots were not provided adequate guidance about problems known to exist by some segments of the industry. Three of the accidents involved upset attitudes; I have already suggested that we need to find ways to develop more realistic scenarios with which to present upset attitude recovery training. But beyond this, we have to find a way to deal with the fact that it is simply not possible to train for every possible situation, which raises the question of how best to provide generic training and procedures that will work in a broad range of unanticipated situations.
Plan Continuation Bias
Plan continuation bias may impede crews’ ability to recognize that they need to change their course of action. This is a powerful but unconscious cognitive bias to continue the original or habitual course of action. This bias may be especially strong during the approach phase, when only a few more steps are required to complete the original plan, and it may operate by preventing pilots from noticing subtle cues indicating that the original conditions have changed.
Social/Organizational Issues
Social and organizational issues have a pervasive influence but have not been studied in depth. For example, little data is available to accident investigators on the extent to which the accident crews’ actions were typical or atypical in the situation they faced. Also, pilots may not be consciously aware of the influence of internalized competing goals, for example the trade-offs between on-time performance and conservative response to ambiguous situations. Little research has been conducted on this aspect.
Countermeasures? No Easy Solutions
It will not be easy to reduce vulnerability to the kinds of accidents I’ve been describing. The U.S. airline system already operates at a very high level of safety; consequently, every accident that occurs in this system is a unique combination of events, circumstances, and errors that slipped through the system of existing defenses. But I will make a few high-level suggestions. We can start by recognizing that most accidents are systems accidents and by shifting our focus from blaming pilots for errors to identifying vulnerabilities of complex systems and developing countermeasures. Pilots, managers, designers of equipment, and designers of operating procedures should be well educated about human cognitive characteristics and limitations; the design of equipment, procedures, and training should be based on that knowledge. We can never completely eliminate pilot error, but we can reduce the frequency of errors and we can give pilots tools to catch errors and mitigate their consequences.
Countermeasures? No Easy Solutions (Cont'd)
Plan continuation bias, task saturation, and inadvertent omission of procedural steps are examples of inherent cognitive vulnerabilities. We should systematically review normal and non-normal operating procedures to ensure they are consistent with human abilities to manage competing task demands, time pressure, and stress. Conservative, hard bottom lines, such as stabilized approach criteria, are important defenses against the consequences of error, but to be effective these bottom lines must be practiced consistently and vigorously, and they must take precedence over concern with time and fuel costs. Crew decision-making is hampered by ambiguous, incomplete, and conflicting information, so we should provide crews with better information. Exactly where is the dangerous convective weather in real time? What is the precise nature of the aircraft system failure that is causing warning lights to illuminate from three different subsystems?
Countermeasures? No Easy Solutions (Cont'd)
We need much better information about how the airspace system operates and how crews typically respond. For example, how often and at what airports do controllers issue slam-dunk clearances and last-minute runway changes? How do pilots typically deal with these challenges, and how close do they come to the edge of the safety envelope? How close do flights come to storm cells during arrival and departure, and is the variation among flights due to differences in crew judgment or to chance? Programs such as LOSA, FOQA, ASAP, and NAOMS can help generate answers to these questions, but we have yet to fully exploit the potential of these programs.

We could also beef up our training for upset recovery and for other situations requiring very rapid response, and we should develop specific techniques to train monitoring. Finally, we should explicitly acknowledge that there are inherent trade-offs between the level of safety and system efficiency. These trade-offs show up in many ways, for example, in decisions about how much training to provide. Cost-benefit trade-offs are inherently policy issues; those issues should be analyzed explicitly, and all stakeholders should participate. My point is that more can be done to protect safety by explicitly analyzing these trade-offs than by naively expecting crews to perform perfectly. That is why I called this talk “The Limits of Expertise.”