Root Cause Analysis - the art of solving complex problems

Solving problems effectively and preventing their recurrence requires a systematic approach. The general process is as follows:

1) The first step is to define the problem, so that everyone works on the same problem and directs their efforts the same way for maximum results and efficiency. For this we can use part of Kepner-Tregoe's (link) model for Problem Analysis. Define which object has the problem and what fails, where the failure occurred, when it occurred and to what extent. Then identify similar objects that could have failed as well but did not. Finally, determine the difference between the failing and non-failing objects: what is distinctive about the failing object? This is where the causes of the failure must be rooted, otherwise the other objects would fail too.
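
The comparison between the failing object and the similar objects that did not fail can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the Kepner-Tregoe material: the function name `distinctions`, the attribute names and the pump data are all made up for the example.

```python
def distinctions(failing, working):
    """Return the attributes of the failing object whose value
    is not shared by any of the non-failing objects."""
    result = {}
    for key, value in failing.items():
        if all(obj.get(key) != value for obj in working):
            result[key] = value
    return result


# Hypothetical data: one failing pump and two similar pumps that work.
failing_pump = {"model": "P-200", "site": "Plant A",
                "seal": "supplier X", "shift": "night"}
working_pumps = [
    {"model": "P-200", "site": "Plant A", "seal": "supplier Y", "shift": "night"},
    {"model": "P-200", "site": "Plant B", "seal": "supplier Y", "shift": "day"},
]

print(distinctions(failing_pump, working_pumps))
```

Here only the seal supplier separates the failing pump from the working ones, so that is where the search for causes would start. Attributes that the failing pump shares with at least one working pump are dropped, since they cannot explain the failure on their own.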

2) The next step is to understand the cause-effect relationships around the problem. A powerful methodology for complex analysis is the one defined by Apollo (link), which has the following rules for drawing a Cause-Effect Chart:
- Every effect has action causes and condition causes, at least one of each, and they should all be identified.
- Action causes and condition causes must exist at the same point in space and time to cause the effect.
- The chain of causes is infinite, so we keep looking until we don't know or don't care.
- Between any cause and its effect there are further causes and effects if we look closely enough at the "baby steps", e.g. the Titanic sank because it hit an iceberg, but why did hitting an iceberg sink it? Because the hull opened and filled with water.
- Every cause must have evidence, e.g. "was observed", "object exists", etc.
- Effective solutions must prevent recurrence, be within our control and meet our goals/objectives, e.g. the solution is not more expensive than the cost of living with the problem.

To draw an Apollo Cause-Effect Chart you go through these steps:
- Go from the Problem towards the right side and add a chain of causes by asking "why" again and again until you don't know or don't care.
- Repeat several times and make sure there is at least one action cause and one condition cause for each effect.
- Go through every cause and add evidence.
- Now go from the right side towards the left and add solutions to each cause.
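
The rules and steps above can be pictured as a small tree structure with a checker that reports where the chart is still incomplete. This is a rough sketch and not Apollo's own tooling: the `Cause` class, its fields and the `check` function are assumptions made for illustration, and the evidence strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    text: str
    kind: str                 # "action" or "condition" (assumed labels)
    evidence: str = ""        # e.g. "was observed", "object exists"
    causes: list["Cause"] = field(default_factory=list)
    solutions: list[str] = field(default_factory=list)


def check(node):
    """Walk the chart and list every broken rule: a cause without
    evidence, or an effect lacking an action or a condition cause."""
    problems = []
    if not node.evidence:
        problems.append(f"no evidence: {node.text}")
    if node.causes:
        kinds = {c.kind for c in node.causes}
        for needed in ("action", "condition"):
            if needed not in kinds:
                problems.append(f"no {needed} cause: {node.text}")
    for child in node.causes:
        problems.extend(check(child))
    return problems


# First pass over the Titanic example, asking "why" twice.
chart = Cause("Titanic sank", "action", "was observed", causes=[
    Cause("Hull opened and filled with water", "action", "was observed", causes=[
        Cause("Ship hit an iceberg", "action"),   # evidence not added yet
    ]),
])

for problem in check(chart):
    print(problem)
```

The first pass is flagged three times: two effects have no condition cause yet and one cause has no evidence. That is the cue to repeat the "why" questions and to add evidence before moving on to solutions. Causes at the end of a chain are not flagged for missing causes, because that is where we stopped asking.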

3) Identify possible root causes. This can be done by looking at the Apollo Cause-Effect Chart or by using Kepner-Tregoe's methodology. Kepner-Tregoe identifies the changes that have been made to the failing object and that make it different from the similar objects that could fail but do not. From those changes, hypotheses about Root Causes can be formed and tested. With the Apollo chart, identify the most likely causes that are under our control as the Root Causes.

4) Identify solutions related to the identified Root Causes. Evaluate each solution to ensure it is effective, i.e. it prevents recurrence, is within our control and meets our goals. Then test only the effective solutions to see whether the Root Causes have been identified.
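
The three criteria for an effective solution can be applied as a simple filter. This is a minimal sketch under stated assumptions: the `Solution` class, the use of cost as the only goal, and the candidate solutions and figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    text: str
    prevents_recurrence: bool
    within_our_control: bool
    cost: float               # cost of implementing the solution


def is_effective(solution, cost_of_problem):
    """A solution is effective if it prevents recurrence, is within our
    control and costs no more than living with the problem does."""
    return (solution.prevents_recurrence
            and solution.within_our_control
            and solution.cost <= cost_of_problem)


# Hypothetical candidates for a problem that costs 10,000 to live with.
candidates = [
    Solution("Replace seal supplier", True, True, 2_000),
    Solution("Ask staff to be more careful", False, True, 0),
    Solution("Redesign the whole plant", True, True, 500_000),
    Solution("Change national regulations", True, False, 0),
]

effective = [s.text for s in candidates if is_effective(s, cost_of_problem=10_000)]
print(effective)
```

Only the first candidate passes all three criteria, so it is the only one worth testing. Real goals are rarely cost alone, so in practice the last condition would be replaced by whatever objectives apply.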

5) Verify the effectiveness of the solutions.

Other techniques that may be useful in RCA are: Mind Mapping, Ishikawa "Fishbone" Diagrams, Time-Series Analysis and Flow Charts.

Sources/Links:
- Apollo
- Apollo eRCA Demo
- Apollo Root Cause Analysis book - chapter 1
- Kepner-Tregoe
- Cause Mapping Demo - The Sinking of Titanic
- James J Rooney article - Root Cause Analysis for Beginners
- Mark Doggett article - Root Cause Analysis: A Framework for tool selection
- Root Cause Analysis presentation