REPLY Definition of root cause (SD6806)
SDMAIL Jack Harich
register at thwink.org
Wed Mar 12 06:00:23 CDT 2008
Posted by Jack Harich <register at thwink.org>
SDMAIL Ulrey, Michael L wrote:
> This thread is of particular interest to me because "root cause" is a
> term which is very popular in the fields of system engineering in
> general, and reliability and safety engineering in particular. (And
> among other things, I work in system safety engineering at Boeing.)
Thanks for sharing this. I wonder why those solving problems with system
dynamics (as represented by replies on this list) tend not to find
thinking in terms of root causes productive. Do you have any theories on
this? I'm starting to get a little stumped.
> Of course, other industries also have similar practices, most notably
> the nuclear and chemical industries. This kind of approach works well
> for systems or processes which have a certain (low) level of
> complexity. However, for much more complex socio-technical systems
> (such as air traffic management), such an approach may come up short.
> Ever since the Uberlingen tragedy (see
> http://www.dcs.gla.ac.uk/~johnson/Eurocontrol/Ueberlingen/Ueberlingen_Final_Report.PDF
> or a variety of other discussions on the Internet), the safety
> community is starting to realize that, as one prominent accident
> expert said, "the root cause is simply the last contributing factor
> that was found before the resources ran out".
This would be the last event that directly caused the undesired system
behavior. This clearly is not a root cause, so what is the "prominent
accident expert" trying to say here? That there is no one root cause? If
so, I agree. There are usually multiple root causes for difficult problems.
"However, for much more complex socio-technical systems (such as air
traffic management), such an approach may come up short." - This says,
without explaining why, that root cause analysis doesn't work for "more
complex socio-technical systems". I wonder why. Does anyone have an
opinion on this?
Maybe it's that root cause analysis, as popularly defined, doesn't work.
But that should not prevent us from developing another approach that can
successfully and routinely find root causes in difficult social problems.
> In short, there is a movement away from traditional static, linear
> models such as the domino model or the Swiss cheese model. See the
> first link below for the revised thinking on the Swiss cheese model as
> a result of the Uberlingen accident. Instead, more complex systemic
> models are needed that reveal how accidents can arise naturally from
> the internal dynamics of the system, not necessarily from outside
> causes, or even from internal failures, as is traditionally supposed.
> This idea is well-expressed in the second link below, in which the
> "functional resonance accident model" (FRAM) is described.
>
Great links. Thanks! Very interesting. I think that in some cases
functional resonance can explain a social problem. An example would be
the outbreak of war. But at the same time, there are deep underlying
causes that, if resolved, would make it very unlikely for wars to occur.
These are what I refer to as root causes.
In the case of the European community, one such root cause was lack of
shared reasons to cooperate. This was resolved by creating the European
Union. Indeed, this was the main reason the EU was formed: to avoid more
wars in Europe. If we continue to see no more wars between countries in
the EU, then this must be because they intentionally resolved the root
cause of insufficient motivation to cooperate, and/or because additional
root causes somehow resolved themselves. I lean towards the former as
the explanation. But, not living in the EU, I am not abreast of many EU
issues, and so could easily be missing something.
> Finally, I would offer up the recent book by Hollnagel, Woods, and
> Leveson, called "Resilience Engineering", which contains a series of
> essays on these topics. The view is that some systems that affect our
> lives and safety must be designed to be resilient, that is, able to
> recover from inevitable surprising upsets, rather than attempting to
> make them totally free of defects, which is practically impossible. I
> think this is an area that may be unknown to many SD theorists, and
> provides a rich area for research. In fact, one of the authors, Nancy
> Leveson of MIT, has used SD to model the NASA organizational structure
> and processes to try and understand the Challenger and Columbia
> shuttle accidents.
>
>
Nice, especially since I just read Feynman's "What Do You Care What Other
People Think?" last week. The second half of the book was his take on
the Challenger disaster, and his role in deciphering the deeper causes.
"some systems... must be designed to be resilient" - Yes. This includes
social systems.
> On a personal note, I have recently learned how to model in Vensim,
> and have already produced a simple model having to do with air traffic
> management, and how to increase capacity while preserving an
> appropriate level of safety. I am hoping to continue with this work,
> having proposed a research project within my organization at Boeing. I
> think SD can make a real contribution in this important arena.
>
It probably can. Good luck with it.
SDMAIL Jack Homer wrote:
> Posted by "Jack Homer" <jhomer at comcast.net>
>
> Now that Jack Harich has clarified that he's looking for "the key
> elements of the model [or, better, the real world or the problematic
> system] that cause it to behave the way it does", I think we have an
> answer for him from SD and need look no further. We call "root cause"
> the dynamic hypothesis.
Hi Jack. Great discussion!
Page 86 of Sterman says "Formulate a dynamic hypothesis that explains
the dynamics as endogenous consequences of the feedback structure." This
implies that the definition of "dynamic hypothesis" is "the feedback
structure" of the model. This seems to mean the entire model.
Or Bill Braun, in an earlier message, suggested that "Jack Harich asks
about the definition of root cause. Broadly stated, it would be the
structure that produces the reference mode up to the present and, if
policies were left as is, would result in the 'feared' mode in the future."
This is the point at which my viewpoint diverges from the mainstream. I
view the root cause(s) of a problem as specific points or areas in a
model. Others seem to view it as the entire model, since it takes the
entire model to reproduce the reference mode. So I'm confused. What to
you is the root cause?
> And, we call "good solution" (the other concept Jack talks about) a
> high leverage point. Note that a high leverage point can refer to
> turning an existing lever or, alternatively, intervening in a way that
> creates an entirely new feedback loop.
Sorry if I haven't explained myself clearly. To me a high leverage point
(HLP) is usually a specific node in a model. Changing its value
represents the pressure of a solution. So the HLP is WHERE you apply
pressure. HOW you apply it is the solution. There are many ways to push
on an HLP, so there are many possible solutions. For example, the long
original version of the Dueling Loops paper lists six solution elements.
Each pushes on the HLP in a different way. The exact mix of how each
solution element is applied is the total solution.
The other case is the one you mention: creating one or more new feedback loops.
But even a new loop serves to push on existing nodes. Otherwise there is
no way to attach it to the system. This is why I tend to think in terms
of HLPs as being specific nodes.
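To make the WHERE versus HOW distinction concrete, here is a minimal
Python sketch. It is not taken from the Dueling Loops model or any other
published model; the single stock, the parameter names, the numbers, and
the solution element names are all invented for illustration. The HLP is
one node (hlp), and each solution element is just a different way of
adding pressure to that same node.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class ToyModel:
    """Hypothetical one-stock model, invented purely for illustration."""
    problem: float = 100.0     # stock: size of the problem symptom
    growth_rate: float = 0.05  # gain of the reinforcing loop, per time step
    hlp: float = 0.0           # the high leverage point: one specific node

    def step(self, dt: float = 1.0) -> None:
        # Reinforcing loop grows the problem; pressure on the HLP node drains it.
        inflow = self.growth_rate * self.problem
        outflow = self.hlp * self.problem
        self.problem += dt * (inflow - outflow)


def simulate(solution_elements: Mapping[str, float], steps: int = 20) -> float:
    """Run the toy model with a given mix of solution elements.

    Each element is one HOW. All of them push on the same WHERE (the
    hlp node), so only their combined pressure matters in this sketch.
    """
    model = ToyModel()
    model.hlp = sum(solution_elements.values())
    for _ in range(steps):
        model.step()
    return model.problem


if __name__ == "__main__":
    print("no solution:", simulate({}))
    print("mix A:", simulate({"education": 0.03, "regulation": 0.05}))
    print("mix B:", simulate({"media campaign": 0.08}))
```

In this toy, with no pressure on the node the problem keeps growing, and
any mix of elements whose total exceeds growth_rate makes it shrink. Two
different mixes with the same total behave the same, which is the sense
in which finding the node is a separate step from choosing how to push.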
As far as I can tell, it's much harder to find the HLPs on a difficult
problem than it is to determine how to push on them. Once the HLPs are
found, figuring out how to push on them is relatively easy. A similar
argument holds for root causes versus solutions. Once you find the true
root cause, how to resolve it is often strategically obvious. For
example, I did a presentation Friday of last week in which a slide said:
1. Identify the patient: The environmental movement
2. Describe the symptoms: Unable to solve the sustainability problem
3. Perform a diagnosis: Use of the wrong tools on difficult problems,
due to use of Classic Activism. This causes pushing on low leverage
points. (the root cause)
4. Determine the treatment: Use of the right tools on difficult
problems, due to use of Analytical Activism. This will allow pushing on
high leverage points. (the solution)
Notice how easily the solution follows from the root cause. If the root
cause is the WRONG tools, then the solution must be the RIGHT tools.
I hope this explains the notion that HLPs and solutions are two
different things. Because they are, problem solvers can decompose
one big step into two smaller steps, each of which is much easier to
solve. This is what process maturity is all about: more efficient
problem decomposition.
> The presence of the word "hypothesis" in dynamic hypothesis reflects
> my earlier comment that we can never really peel to the heart of the
> onion as one might do in an engineered system, but rather can only go
> down a few layers to where we can see some underlying delay and
> feedback structures that may help to explain the source and
> persistence of a problem.
"we can never really peel to the heart of the onion" - Yes. But that is
merely the present state of the soft sciences. As they mature, we will
come to see the structure of social systems as clearly as we can see the
structure of physical systems today. SD is one tool that allows us to do
this. But we need more tools. I'm arguing that a more mature process is one of
them. Such a process will have powerful productive steps, like finding
the root cause.
Also, as you say, we don't need to completely peel the onion. We only
need to understand the system enough to solve the problem. That's what
the System Understanding step of the System Improvement Process is all
about.
> Posted by Ralf Lippold <ralf_lippold at web.de>
>> The System Improvement Process (SIP) is designed to solve any
>> difficult social problem. Its four main steps are:
>>
>> 1. Problem Definition Identify the problem
>>
>> 2. System Understanding (Analysis) Analyze the problem/system until
>> key cause
>>
>
> This can't be more than the "key cause" that can be understood by the
> people involved when questioned the "5-Whys" in a very deep manner.
> The questions should be asked until the 5th question or a phase where
> there emerges no additional seeable reason.
>
Actually the System Improvement Process (SIP) goes quite a bit deeper
than the 5 whys. The five substeps of step 2 of SIP allow analysts to
find five different classes of causes. Recall the five substeps are:
A. Find the feedback loops that are currently dominant.
B. Find the root cause of why they are dominant.
C. Find the low leverage points and symptomatic solutions.
D. Find the feedback loops that should be dominant.
E. Find the high leverage points to make them go dominant.
Thinking abstractly, the first cause of the symptoms is the loops that
are currently dominant. The cause of that is the root cause. The cause
of solution failure to date is pushing on the low leverage points (LLPs)
with symptomatic solutions. The cause that would resolve the root cause
is the loops that should go dominant. The cause that would make them go
dominant is the HLPs.
I hope this helps you see there is a far more penetrating way to
understand the key causes and effects of a difficult problem than the
five whys.
> So you avoid at least the "quick fix" that leads into the "Fixes that
> Fail" archetype and even more trouble than you could imagine.
>
Yes, a very good point.
Thanks,
Jack
posting date Tue, 11 Mar 2008 23:30:32 -0400