[FoRK] How Complex Systems Fail
wgstoddard at gmail.com
Fri Nov 6 19:26:32 PST 2009
Ken Ganshirt @ Yahoo wrote:
> --- On Fri, 11/6/09, Bill Stoddard <wgstoddard at gmail.com> wrote:
>> "7) Post-accident attribution accident to a ‘root
>> cause’ is fundamentally wrong. "
>> I have a problem with this assertion. It could be that a
>> 'root cause' analysis is inconclusive, but to simply wave a
>> hand and say 'this is a complex system so we're not going to
>> bother trying to understand what happened' is fundamentally
>> wrong. Maybe I misunderstand the meaning of 'root cause' in
>> this assertion?
> Hmmm... Did we read the same paper?
> He did not say nor imply anything like that in the paper.
> The implication is that, in complex systems "accidents", the usual witch hunt for a single "root cause" of the outage is doomed to failure, by definition. So if the post hoc analysis team is genuinely interested in finding out what really happened they should start the analysis by assuming that a single "root cause" is just a red herring and that the outage will be the result of multiple overlapping failures.
> Perhaps you do misunderstand "root cause". Here's Wikipedia's intro:
> "Root cause analysis (RCA) is a class of problem solving methods aimed at identifying the root causes of problems or events. The practice of RCA is predicated on the belief that problems are best solved by attempting to correct or eliminate root causes, as opposed to merely addressing the immediately obvious symptoms. By directing corrective measures at root causes, it is hoped that the likelihood of problem recurrence will be minimized. However, it is recognized that complete prevention of recurrence by a single intervention is not always possible. Thus, RCA is often considered to be an iterative process, and is frequently viewed as a tool of continuous improvement."
I stand corrected. I was not aware that there was such a formal definition of the phrase 'root cause' and that it was so artificially and narrowly defined. Before reading your reply, it would never have occurred to me that the phrase 'root cause' implied 'single cause'. To my way of thinking (and drawing on the metaphor of tree roots), the 'root cause' of a failure is a 'single cause' only in the special case; the more general case (in complex systems) is that failures happen due to interactions of multiple factors (as the author points out). As you analyze failures, you can often (in my experience) identify patterns of failure; those patterns (or collections of factors) are 'root causes' and can be dealt with collectively and holistically.
Guess I'm left scratching my head a bit too, wondering why the author felt the need to say 'Post-accident attribution accident to a <single cause> is fundamentally wrong.' Purposefully hitting my hand with a hammer is fundamentally wrong, but I don't feel compelled to say it very often. Maybe I should consider myself lucky at never having directly worked around people who had such 'fundamentally wrong' delusions? OTOH, maybe this delusion is more common outside my place of work than I imagine. Perhaps as common as belief in 'efficient market theory' & pixie dust.. :-)