[FoRK] How Complex Systems Fail
b at b3k.us
Fri Nov 6 17:14:53 PST 2009
Bill Stoddard wrote:
> Interesting, however there's a nagging doubt eating away at me ... that
> this is really directed at deflecting medical malpractice. "Hey, it's a
> 'complex system' so it's not my fault that I f&*&*k'ed up"
> "7) Post-accident attribution accident to a ‘root cause’ is
> fundamentally wrong. "
> I have a problem with this assertion. It could be that a 'root cause'
> analysis is inconclusive, but to simply wave a hand and say 'this is a
> complex system so we're not going to bother trying to understand what
> happened' is fundamentally wrong. Maybe I misunderstand the meaning of
> 'root cause' in this assertion?
I can't speak to the medical aspects of this paper, but I loved it as a
clear statement of the challenges in operations for online services.
The 'root cause' hunt is indeed a distraction, not because looking for
causes of trouble is bad, but because if you are looking for 'the' root
cause you will often stop as soon as you find _any_ cause. In complex
systems, where failure requires an accumulation of faults, stopping at
'the' root cause can preclude discovering the remaining faults.
Further, obsession with assigning blame results, in my experience, in
people becoming too fearful to make any changes lest they go wrong, and
then finding ways to avoid discovering underlying causes that might
get the messenger shot. Instead, a culture of 'ownership' of
problems encourages discovery of many faults as people seek to display
their thoroughness and commitment to improvement.
My experience in web ops, take it as you will.