[FoRK] How Complex Systems Fail
Ken Ganshirt @ Yahoo
ken_ganshirt at yahoo.ca
Fri Nov 6 18:25:37 PST 2009
--- On Fri, 11/6/09, Benjamin Black <b at b3k.us> wrote:
> ... In complex systems, where failure requires an accumulation of faults,
> stopping at 'the' root cause can preclude discovering faults.
> Further, obsession with assigning blame results, in my
> experience, with people becoming too fearful to make any changes
> lest they go wrong and then finding ways to avoid discovering
> underlying causes that might cause the messenger to be shot.
> Instead, a culture of 'ownership' of problems encourages discovery
> of many faults as people seek to display their thoroughness and
> commitment to improvement.
> My experience in web ops, take it as you will.
Mine, too, in a telco, where the complexity is compounded by trying to keep a myriad of proprietary systems working together to deliver five nines (99.999%) availability or better.
The operator services systems (411, etc.) I supported for nearly a decade have zero tolerance for errors and outages. Try doing that with multiple systems cobbled together and connected to proprietary network elements from a variety of vendors, and then try to build a management system that can deal with all those proprietary pieces. Definitely good for continued employment prospects, if not for the hairline. :-)
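To put "five nines" in perspective, here is a back-of-the-envelope sketch. It assumes an average 365.25-day year, and for the chained case it assumes every system must be up for the service to work and that the systems fail independently. Those are simplifying assumptions, not a model of any real telco network, and the three-system chain is a hypothetical example.

```python
"""Back-of-the-envelope availability arithmetic (a sketch under simplifying assumptions)."""

MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, counting leap years


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year for an availability fraction (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def series_availability(*availabilities: float) -> float:
    """Availability of systems that must ALL be up, assuming independent failures."""
    result = 1.0
    for a in availabilities:
        result *= a  # each extra dependency can only lower the total
    return result


if __name__ == "__main__":
    five_nines = 0.99999
    print(f"one system: {annual_downtime_minutes(five_nines):.2f} min/year")
    # Hypothetical: three systems chained together, each at five nines.
    chain = series_availability(five_nines, five_nines, five_nines)
    print(f"three in series: {chain:.6%}, {annual_downtime_minutes(chain):.2f} min/year")
```

Under these assumptions, five nines allows only about 5.26 minutes of downtime a year, and chaining three such systems drops the whole to roughly 99.997%, or about 15.8 minutes a year. That is why stitching many pieces together makes the target so much harder to hit.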
It's interesting that people get annoyed if they lose part of their internet service for a short period, but if you ask them in a calmer moment they admit they don't expect it to be perfect. On the other hand, everyone expects to get dial tone when they lift the telephone receiver (I think a lot of people assume it's there all the time, like electricity in the wall outlets!).