Conger-Elsea Root Cause Analysis Workshop
I recently attended a Root Cause Analysis workshop by Conger & Elsea. This is the fourth such system I have been formally trained in. (I have also attended Kepner-Tregoe, PROACT, and Causemap I and II training, and have used them all successfully in professional practice.) A few of my notes and thoughts follow.
The bottom line is that this training is on par with the best workshops I have attended and adds valuable tools to my repertoire. I highly recommend it for failure analysts, RCA facilitators, technical managers, and reliability engineers. The “toolkit” taught in the course particularly extends the analyst’s ability to study management, systemic, or “latent” causes of accidents and safety incidents.
Key Takeaways
- Three numbered lists are maintained as a part of each investigation: documents reviewed (D1, D2…), interviews (I1, I2…), and site (or object) observations (O1, O2…). [This seemed like a pretty good way to categorize data collection to me.]
- Events & Causal Factors (E&CF) analysis is an augmented event-based timeline that associates conditions with events. Each event or condition requires, at a minimum, a subject, a verb, a time, a location, and a source. The source will be a reference to one of the three lists (D11, O5, etc.). Early on in the investigation the diagram shows “events and conditions.” Later, conditions can earn the status of “causal factor.”
- Change Analysis (CA), called a “poor man’s Kepner-Tregoe” in the workshop, is a way of comparing two options or courses of action. Although it is strictly limited to two options at a time, it can be applied to more by maintaining a working “best” and comparing each new option only to that one, until a better option takes its place. [I actually used this method, mentally, to buy a truck recently. I used Kepner-Tregoe Decision Analysis on my "short list" when purchasing a home.]
- Hazard-Barrier-Target (HBT) analysis is a way of examining the injury prevention methods (barriers) in place to decide whether they were adequate.
- Fault Tree (FT) analysis is a way of asking how an incident could have occurred. Each branch becomes a hypothesis that can be tested. This is a useful tool when causes are unknown but need to be known to advance the investigation.
- Management Oversight and Risk Tree (MORT) analysis is a tool for deeply analyzing the management factors behind safety incidents. This approach would be very useful when, for example, hypothesizing latent causes via the PROACT approach. The premise is that every incident results either from an assumed risk that was explicitly documented or from a management oversight. The management oversights get the most focus in the tree, and the questions associated with its hundreds of branches provide most of the value of the process. [This tool requires good knowledge of the management system, and is therefore not recommended until at least one third of the way through the investigation. At that point, the investigators should have held enough interviews and reviewed enough documents to be able to answer MORT questions.]
- Extent of condition and extent of cause are analyses separate from the incident investigation. Sometimes an incident will be challenged as a “one-time” event. This criticism may or may not be justified, but the validity of the investigation depends on how well the causal factors for *the incident* are determined. The report should reflect this and recommend extent-of-condition or extent-of-cause studies to determine how important implementing the recommendations will be.
- The team charter is very important because it contains the objective of the investigation. There were several times in the training when pointing back to the charter was essential to keeping the team on track. ["While we could look more into the emergency response, our charter calls for us to determine causal factors, which by definition come before the incident, so let's focus instead on..."]
- “If you value it, you will protect it, unless you can afford to replace it.”
- “As risk increases, so should the level of protection.”
- Corrective actions/recommendations should “restore” barriers (see HBT analysis, above), break chains of events or remove causal factors (see E&CF), remove critical path(s) in fault trees, or reduce the consequences of a recurring failure. This is how recommendations are tied back to the analysis. If a recommendation does none of these things, it does not belong in the report.
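To make the source-tagging convention concrete, here is a minimal Python sketch of the three evidence lists and of E&CF entries that each carry a subject, verb, time, location, and source. The names `EvidenceRegister`, `EcfEntry`, `validate`, and `timeline`, and the example incident, are hypothetical illustrations, not part of the course material or the MAPS software. The point is only that every entry can be checked for the required fields and for a source that actually exists in one of the lists.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class EvidenceRegister:
    """The three numbered lists: documents (D), interviews (I), observations (O)."""
    items: Dict[str, List[str]] = field(
        default_factory=lambda: {"D": [], "I": [], "O": []})

    def add(self, kind: str, description: str) -> str:
        """Append to list D, I, or O and return the new ID, e.g. 'D1'."""
        if kind not in self.items:
            raise ValueError(f"unknown list {kind!r}; expected D, I, or O")
        self.items[kind].append(description)
        return f"{kind}{len(self.items[kind])}"

    def exists(self, ref: str) -> bool:
        """True if a reference such as 'O5' points at a recorded item."""
        kind, number = ref[:1], ref[1:]
        if kind not in self.items or not number.isdecimal():
            return False
        return 1 <= int(number) <= len(self.items[kind])


@dataclass
class EcfEntry:
    """One event or condition on the E&CF chart (hypothetical structure)."""
    subject: str
    verb: str
    time: datetime
    location: str
    source: str                   # a register ID such as 'D11' or 'O5'
    is_condition: bool = False    # False = event, True = condition
    causal_factor: bool = False   # a condition may later be promoted


def validate(entries: List[EcfEntry], register: EvidenceRegister) -> List[str]:
    """List missing required fields and sources that do not resolve."""
    problems = []
    for n, entry in enumerate(entries, start=1):
        for name in ("subject", "verb", "location", "source"):
            if not getattr(entry, name).strip():
                problems.append(f"entry {n}: missing {name}")
        if entry.source.strip() and not register.exists(entry.source.strip()):
            problems.append(f"entry {n}: source {entry.source} not in register")
    return problems


def timeline(entries: List[EcfEntry]) -> List[EcfEntry]:
    """Entries in chronological order, ready to lay out left to right."""
    return sorted(entries, key=lambda e: e.time)


# Hypothetical usage with made-up evidence.
register = EvidenceRegister()
d1 = register.add("D", "Maintenance work order")         # returns "D1"
o1 = register.add("O", "Conveyor guard found on floor")  # returns "O1"
entries = [
    EcfEntry("Mechanic", "removed guard", datetime(2012, 5, 2, 7, 40), "Conveyor 3", d1),
    EcfEntry("Guard", "was not reinstalled", datetime(2012, 5, 2, 9, 0), "Conveyor 3", o1,
             is_condition=True),
    EcfEntry("Operator", "restarted conveyor", datetime(2012, 5, 2, 8, 55), "Conveyor 3", "I2"),
]
print(validate(entries, register))  # → ['entry 3: source I2 not in register']
print([e.verb for e in timeline(entries)])
```

A check like this mirrors the course's emphasis on tying every conclusion back to a source: an entry whose reference does not resolve to a D, I, or O item is flagged rather than silently accepted.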
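Extending Change Analysis to more than two options by keeping a working “best” amounts to a sequential champion/challenger scan: with n options it takes n − 1 two-way comparisons. Here is a minimal Python sketch, assuming each two-option comparison (the analyst’s judgment) can be supplied as a function. The names `working_best` and `prefer` are hypothetical, and the truck scores in the usage lines are made-up placeholders.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def working_best(options: Iterable[T], prefer: Callable[[T, T], bool]) -> T:
    """Keep a working best and compare each remaining option only against it.

    `prefer(challenger, best)` stands in for one two-option Change Analysis
    and returns True when the challenger should replace the current best.
    """
    remaining = iter(options)
    try:
        best = next(remaining)
    except StopIteration:
        raise ValueError("need at least one option") from None
    for challenger in remaining:
        if prefer(challenger, best):
            best = challenger  # a better option takes the best's place
    return best


# Hypothetical usage: one made-up score stands in for the analyst's judgment.
trucks = [
    {"name": "Truck A", "score": 6},
    {"name": "Truck B", "score": 8},
    {"name": "Truck C", "score": 7},
]
choice = working_best(trucks, lambda challenger, best: challenger["score"] > best["score"])
print(choice["name"])  # → Truck B
```

One caveat of this design: the result is only as sound as the individual two-way judgments. If those judgments are not consistent (A beats B, B beats C, but C beats A), the winner can depend on the order in which the options are compared.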
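The idea that each fault tree branch becomes a testable hypothesis can be sketched as a small tree of AND/OR gates whose leaves are marked untested, confirmed, or ruled out. This is a hypothetical Python illustration; the `Node` class, the status strings, and the example events are assumptions, not the course’s notation. An OR branch stays credible if any child does, an AND branch only if all children do, and the untested leaves on credible branches are the hypotheses still worth testing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A fault-tree node: a leaf hypothesis or an AND/OR gate over children."""
    label: str
    gate: str = "LEAF"          # "AND", "OR", or "LEAF"
    status: str = "untested"    # leaves only: "untested", "confirmed", "ruled out"
    children: List["Node"] = field(default_factory=list)


def credible(node: Node) -> bool:
    """Can this branch still explain the incident, given the tests so far?"""
    if node.gate == "LEAF":
        return node.status != "ruled out"
    if node.gate not in ("AND", "OR"):
        raise ValueError(f"unknown gate {node.gate!r}")
    results = [credible(child) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)


def open_hypotheses(node: Node) -> List[str]:
    """Untested leaves on still-credible branches: what to test next."""
    if not credible(node):
        return []
    if node.gate == "LEAF":
        return [node.label] if node.status == "untested" else []
    found: List[str] = []
    for child in node.children:
        found.extend(open_hypotheses(child))
    return found


# Hypothetical example tree for "conveyor ran with guard off".
top = Node("Conveyor ran with guard off", "OR", children=[
    Node("Guard removed and not replaced", "AND", children=[
        Node("Guard removed for maintenance", status="confirmed"),
        Node("Procedure lacks reinstall step", status="untested"),
    ]),
    Node("Guard never installed", status="ruled out"),
])
print(credible(top))          # → True
print(open_hypotheses(top))   # → ['Procedure lacks reinstall step']
```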
Advantages
- Conger-Elsea’s approach to incident investigation is used by the Nuclear Regulatory Commission (NRC), the Canadian NRC, NASA, and the U.S. Navy, which lends considerable credibility to their methods.
- The system draws on a handful of mature, proven investigation tools and stands alone as an effective method for conducting investigations, on par with PROACT.
- Because Conger & Elsea’s system always ties conclusions back to sources, and recommendations back to analyses, its results have rigor similar to that of a good PROACT analysis. Causemap and 5-why root cause analyses do not require this rigor (though it is possible with those methods), so their results are more prone to error.
Limitations
- The course is short on hardware analysis. Separate training or study in this subject will be required to make a “complete analyst.”
- A few of my companions, particularly supervisors with limited backgrounds in incident investigation, were understandably confused during parts of the training. I would not recommend this particular course as an introduction to cause analysis and incident investigation, because it assumes some understanding of the fundamentals. [ThinkReliability has an introduction to cause analysis that would be more productive and informative for such people.] The exception would be someone working in nuclear power, who would benefit from learning the language of that industry. [This is more a reflection on who was sent than on the course itself.]
- The Mishap Analysis and Prevention (MAPS) software provided at the end of the course seems to be 32-bit software that I cannot install on my Windows 7 Home Premium laptop. I will attempt to install and use it on a Windows XP machine, but that really should not be necessary. [It looks like a good tool. I will review it separately if I can get it onto a computer.] Without this software, the analyst has only a limited number of paper copies of the MORT chart to work from, and sustained use of MORT would be hard without the ability to print new charts.
- The interview practice was good, but the practice interviews were a bit easier than real ones. Interviewees were very open and forthcoming; a little defensiveness or reluctance on their part would have added realism to the simulation. [The oral briefing was excellent, with plenty of audience skepticism and questioning: "Are you saying that I don't care about safety?"]
- Generally, it is easiest for a group to align on risk ratings when frequency of exposure and probability of injury are rated separately. The risk rating system used in the course used only severity and probability, which makes group alignment on risk difficult.
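To illustrate why separating exposure from probability can help a team converge, here is a minimal Python sketch of a three-factor rating. The 1 to 5 scales, the multiplicative score, and the names `Rating` and `spread` are assumptions for illustration, not the course’s risk matrix. Recording exposure and probability separately lets the team see which factor they actually disagree on, instead of arguing over a single blended likelihood.

```python
from dataclasses import dataclass
from typing import Dict, List

FACTORS = ("severity", "exposure", "probability")


@dataclass
class Rating:
    """One team member's rating on assumed 1-5 scales (not the course's scales)."""
    severity: int     # 1 = first aid ... 5 = fatality
    exposure: int     # 1 = rarely exposed ... 5 = continuously exposed
    probability: int  # 1 = unlikely ... 5 = likely, per exposure

    def __post_init__(self) -> None:
        for name in FACTORS:
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be 1-5, got {value}")

    def score(self) -> int:
        """Multiplicative three-factor score, from 1 to 125."""
        return self.severity * self.exposure * self.probability


def spread(ratings: List[Rating]) -> Dict[str, int]:
    """Range of each factor across the team: where the discussion should focus."""
    return {
        name: max(getattr(r, name) for r in ratings)
        - min(getattr(r, name) for r in ratings)
        for name in FACTORS
    }


# Hypothetical ratings from two team members for the same hazard.
team = [Rating(4, 5, 2), Rating(4, 5, 3)]
print([r.score() for r in team])  # → [40, 60]
print(spread(team))  # → {'severity': 0, 'exposure': 0, 'probability': 1}
```

Here both raters agree on severity and exposure, so the discussion can focus on the one-point difference in probability. Under a two-factor scheme, one rater might fold daily exposure into a high likelihood while another rates the chance per exposure as low, and the source of the disagreement would stay hidden.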