When you do a reliability failure study to protect your business against large uncommon failures, and against failures that are very common but thought too small to be a serious problem, be sure to use lifetime financial analysis when deciding which failures to investigate.


I am to do a reliability failure study and would like to know the definition of repetitive failure with respect to some recognized reliability standard.

We want to apply the recommendations in the standard for failure codes in our CMMS system to decide which equipment failures need a reliability failure study. For example, we have a centrifugal pump bearing fail twice in a year. I am of the opinion this is a repetitive failure.


Hello Friend,

Thank you for the clarification of your situation. I sent you the link to www.barringer1.com/mil.htm webpage before understanding what you are trying to do. It is a useful website and I am sure that you will get a definition of repetitive failure somewhere amongst all those USA military and NASA standards. But I fear that it will lead you astray in achieving your objective.

I see two major issues confronting you that will not be solved by only using a definition of repetitive failure for choosing when to do an equipment reliability failure study.

The first is that using frequency of failure alone to decide whether to do a reliability study is not sufficient for making a proper business decision. You must also consider the annualized cost of the failure events. It matters how many times you have a failure, and it matters just as much how much money you lose each time there is a failure. I strongly recommend that you include financial trigger points when deciding whether to do a reliability study of a failure event.

Consider two extremes. At one extreme, a US$1M failure that happens once in ten years has the same annualized cost to the business (US$100K per year) as a US$50K failure that happens every six months. If you classify the six-monthly event as a repetitive failure but not the once-in-ten-years event, because it is not repetitive enough, you will surely lose US$1M every ten years simply because the ten-year event is not seen as recurring.
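The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration using the two figures quoted above; the function name `annualized_cost` is my own and does not come from any standard.

```python
def annualized_cost(cost_per_event, years_between_events):
    """Average yearly loss from one failure mode, in the currency of cost_per_event."""
    return cost_per_event / years_between_events


rare_large = annualized_cost(1_000_000, 10)    # US$1M failure once in ten years
frequent_small = annualized_cost(50_000, 0.5)  # US$50K failure every six months

print(rare_large, frequent_small)  # prints: 100000.0 100000.0
```

Both failure modes cost the business the same amount per year, even though only one of them would normally be called repetitive.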

At the other extreme, if you have a recurring US$1,000 failure each week but it is considered such a low value that it is drowned in the noise of monthly expenditure, you will not notice that you lose US$52K each year to a supposedly negligible failure event. In ten years that is over half a million dollars thrown away. Small repetitive failures are like small leaks in a seagoing ship: you might make your destination with a few leaks that drip, but you will not make port if you have many leaks that drip, or a few leaks that pour water into the vessel.

The only way to help protect your business against large uncommon failures and failures too small to be seen as a problem is to also include proper financial considerations when deciding which failures to investigate.
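As a rough sketch of what a financial trigger point could look like, the Python below screens the three failure events described above in three ways: by frequency alone, by cost per event alone, and by annualized cost. The three threshold values and all the names are assumptions made up for illustration; your own trigger points must come from your business, not from this example.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    cost_per_event: float   # US$ lost each time the failure happens
    events_per_year: float  # average number of events per year

    @property
    def annualized_cost(self) -> float:
        return self.cost_per_event * self.events_per_year


# The three failure events used as examples in this letter.
modes = [
    FailureMode("once in ten years", 1_000_000, 0.1),
    FailureMode("every six months", 50_000, 2),
    FailureMode("every week", 1_000, 52),
]

# Hypothetical trigger points, chosen only to illustrate the screening.
REPETITIVE_EVENTS_PER_YEAR = 2
PER_EVENT_COST_TRIGGER = 10_000
ANNUALIZED_COST_TRIGGER = 50_000

by_frequency = [m.name for m in modes
                if m.events_per_year >= REPETITIVE_EVENTS_PER_YEAR]
by_event_cost = [m.name for m in modes
                 if m.cost_per_event >= PER_EVENT_COST_TRIGGER]
by_annualized = [m.name for m in modes
                 if m.annualized_cost >= ANNUALIZED_COST_TRIGGER]

print("frequency only:  ", by_frequency)
print("cost per event:  ", by_event_cost)
print("annualized cost: ", by_annualized)
```

With these assumed thresholds, the frequency screen misses the ten-year event, the cost-per-event screen misses the weekly event, and only the annualized cost screen selects all three for study.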

The second concern I see is that by waiting for a failure event to happen and be recorded in SAP before deciding to do a reliability study you have unintentionally condoned a reactive culture in your company.

If the first time anyone thinks about investigating a failure is after the work order to repair it is closed off in SAP, you have almost certainly guaranteed there will be many failure incidents to investigate in the years ahead. I would rather that you approach reliability studies proactively by identifying those events that will cause your company unwanted costs and disruption, and doing a reliability study to prevent them from happening. You will then create equipment reliability and make your company hugely successful in the coming years.

You are right at present to use reliability studies to analyze failures and address their causes, but also use them proactively to remove operating risk by preventing failures from happening. Use reliability studies to protect your equipment against failure. If you wait for failure before resolving the problem you will always be inundated with disasters.

One final thought I want to make sure that you are aware of, and you may already know this, is how to use the learning from the reliability studies of failures for the greatest benefit of your company and its future.

Every failure in your company is the result of the collective effect of the beliefs and values held by the people in your company. These beliefs and values permeate everything and everyone. Equipment failures are not an absolute certainty. Like accidents, equipment failures are caused, which means they can also be totally prevented. That is why I say that the failures in your company are a symptom of its beliefs and values. With the right beliefs and values there will be no failures because they would have been prevented by doing the proper things that cause lifetime reliability in your equipment.

Hence a failure in one of your machines will eventually happen to all other similar machines, because it is perpetuated by the practices you use, which are the result of what the people think is correct and true. Thus, whenever you develop a viable solution for a failure, you must take that solution to all the other equipment in your company with the same components that failed. Even though they may not yet have failed, they will, unless you proactively impose the right solution on them to prevent the failures.
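One way to find where a solution must be taken is to search your equipment register for every item fitted with the component that failed. The Python below is a minimal sketch of that lookup. The register contents, equipment tags, and function name are all invented for illustration; in practice the data would come from your CMMS.

```python
# Hypothetical equipment register: equipment tag -> component types fitted.
register = {
    "PUMP-01": {"roller bearing", "mechanical seal"},
    "PUMP-02": {"roller bearing", "mechanical seal"},
    "FAN-01": {"roller bearing", "v-belt"},
    "TANK-01": {"level gauge"},
}


def equipment_sharing_component(register, failed_tag, failed_component):
    """Return the tags of all other equipment fitted with the failed component."""
    return sorted(
        tag
        for tag, components in register.items()
        if failed_component in components and tag != failed_tag
    )


# After solving a roller bearing failure on PUMP-01, list where else to apply the fix.
print(equipment_sharing_component(register, "PUMP-01", "roller bearing"))
```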

A perfect example of the effect of commonly held wrong beliefs is the centrifugal pump roller bearings that you mentioned failed twice in a year. Good quality bearings properly cared for (i.e. combined atomic stresses from operation and distortion well below material strength, less than 14/11 ISO 4406 solid particle count, correct fits and tolerances, etc.) will go at least 5 years between failures, and even 10 years should be expected. That you have two bearing failures a year on the same centrifugal pump tells me a great deal about the values and beliefs held by the senior people in your company. Based on the one repetitive failure you mention, I can predict with high confidence that your operation regularly has bearing failures in its machines. There cannot be any other result because the beliefs drive the outcomes.

To break the cycle of misunderstanding, take the learning from every failure investigation throughout your business, and also into the businesses of your suppliers and vendors (they start many of the failures that you have to fix). With truth and correct knowledge properly applied you will prevent tens of thousands of future failures. If you do this you will make your company one of the great businesses of the world.

I hope that these thoughts are of use to you as you develop your company’s reliability failure study policies.

My best regards to you,

Mike Sondalini

P.S. If you have questions on life cycle asset management, equipment maintenance strategy, defect elimination and failure prevention, or plant maintenance and reliability, please feel free to contact me by email.