Critical Spare Parts Inventory Selection Guide Notes in PWW EAM System-of-Reliability Methodology


This guidance note helps operations determine their critical spare parts inventory using a risk-based methodology. It also helps identify and adopt effective maintenance and reliability strategies and actions to minimize the number of critical spares. The purpose of a critical spare is to eliminate operating risk, or at least to reduce it to as low as reasonably practicable. The decision to carry a critical spare parts inventory for equipment is therefore based on risk elimination and risk management principles. First, the right engineering, operating, and maintenance strategies to reduce equipment failure risks are selected. If the remaining risk from a part is still too high, the part is a candidate for inclusion in the critical spare parts inventory. Organizations that use the Plant Wellness Way EAM System-of-Reliability methodology determine which equipment parts need a critical spare as described in this guidance note.

Keywords: spare parts risk criticality, critical spare parts, critical spare, critical spares inventory, operating risk reduction

Organizational problems this article helps you to address:

    • Decide which equipment parts to keep in the maintenance spares inventory.
    • Identify the amount of risk each equipment component in an asset causes to an operation.
    • Select engineering, operational, and maintenance tasks to reduce spare parts inventory.

Definition of a Critical Spare Part

A part has failed when it cannot reliably deliver its service and functions because it can no longer take its full range of service duty stresses. Complete failure happens when it no longer delivers all its necessary functionality. The cause can be that its microstructure is deformed to breakage, or that the microstructure is degraded so much that the remaining material-of-construction cannot take the service loads and fractures.

A critical spare part is an equipment component whose failure causes unacceptable consequences. The consequences can be safety-related, production-related, environmental, commercial, or legal, along with any other adverse effects.

Determine the Cost of Equipment Parts Failure using Total Defect and Failure Costing

In the Plant Wellness Way EAM System-of-Reliability methodology the cost-of-failure is calculated using the Total Defect and Failure Cost (TDAF Cost) activity-based costing template. Each part listed in the equipment bill-of-materials is analysed for its consequential TDAF Cost from the business-wide effects of its failure. That individual TDAF Cost value is the part’s ‘Criticality 1’ rating. Criticality 1 is the total of all financial costs, wastes, and losses incurred corporate-wide from a complete failure of the part.
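As a rough illustration of how a Criticality 1 figure is built up, the sketch below totals the business-wide costs of one complete failure of a part. It is not the TDAF Cost template itself: the cost headings are loosely borrowed from the consequence types listed in the definition of a critical spare part, and the function name, the example part, and all dollar figures are hypothetical.

```python
def criticality_1(failure_costs: dict[str, float]) -> float:
    """Total all business-wide costs ($) of one complete failure of a part."""
    if any(cost < 0 for cost in failure_costs.values()):
        raise ValueError("costs must be non-negative")
    return sum(failure_costs.values())


# Hypothetical costs for one complete failure of a pump shaft bearing,
# assuming no spare part is available. All figures are made up.
bearing_failure_costs = {
    "safety": 0.0,
    "lost_production": 180_000.0,
    "environmental": 5_000.0,
    "commercial": 20_000.0,
    "legal": 0.0,
    "repair_labour_and_parts": 12_500.0,
}

print(f"Criticality 1 = ${criticality_1(bearing_failure_costs):,.0f}")
```

In practice each heading would itself be the sum of many activity-based cost lines, so the dictionary here only stands in for the completed template.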

When determining Criticality 1 it is presumed the operation’s processes and assets all function as designed but there is no spare part available to replace a failed part. Whether a part needs to be in inventory or not is decided by the outcome of the spare part selection analysis process described in this document.

Identify a Critical Spare Part by the Risk It Causes

A part’s criticality is a measure of the total instantaneous risk its use brings to an operation, and to the whole organization, should it fail. A 16 x 13 risk matrix calibrated to the organization’s risk profile, like the one shown in the image below, is used to display risk criticality in the Plant Wellness Way (PWW) EAM methodology.


The PWW EAM methodology requires proof of sound, correct decision making. As far as possible, you display your logic visually to make it clear how you made your choices. Showing a part’s risk on a risk matrix, as in the image above, displays its criticality, helps decide whether it ought to be a spare part held in inventory, and satisfies that requirement.

A risk matrix is derived from the standard risk equation:

Risk ($ losses per year) = Consequence $ (total losses per event) x Likelihood (events per year)

On a risk matrix, consequence is shown as financial value across the table, and likelihood is shown down the table as decreasing frequency of a possible event. Both axes use a base-ten logarithmic scale.
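A short sketch of the risk equation and of what the log-ten axes imply: each step along either axis is a tenfold change, so a tenfold drop in likelihood offsets a tenfold rise in consequence. The function names and figures are illustrative assumptions. The order-of-magnitude band is only a stand-in for a matrix row or column, since the real matrix is calibrated to the organization's risk profile.

```python
import math


def annual_risk(consequence_cost: float, events_per_year: float) -> float:
    """Risk ($ per year) = consequence ($ per event) x likelihood (events per year)."""
    return consequence_cost * events_per_year


def decade_band(value: float) -> int:
    """Order-of-magnitude band of a positive value on a base-ten log axis."""
    if value <= 0:
        raise ValueError("value must be positive")
    return math.floor(math.log10(value))


# Hypothetical part: $200,000 lost per failure, one failure expected every 2 years.
consequence = 200_000.0
likelihood = 0.5

print(annual_risk(consequence, likelihood))  # dollars of risk per year
print(decade_band(consequence))              # consequence lies in the 10^5 band
print(decade_band(likelihood))               # likelihood lies in the 10^-1 band
```

A part costing ten times as much per failure but failing one tenth as often carries the same annual risk, which is why equal-risk cells run diagonally across a log-log matrix.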

A part’s Criticality 1 value is the total consequential cost and losses of its failure. For Criticality 1 the likelihood on the risk matrix is taken as ‘Certain’. Once the TDAF Cost is determined, the part’s Criticality 1 position is located at the intersection of the TDAF Cost Consequence column and the failure event Likelihood row. Those parts that sit in the yellow, amber, or red zones of the matrix cause too much risk. The components that bring Extreme risk are the first to be investigated to establish whether they are truly critical parts that need to be in the critical spares inventory.

Identify a Part’s Failure Prevention Strategies from Deformation and Degradation Stressors

This assessment of engineering, operating, and maintenance strategies is done by performing a Physics of Failure Factor Analysis (POFFA) on each part in the bill-of-materials for an equipment item. POFFA uses a series of templates to identify events that cause component microstructure stresses.

Identify All the Causes of Each Stressor

Included in the POFFA is the means to identify the full range of causes of microstructure stresses.

List All the Events that Cause the Causes of Stressors

This is a key step of a POFFA and ensures you recognise when stress producing events can occur during the life cycle and service life of a part.

Identify Those Causes that Will Change with Time and/or Service Life

List components that can degrade with time, from abuse, with usage, or due to the local environment being in contact with a part. It is also necessary to reconsider and reassess the risk and risk elimination strategies as parts operate across their service lifetime.

Develop Strategies and Actions to Eliminate Each Cause of the Causes of Stress

A Reliability Growth Cause Analysis (RGCA) template is used to investigate each cause-of-the-cause stress event and how it can be eliminated throughout the component’s life cycle. If elimination is not possible, then effective mitigations to minimise the likelihood of stress events are specified.

Specify the Proof that Confirms the Elimination Strategies and Actions were Done Correctly

RGCA requires you to define the work quality standard for the tasks done to eliminate or mitigate the causes-of-the-causes of a stress. By meeting the task quality standard set, you either prevent the cause or reduce the likelihood of the cause happening.

Install and Fully Implement the Elimination Strategies and Actions in the Organisation

The output from the RGCA is a set of specific strategies and actions with work quality controls to be done at relevant points in a part’s life cycle. They may be capital project tasks, manufacturing tasks, installation tasks, operational tasks, maintenance tasks, and tasks at other life cycle points.

The tasks and their work quality specifications are written into all the applicable standard operating procedures used during the component’s life cycle. The people doing the procedures are trained to do the tasks correctly and to ensure that causes that will overstress or damage a part are not present to harm the microstructure and start future failure events.

Determine the Risk Remaining Once the Strategies and Actions are In Use

The purpose of introducing the failure cause elimination and mitigation tasks identified via the POFFA and RGCA is to provide certainty that failures of parts cannot occur, or that their risk of failure is reduced into the acceptable zone. Once an equipment item’s life cycle wellness strategy and component health care tasks are set for its critical parts, we review whether too much risk remains. If the remaining risk is unacceptable, it is necessary to carry a critical spare part in inventory.

Carry Those Spares with Unacceptable Risk or Carry Insurance Spares for Their Failure

The reduction in likelihood of component failure from using the POFFA and RGCA specified tasks is estimated and mapped onto the risk matrix, as shown in the risk matrix image above. If the new location of a part’s criticality on the risk matrix is in the acceptable zone, then it is not considered a critical part to have in inventory. The one proviso is that the part can be ordered, delivered, and fitted within the time between the detection of incipient failure and its total failure. If procurement, delivery, and repair take longer than the time between component failure detection and breakage, then it is reclassified as a critical part to keep in spare parts inventory.
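The two checks in this step can be sketched as follows, assuming the consequence of failure stays the same and only the likelihood is reduced by the POFFA and RGCA tasks. The function names, the day-based units, and the example numbers are hypothetical, and how the reduction in likelihood is estimated is outside the sketch.

```python
import math


def decades_moved(likelihood_before: float, likelihood_after: float) -> float:
    """Decades moved down the log-ten likelihood axis after mitigation."""
    if likelihood_before <= 0 or likelihood_after <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log10(likelihood_before / likelihood_after)


def replacement_fits_warning_time(order_days: float, delivery_days: float,
                                  fitting_days: float, warning_days: float) -> bool:
    """True if the part can be ordered, delivered, and fitted between
    detection of the developing failure and total failure."""
    return order_days + delivery_days + fitting_days <= warning_days


# Hypothetical part whose mitigated risk now sits in the acceptable zone.
# Failure likelihood is estimated to fall from 1 to 0.01 events per year.
print(decades_moved(1.0, 0.01))  # about two decades down the likelihood axis

# Getting and fitting the part takes 2 + 18 + 1 = 21 days, but a developing
# failure is detected only about 14 days before breakage.
in_time = replacement_fits_warning_time(2, 18, 1, warning_days=14)
print("hold a critical spare" if not in_time else "no spare needed")
```

With these made-up numbers the part is reclassified as a critical spare even though its mitigated risk is acceptable, because the supply time exceeds the warning time.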

Parts in the red risk zone are critical, and a critical spare part is kept in the maintenance spares or insurance spares inventory. As noted above, the one case when you would not need to hold a critical spare is when the part can be ordered, delivered, and installed within the time between the detection of incipient failure and its total failure.

Parts in the green, yellow, and amber risk zones require further assessment to decide between using more effective engineering, operating, and maintenance strategies and adding the part to the maintenance spares inventory. The spare parts selection decision tree shown in the image below is useful to highlight those parts that need to be reviewed for inclusion in a critical spares inventory.


The summary above is the process to follow in the PWW EAM asset life cycle management methodology for identifying spare parts criticality and deciding whether you need to hold a critical spare in inventory.

Mike Sondalini
PWWEAM System-of-Reliability
9 June 2022