
HTM ComDoc 1.

From HTMcommunityDB.org


This page has been updated. Go to HTM ComDoc 1

Key concepts and terminology

(This document was last revised on ...) Now replaced with HTM ComDoc 1

End of revised material

1.1 What is maintenance?


There are several adequate dictionary definitions of maintenance but in the context of maintaining equipment, it is best defined as the process of keeping the equipment in good condition, in proper working order and completely safe to use. The definition used in the highly respected RCM approach to equipment maintenance is “keeping the equipment available for use”.

A traditional equipment maintenance program has three parts:

  1. Corrective maintenance or, as it is more commonly called, repair, which is the process of returning a device that is in a failed state (i.e. that is no longer doing what the user wants it to do) to a safe condition and proper working order. This includes correcting any significant hidden failures even though they do not usually disable the primary functions of the device.
  2. Cosmetic repair, which is the process of restoring a device that is damaged to a safe and cosmetically like-new condition. Cosmetic repairs are generally considered a lower priority because the device may still be functioning within the manufacturer’s functional specifications. However, the device may be damaged in such a way that it is unsafe. For example, a damaged cover may present a sharp edge that could be hazardous to either the patient or a user.
  3. Preventive maintenance (PM). This third component is very important because from the very beginning, with the earliest machines developed during the time of the industrial revolution, it was widely believed that restoring the device's non-durable parts, as needed, before the end of the device's anticipated lifetime would be beneficial because it would reduce the number of unexpected machine breakdowns. In return for these scheduled PM interventions to restore the device's non-durable parts, the device users expect a lower level of the disruption and loss of productivity associated with in-use breakdowns as well as some reduction in overall maintenance costs.

Non-durable parts (NDPs), also sometimes loosely called disposable parts, are components of the device that are subject to progressive wear or deterioration. They typically include moving parts, such as bearings, drive belts, pulleys, mechanical fasteners and cables, which require periodic cleaning and lubrication, as well as certain non-moving parts such as gaskets, various kinds of filters, flexible tubing and electrical batteries, which may need to be cleaned, adjusted, refurbished or replaced sometime during the useful lifetime of the device. Which particular parts the device manufacturer considers to be NDPs will be identified in the manufacturer's recommended PM procedure.

Belief in this traditional device restoration approach to improving machine reliability continues to this day, particularly in certain relatively small industry sectors, even though the findings that started the revolutionary RCM approach to maintenance in the 1970s have caused a considerable amount of rethinking about whether or not intrusive maintenance interventions really do improve the device's overall reliability. Certainly there are still quite a number of medical devices such as ventilators, traction machines and spirometers that are more mechanical than electronic, where the manufacturers still recommend that certain parts be given some kind of periodic rejuvenation (cleaning, restoration or replacement). However, we don’t yet have good, independent evidence as to whether or not these manufacturer-recommended PMs, particularly those involving the more intrusive overhauls, are truly beneficial or cost-effective. We have not yet gathered the data on the impact of these recommended interventions on the reliability of these more mechanical devices. That investigation is one of the goals that the Task Force has set for itself. A device that has a component that the manufacturer has stated needs to be periodically restored in some way and which has the potential to cause harm when it stops working completely while it is in use is called a device restoration-critical device. We discuss the data gathering issue in more detail in HTM ComDoc 4.

1.2 PM in the context of medical equipment maintenance


In the special case of maintaining medical equipment, there is a second very important reason besides device restoration for making periodic scheduled interventions. And that is testing to detect critical deteriorations in the functional performance of the device or in its condition with respect to safety. These deteriorations can be quite subtle, and in RCM jargon they are called hidden failures. The term is appropriate because these subtle changes do not completely disable the device's primary functions and so they will usually go completely unnoticed by the device users.

It is important to detect these subtle deteriorations (hidden failures) because there are certain kinds of medical devices that can cause a patient injury if their performance becomes significantly substandard or their level of safety falls below the relevant requirements. Elsewhere (see HTM ComDoc 3.) we characterize the types of devices that have a theoretical potential to injure a patient if they deteriorate in this way as safety verification-critical devices and they need to be subjected to periodic safety verification tasks designed to detect any hidden failures that are present. Appropriate safety verification protocols for checking out each particular type of device should be included as a part of the device manufacturer's recommended PM procedure.

Similarly we characterize devices that have a theoretical potential to injure a patient if they simply stop working, as life support-critical device types (also in HTM ComDoc 3.) As the descriptor (life support-critical) implies it is important to minimize the failure rate of these devices. If these devices have manufacturer-designated non-durable parts (NDPs) they need to be subjected to appropriate device restoration (DR) procedures to prevent the device from failing. This will eliminate one (but only one) source of failures. So a life support-critical device that has manufacturer-designated non-durable parts becomes a device restoration-critical device. The test for this is whether or not the device manufacturer's recommended PM procedure includes any device restoration tasks.

One of the recurring obstacles in our discussions of PM over the years has been the use of a number of imprecise and inconsistent terms. Unfortunately there is still no general consensus. So, in an attempt to establish a standardized and more consistent PM terminology, we are proposing (below) some new terms.

We believe that it would be quite difficult to get the entire population of engineers and technicians practicing in the medical equipment maintenance field to stop using the long-established abbreviation “PM”. To accommodate this practical issue we are proposing to introduce another term with the same abbreviation. The new term, “planned maintenance”, will be used to define the combination of the traditional device restoration tasks (what we have traditionally called “preventive maintenance”) and the performance- and safety-oriented testing tasks that are more or less unique to the medical field. In this new formulation we are proposing to use the term “device restoration tasks” as a short label for the restoration of the device's non-durable parts. It is a simple and appropriately descriptive term.

We are suggesting this new terminology in full recognition of the fact that there are a number of other competing terms that have evolved over time. For example the term “scheduled maintenance” has been proposed as an alternative to “preventive maintenance” but it is not a very good fit semantically because it implies that the device restoration tasks are always performed according to some kind of clock; either by conventional timing (e.g. every 6 or 12 months) or by a time-of-use clock (e.g. every 1000 hours of use). There is, however, a more modern practice in which the deteriorating part is restored on a more efficient “just-in-time” basis by monitoring the actual condition of the part. In some cases the monitoring is performed by some kind of sensor but more commonly in the medical equipment sector it is simply done by conducting periodic visual inspections. In the RCM approach this “just-in-time” restoration is called predictive maintenance. And, what we are proposing to call safety verification tasks have been given the collective name “inspections” by ECRI Institute and others. We prefer the more descriptive term “safety verification” tasks.
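The three triggers mentioned above (a calendar clock, a time-of-use clock, and monitoring of the part's actual condition) could be sketched in Python roughly as follows. This is only an illustration: the function names, the 183-day approximation of six months, and the wear readings are assumptions made for the example and are not taken from any manufacturer's procedure.

```python
from datetime import date, timedelta


def due_by_calendar(last_done: date, today: date, interval_days: int) -> bool:
    """Calendar clock: due once the fixed interval has elapsed."""
    return today - last_done >= timedelta(days=interval_days)


def due_by_usage(hours_since_last: float, interval_hours: float) -> bool:
    """Time-of-use clock: due once enough operating hours have accumulated."""
    return hours_since_last >= interval_hours


def due_by_condition(observed_wear: float, wear_limit: float) -> bool:
    """Condition monitoring: due only when the observed wear reaches its limit."""
    return observed_wear >= wear_limit


# Hypothetical part restored on 1 January 2024 and checked on 1 August 2024.
calendar_due = due_by_calendar(date(2024, 1, 1), date(2024, 8, 1), interval_days=183)
usage_due = due_by_usage(hours_since_last=640.0, interval_hours=1000.0)
condition_due = due_by_condition(observed_wear=0.35, wear_limit=0.8)
print(calendar_due, usage_due, condition_due)
```

In this made-up case the calendar clock calls for restoration even though neither the usage clock nor the observed condition does, which is the kind of inefficiency the "just-in-time" approach tries to avoid.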

So, in summary, in the context of medical equipment maintenance, the contraction “PM” should be understood to mean “planned maintenance” which is defined as a combination of the two elements just described above; i.e.

Planned maintenance (PM) = Device restoration (DR) + Safety verification (SV)
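One way to picture this definition is as a PM procedure whose tasks are each tagged as either device restoration (DR) or safety verification (SV). The Python sketch below is only an illustration; the class names and the task labels are hypothetical and are not taken from any HTMC or manufacturer procedure. It also applies the test described earlier in this section, namely whether the procedure includes any device restoration tasks.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    DEVICE_RESTORATION = "DR"
    SAFETY_VERIFICATION = "SV"


@dataclass(frozen=True)
class PMTask:
    label: str
    kind: TaskKind


def split_procedure(tasks):
    """Split a planned maintenance procedure into its DR and SV parts."""
    dr = [t for t in tasks if t.kind is TaskKind.DEVICE_RESTORATION]
    sv = [t for t in tasks if t.kind is TaskKind.SAFETY_VERIFICATION]
    return dr, sv


# Hypothetical procedure; the task labels are invented for illustration.
procedure = [
    PMTask("Replace air inlet filter", TaskKind.DEVICE_RESTORATION),
    PMTask("Replace battery", TaskKind.DEVICE_RESTORATION),
    PMTask("Verify alarm function", TaskKind.SAFETY_VERIFICATION),
]
dr_tasks, sv_tasks = split_procedure(procedure)

# The procedure "includes device restoration tasks" if the DR part is non-empty.
has_restoration_tasks = len(dr_tasks) > 0
print(len(dr_tasks), len(sv_tasks), has_restoration_tasks)
```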

1.3 What causes medical devices to fail?


There are a number of reasons (causes) why equipment systems fail and it is particularly important to recognize that not all of these failures can be pre-empted by some kind of preventive maintenance. Consider, for example, the following list of possible causes of device failure:

  • The first set of causes can be classified as inherent reliability-related failures (IRFs) that are attributable to the design and construction of the device itself, including the inherent reliability of the components used in the device. They typically represent 45 - 55% of the repair calls. This type of failure can be reduced (but not to zero) only by redesigning the device or changing the way it was constructed.

Category IR1 Random failure. A device failure caused by the random failure or malfunction of a component part of the device. This is a result of the device’s inherent unreliability. IR1 calls typically represent between 46-52% of all repair calls.

Category IR2 Poor construction. A device failure attributable to poor fabrication or assembly of the device itself.

Category IR3 Poor design. A device failure attributable to poor design of the hardware or processes required to operate the device.


  • The second set of causes can be classified as process-related failures (PRFs). They typically represent 40 - 50% of the repair calls. Reducing or eliminating these types of failure typically requires some kind of redesign of the system’s processes - for example, by using better methods to train the equipment users to operate the equipment (as intended by the manufacturer) or to train them to treat the equipment more carefully. They are not failures that can be prevented by any kind of maintenance activity.

Category PR1 Use error. A device failure attributable to incorrect set-up or operation of the device by the user. The user has not set the device up correctly or does not know how to operate it. Typically PR1 calls represent between 13-20% of all repair calls. (Note that although this type of “failure” does not represent a complete loss of function, it can have the same effect. For example, an incorrectly set defibrillator can result in a failure to resuscitate the patient.)

Category PR2 Physical damage. A device failure caused by subjecting the device to physical stress outside its design tolerances. PR2 calls typically represent between 6-25% of all repair calls.

Category PR3 Discharged battery. A device failure attributable to a failure to recharge a rechargeable battery. PR3 calls typically represent between 7-8% of all repair calls.

Category PR4 Accessory problem. A device failure caused by the use of a wrong or defective accessory. PR4 calls typically represent between 3-9% of all repair calls.

Category PR5 Environmental stress. A device failure caused by exposing the device to environmental stress outside its design tolerances. PR5 calls typically represent between 1-7% of all repair calls.

Category PR6 Tampering. A device failure caused by human interference with an internal control. PR6 calls typically represent <1% of all repair calls.

Category PR7 Network problem. A device system failure caused by an issue within a data network connected to the device’s output.


  • The third set of causes can be classified as maintenance-related failures (MRFs). They typically represent 2 - 4% of the repair calls. These types of failure can be prevented through some kind of maintenance strategy incorporated into the facility’s maintenance program.

Category MR1 PM-preventable failure. A device failure that could have been prevented by more timely restoration or replacement of a manufacturer-designated non-durable part. E.g. a battery failure, a clogged filter, or build up of dust. Failures due to trapped cables should not be coded this way. MR1 calls typically represent between 1-3% of all repair calls.

Category MR2 Poor set up. A device failure caused by poor or incomplete initial installation or set-up of the device. MR2 calls typically represent between 1-3% of all repair calls.

Category MR3 Needed recalibration. A device failure attributable to improper periodic calibration. MR3 calls typically represent <1% of all repair calls.

Category MR4 Re-repair. A device failure attributable to a poor quality previous repair of the device. MR4 calls typically represent <1% of all repair calls.

Category MR5 Intrusive PM. A device failure attributable to earlier intrusive maintenance. MR5 calls typically represent much less than 1% of all repair calls.


While the device's effective reliability is determined by the total number of repair calls, irrespective of what caused them, it is the numbers of maintenance-related failures (MRFs) and inherent reliability-related failures (IRFs) that are of greatest interest to us, as maintainers, at this time. The level of MRFs provides a good measure of the effectiveness of the facility’s maintenance program, and the level of IRFs provides an equally good measure of the basic or inherent reliability of the devices in question.
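As a rough illustration of how coded repair calls might be tallied into the three groups described in this section, the following Python sketch counts calls by category prefix (IR, PR or MR) and reports each group's share of the total. The function name `failure_shares` and the sample list of calls are hypothetical, invented for the example; they are not Task Force data.

```python
from collections import Counter

# Map each two-letter category prefix used in this section to its failure group.
GROUPS = {"IR": "IRF", "PR": "PRF", "MR": "MRF"}


def failure_shares(call_codes):
    """Return the percentage of repair calls falling in each failure group.

    call_codes is an iterable of category codes such as "IR1" or "PR3".
    """
    counts = Counter(GROUPS[code[:2]] for code in call_codes)
    total = sum(counts.values())
    if total == 0:
        return {group: 0.0 for group in GROUPS.values()}
    return {group: 100.0 * counts[group] / total for group in GROUPS.values()}


# Hypothetical sample of 20 coded repair calls (illustrative only).
sample_calls = ["IR1"] * 10 + ["PR1"] * 4 + ["PR2"] * 3 + ["PR3"] * 2 + ["MR1"]
shares = failure_shares(sample_calls)
print(shares)
```

Tracking the MRF share over time would then give a simple indicator of the maintenance program's effectiveness, and the IRF share an indicator of inherent device reliability, in the sense described above.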

1.3 PM in the context of equipment safety

(See also HTM ComDoc 8. Maximizing equipment safety)


There are several ways in which medical equipment can become hazardous.

  • By developing some kind of overt, direct hazard.

For example, the metalwork of a piece of equipment might be damaged as a result of the item being dropped resulting in the possibility of the damaged metal posing a risk of injury to the patient or user, even though the item still works. Or the protective outer layer of the device's electrical cord might be damaged in such a way that it exposes a live conductor posing the risk of an electric shock. These could be hazards to the patient, the device user and others.

  • As the result of a sudden, total failure.
  • As the result of some kind of hidden failure.
  • As the result of being used improperly.

1.4 A new approach to PM prioritization using RCM-based risk criteria.

The logic of this risk assessment method, which we are calling PM prioritization, can be summed up as follows. There are two ways in which a PM-related failure of a medical device can put the safety of a patient or device user at risk:

  • There are some (life-supporting) devices, on which the patient's life may be totally dependent, which can stop working completely if they are not given some kind of periodic restoration during periodic planned maintenance activities; and
  • There are some devices that can deteriorate in such a way that their performance or level of safety falls to such a degree that the device is potentially hazardous to the patient or user (these are called hidden failures because this deterioration is often not obvious to the user). These hazards are detected and corrected during periodic planned maintenance.

The five questions (immediately below) become, in effect, the criteria on which the entire risk assessment contained in the PM Criticality/ priority Questionnaire (in Section 1.4 below) is based.

  1. Can this type of device cause some kind of adverse outcome if it stops working completely while it is in use? (Since this failure is completely obvious to an observer it is called an evident or overt failure).
  2. Does this type of device have any components that must be restored periodically in order to prevent it from stopping working completely?
  3. Can this type of device cause some kind of adverse outcome if it develops a hidden failure (i.e. one that will not be obvious to the user but is detected when it fails a performance or safety test during a PM)?
  4. What is the projected worst-case level of severity (LOS) of the adverse outcome of a PM-related failure of this device?
  5. What is the level of PM-related reliability of this device?

To maximize patient safety it is important to ensure that all devices whose failure can put the safety of the patient at risk receive appropriate attention. Restoring or replacing a device’s non-durable parts in a timely manner (using what we call device restoration or DR tasks) will reduce the device’s overall failure rate to some degree (but certainly not to zero). And periodic safety verification or SV tasks will uncover any potentially hazardous hidden failures, hopefully before they can cause a patient injury.

Based on certain combinations of these five risk criteria we are proposing a new approach to determining which medical devices are most likely to be potentially hazardous if they are not given periodic attention. These are the devices that should be given an appropriate level of priority for periodic planned maintenance. The term RCM-based risk criteria is appropriate because the logical basis for this questionnaire is the same logical basis as that embedded in the RCM approach. (See HTM ComRef 1. and HTM ComRef 26.)

1.4 PM Criticality/ priority Questionnaire.

For each type of device examined, we use the following set of questions to assess the device's potential to cause a patient injury if it is not subjected to periodic planned maintenance.

  1. Is it reasonably possible that there could be some kind of adverse patient outcome if this device, without reasonable warning, stops working while being used on a patient?
    Response: yes or no
  2. Is it reasonably possible that the device will stop working if one (or more) of the device restoration tasks included in the manufacturer's PM procedure (or the corresponding HTMC procedure) is not completed in a timely manner?
    Response: yes or no
  3. If the responses to questions 1 and 2 are both "yes", briefly describe how the evident failure creates a potentially adverse outcome.
    Response: see examples in column 6 of Table 2.
  4. Identify possible mitigating factors that might reduce the severity of the expected outcome.
    Response: see examples in column 8 of Table 2.
  5. After considering the possible mitigating factors listed in response to question 4 above, project the worst-case Level of Severity (LOS) of the outcome of the failure.
    Response: LOS 1, LOS 2, or LOS 3
  6. What is the likelihood (projected or demonstrated) for this type of device that this type of evident failure will occur?
    Response: Quite likely; Not very likely; Very unlikely
  7. If the manufacturer's recommended PM procedure (or the corresponding HTMC PM procedure) includes any functional performance or safety tests, is it reasonably possible that some kind of adverse patient outcome could result if the device falls out of spec and fails one or more of these tests?
    Response: yes or no
  8. If the response to question 7 is "yes", briefly describe the nature of the worst-case outcome.
    Response: see examples in column 5 of Table 3.
  9. Identify possible mitigating factors that might reduce the severity of the expected outcome.
    Response: see examples in column 7 of Table 3.
  10. On the same scale of LOS 1 to 3 described in question 5, project the worst-case Level of Severity of the anticipated adverse outcome.
    Response: LOS 1, LOS 2, or LOS 3
  11. What is the likelihood (projected or demonstrated) for this type of device that this type of hidden failure will occur?
    Response: Quite likely; Not very likely; Very unlikely

It is important to point out here that not all possible hidden failures are listed in column 5 of Table 3. In many cases there may be a number of possible hidden failures and the best way of identifying them is to review the test protocols listed in the performance verification and safety testing (PVST) section of the device's generic PM procedure. For example, by looking at the SV section of the generic PM procedure for a defibrillator-monitor (click on the PM Code, DEF-01, in the 3rd column of Table 3.) you can see that Tasks S7 through S10 have been labelled as "Serious failure is potentially Life-threatening". The example cited in the fifth column of Table 3. is that the "hidden failure caused the unit to under-deliver", which would correspond to a PM finding in which Task S10 indicated that the delivered energy was significantly less than the energy level selected. According to the extent to which the device is found to be out-of-spec (OOS) the adverse outcome should be judged to be of either Level 1, Level 2, or Level 3 severity. In both of these cases (an anticipated overt failure or a hidden failure) the analyses in the tables should include this additional judgment on the outcome and worst case level of severity of each anticipated failure, entered in the sixth or seventh column of the respective table.

Non critical devices. If the analysis results in Response 7 being "no", and one or both of Responses 1 and 2 being "no", then this type of device should be classified as "not potentially PM-critical", or more simply as a "non critical device".

Potential PM Priority 1 devices. If the analysis results in any other combination of responses then this type of device should be classified as "potentially PM Priority 1". More precisely it should be considered to be PM-critical at a PM Failure Severity level determined by compounding the potential adverse outcome levels reported in Table 2. and Table 3. See columns 3, 4 and 5 in Table 4.
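The two classification rules above can be sketched as a small Python function. This is a minimal illustration, not the Task Force's implementation: the class and field names are hypothetical, and only the three yes/no responses that the rules depend on are modelled (harm if the device stops working, a missed device restoration task could stop the device, and harm if the device fails a performance or safety test). The severity and likelihood responses are left out.

```python
from dataclasses import dataclass


@dataclass
class Responses:
    """Yes/no answers from the questionnaire (field names are hypothetical)."""
    stops_working_harm: bool    # adverse outcome possible if the device stops working
    restoration_needed: bool    # a missed device restoration task could stop the device
    hidden_failure_harm: bool   # adverse outcome possible from a failed performance/safety test


def classify(r: Responses) -> str:
    """Apply the two classification rules stated in the text."""
    # The evident-failure path applies only if both of the first two answers are "yes".
    evident_path = r.stops_working_harm and r.restoration_needed
    if not r.hidden_failure_harm and not evident_path:
        return "non critical device"
    return "potentially PM Priority 1"


all_no = classify(Responses(False, False, False))
hidden_only = classify(Responses(True, False, True))
print(all_no, "|", hidden_only)
```

A device type for which all three answers are "no" comes out as a non critical device, while a "yes" on the hidden-failure question alone is enough to make the device potentially PM Priority 1.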


Table 2. and Table 3. illustrate how this concise risk characterization process works. We have used these three risk criteria to filter and categorize a subset of the 75 more complex device types (see Table 1.) that we believe represent all of the device types that are likely to meet any of the Task Force's first three RCM-based risk criteria. Although this particular subset represents only about 5-10% of the 700 to 1500 different types of medical equipment in modern hospitals, we believe that it represents all of the types of device that are likely to injure a patient either if they stop working completely or if they develop some kind of significant hidden degradation.

The concise scenarios described in the fifth and sixth columns of Table 2. and Table 3. make the categorization process logical and quite transparent since the judgements are there to be challenged. This should allow much better consistency than the broad, potentially subjective generalizations of earlier methods. The new method introduces one or two new terms to characterize more precisely the nature of the device types that should be considered potentially hazardous, but these new terms are helpful in identifying which preventive strategies, including non-maintenance measures, will work best for maximizing patient safety (see HTM ComDoc 8. )

The Task Force has prepared a brief statement documenting the reasons why the PM Criticality Questionnaire is consistent with established industry standards of practice.

1.5 Possible adverse outcomes from medical device failures


There is a wide range of possible adverse outcomes from device failures. Some create a risk of physical harm to the patient (or to the device user). Others can result in additional direct or indirect costs to the facility and thus create an economic or business risk to the organization. We will address these economic/business risks in greater detail in HTM ComDoc 9.

In the case of outcomes creating a physical risk or economic harm it helps our analysis to define a hierarchy of three levels of severity (LOS) of possible outcomes.

Outcomes creating a physical risk

  • LOS 3. Serious, life-threatening injury - The patient (or the user) may lose his or her life.
  • LOS 2. Less serious, non life-threatening injury - The patient (or the user) may sustain a direct or indirect injury ranging from minor to serious.
  • LOS 1. No injury, but possible disruption of care - The incident may cause a temporary disruption of care, such as requiring one or more patients to be rescheduled, delaying treatment or delaying the acquisition of diagnostic information.


Outcomes creating possible economic harm

  • Loss of use - Downtime from higher failure rates may reduce device-related revenue and increase rental expenses for replacement equipment.
  • Increased repair costs - Higher failure rates and more complex failures may increase the facility's repair costs.
  • Loss of reputation - Higher than expected failure rates of mission-critical devices may damage the facility’s reputation and reduce support from the local community.

1.4 Maintenance-related safety: the logical foundation for a new PM Risk Analysis


Certain kinds of medical equipment can be hazardous to the patient, and sometimes to the user, if:

  • They are damaged in such a way that the device presents some kind of physical threat to the safety of patients or staff, such as an exposed sharp edge,
  • They have one or more non-durable parts that were not restored or replaced at the right time, or
  • They have undetected (hidden) performance or safety deficiencies.

So, from a maintenance perspective, three things must be done to provide the highest level of assurance that the facility's medical equipment is as safe as possible.

  1. Damaged equipment that is physically hazardous must be repaired and made safe as soon as possible.
  2. Devices that are device restoration-critical (those that have one or more non-durable parts that need to be refurbished or replaced sometime during the working lifetime of the device) need to have the appropriate device restoration tasks performed at an interval no longer than that recommended by the device manufacturer - particularly if the device has the potential to cause an injury at severity levels LOS 2 or 3 if it stops working.
  3. Devices that are safety verification-critical (those that have one or more hidden failures that have the potential to cause harm at severity levels LOS 2 or 3 if the degradation is not detected and corrected) need to be subjected to the periodic testing specified in the manufacturer's recommended PM procedure to reveal these functional performance or safety deficiencies so they can be repaired promptly. These devices need to have the appropriate safety testing tasks performed at intervals no longer than those recommended by the device manufacturer.

A PM-critical device is the Task Force's name for a device that is either device restoration-critical or safety verification-critical, or both.

PM-critical devices have one or more device restoration tasks and/or one or more safety testing tasks included in the manufacturer's recommended PM procedure, and they also have a theoretical potential to cause some kind of patient harm if they fail, either completely or by developing some kind of hidden failure.

There are two kinds of PM-critical devices

  1. Devices that are device restoration-critical (Table 2.)
  2. Devices that are safety verification-critical (Table 3.)

If there is a plausible scenario that not completing one of the manufacturer-recommended device restoration tasks in a timely manner could result in a serious adverse patient outcome then that task is labelled as a critical device restoration task, and that scenario is considered to represent a critical device restoration-related failure mode. For example, if a critical care (life-supporting) ventilator suddenly stops working because of the premature failure of a non-durable part it is possible that the patient will be deprived of oxygen for an extended period while the device is down. Critical care ventilators usually have one or more device restoration tasks in their PM procedures that could cause the device to stop working if the device restoration is not done at the right time.

Table 2. is a catalog of the potential outcomes of possible critical device restoration failure modes for a number of potentially PM-critical devices. In the worst case the potential adverse outcome of such failures could be at severity level LOS 3 (serious, life-threatening injury). The implicated device restoration tasks in the corresponding HTMC PM procedure are labeled as critical or potentially critical tasks.

Similarly, if there is a plausible scenario that not completing one of the manufacturer-recommended safety testing tasks in a timely manner could result in a serious adverse patient outcome then that task is labelled as a critical safety testing task, and that scenario is considered to represent a critical safety testing-related failure mode. For example, if the air flow detector in an infant apnea monitor degenerates to the point that it would fail to detect the cessation of the infant's breathing, the outcome of this hidden failure could well be at severity level LOS 3 (life-threatening). For this reason infant apnea monitors usually have a safety testing task to verify the proper functioning of the flow detector included in their PM procedure.

Table 3. is a catalog of possible outcomes of critical safety testing failure modes for a number of potentially PM-critical devices. In the worst case the potential adverse outcome of such failures could be at severity level LOS 3 (serious, life-threatening injury). The implicated safety testing tasks in the corresponding HTMC PM procedure are labeled as critical or potentially critical tasks.

Non-critical devices

Devices that do not have either critical device restoration failure modes or critical safety testing failure modes are not PM-critical, so the Task Force calls them non-critical devices. Because non-critical devices have no PM-critical failure modes (i.e. no critical PM tasks in their manufacturer-recommended PM procedures) they present no threat whatsoever to patient safety if they are not subjected to the recommended PM. This makes them legitimate candidates for the so-called light maintenance strategy (ComRef 26.) which simply allows the device to be used, regulatory constraints permitting, without any kind of periodic maintenance. In some cases an argument might be made for periodic PM interventions on the grounds that they would reduce the net cost of maintaining the device, but as of this time we know of no studies that have documented such a finding for any type of medical device. For more details on the possible business and economic impact of planned maintenance, see HTM ComDoc 9.

Non critical devices are certainly candidates for the Alternate Equipment Management (AEM) program currently permitted by both the TJC standards (HTM ComRef 27.) and the CMS regulation (HTM ComRef 28.)

We describe how to determine which devices are PM-critical devices and which are non-critical devices in section 3.3 of HTM ComDoc 3.

As best we can estimate there are, in round numbers, between 750 and 1500 different types of healthcare-related devices in use in today’s healthcare facilities. An unknown number of these are non-clinical devices such as printers or other device accessories that do not even fall into the formal category of a medical device that is regulated by the FDA. These non-clinical devices are extremely unlikely to be PM-critical. At the other end of the scale there is a group of about 70 device types that are more likely to be PM-critical, either because of their complexity, or for some other reason that was captured in the original Fennigkoh-Smith criteria.

The Task Force believes that a large percentage of the estimated remaining balance of at least 700 device types will prove to be non-critical when they are analyzed. One example is a set of patient scales. When the HTMC generic PM procedure for a set of patient scales (PA.SC-01) is analyzed using the questionnaire process described in section 3.3 of HTM ComDoc 3., responses (1), (2) and (6) are all “no”, and so - according to our criteria - a set of patient scales should be classified as a non-critical device.

Based on the preliminary findings shown in Table 2. and Table 3. we believe that a large number of device types can be shown to be non-critical. This is an important step because it provides a solid, rational argument for why a very large number of medical devices can be used quite safely without any kind of periodic PM whatsoever. They simply have no high-severity, PM-preventable failure modes and so, by definition, they are non-critical. The evidence for this is that there are simply no tasks listed in the relevant manufacturer’s PM procedure that would either prevent a potentially harmful failure of the device or detect a hidden failure, capable of causing harm, that had already developed.

This leaves a list of about 70 device types, shown in Table 4., that are potentially PM-critical. However, as we will show in Part 2 of this article, by implementing Step 2 of this new risk analysis, which will draw on aggregated maintenance data from the new community-wide database, we will be able to determine which of these devices should actually be designated as PM-critical (high risk) devices and given periodic PM according to the manufacturer’s recommendations. The others are all more reliable, lower risk devices. We anticipate that, when fully implemented, the analysis in Step 2 will reveal devices with risk levels distributed across the full spectrum from high-risk to very low risk devices.

All PM-critical devices are not necessarily high-risk devices!

Just having one or more critical PM-related failure modes is not sufficient to make a device classifiable as a potentially unsafe "high-risk" device. According to modern reliability and risk management theory (HTM ComRef 1., HTM ComRef 2.), "risk" has two components:

  1. The severity of the outcome of the event (in this context a PM-preventable device failure); and
  2. The likelihood that the event (the PM-preventable device failure) will actually occur.

This required combination of two factors means that devices that have a manufacturer-recommended PM procedure with critical device restoration tasks or safety testing tasks will not necessarily become hazardous just because the manufacturer's procedure is not followed or even utilized at all. If the likelihood of any PM-related failures actually occurring (even if they are critical failures with high-severity outcomes) is very low - with a mean time between failures (MTBF) of, say, 50-75 years or more - then the corresponding risk of harming the patient is reduced from high to moderate, to low, or even to very low. The actual level of risk at each of the three levels of severity is, in fact, accurately represented by the probability that the device will actually fail, either totally, or by developing some significant degradation. This is why traveling on a commercial airliner is considered to be safe. While there is a theoretical potential for a high-severity outcome if the plane should crash, the likelihood that this will actually happen is very low – so the level of risk when flying on a commercial airliner is also very low, relative to other ways of traveling.
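To make the link between an MTBF and a likelihood concrete, the following minimal sketch converts an MTBF into the probability that a device fails at least once in a year. It assumes a constant failure rate (an exponential failure model), which is our simplifying assumption and not something the Task Force prescribes; the function name is hypothetical.

```python
import math


def annual_failure_probability(mtbf_years: float) -> float:
    """Probability of at least one failure within one year.

    Assumption (not from the source): failures occur at a constant
    rate, so the time to failure is exponentially distributed.
    """
    if mtbf_years <= 0:
        raise ValueError("MTBF must be a positive number of years")
    failure_rate_per_year = 1.0 / mtbf_years
    return 1.0 - math.exp(-failure_rate_per_year)


if __name__ == "__main__":
    # MTBFs of one month, 5 years, 50 years and 75 years
    for mtbf in (1 / 12, 5, 50, 75):
        p = annual_failure_probability(mtbf)
        print(f"MTBF {mtbf:6.2f} years -> P(failure within a year) = {p:.3f}")
```

Under this assumption a device with an MTBF of 50 years has roughly a 2% chance of a PM-related failure in any given year, which is the sense in which a long MTBF lowers the level of risk even when the worst-case severity is high.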

In order to determine which devices have the theoretical potential to cause a patient injury (or some less severe adverse outcome) if the device should fail because its PM was not completed in a timely manner - we first need to be clear about what is achieved by performing the various tasks listed in the manufacturer’s recommended PM procedure.

In general, there are two kinds of tasks contained in a medical device’s PM procedure. The first kind is a task that restores the device to something close to its original, like-new condition. The Maintenance Practices Task Force calls these device restoration tasks. They are tasks in which components that are subject to deterioration during the useful lifetime of the device, such as batteries, cables, fasteners, gaskets and tubing, are periodically refurbished or replaced. The second kind is some sort of test to detect any hidden degradations in the functional performance or safety of the device that are sufficiently hazardous to require immediate correction. The Task Force calls these safety testing tasks.

It is entirely possible for some manufacturer-model versions of any of the PM-critical device types listed in Table 2. and Table 3. to be classified as low-risk devices if they can be shown to have good reliability (a demonstrated low probability of failing). Table 12. shows the Task Force's tentative definitions of what should be considered acceptable levels of reliability. We will discuss this in more detail in section 3.3 of HTM ComDoc 4.

(There are a very large number of medical devices that can be used quite safely without any kind of periodic PM whatsoever because they have no high or moderate-severity, PM-preventable failure modes. These devices are, by definition, non-critical. The evidence for this is that either there are simply no tasks listed in the relevant manufacturer’s PM procedure that would either prevent a device that could cause harm if it failed, from failing - or that would detect a hidden failure that could cause harm if it had already developed; or there are no possible high or moderate severity outcomes from either total failure or serious degradations.)

(Device types that have no potential whatsoever to cause any kind of patient injury or any other significant adverse outcome when they fail, either completely, or by developing a hidden failure - such as a phototherapy light - will be classified as non-critical. And since, by definition, non-critical devices have no significant adverse outcome if they fail, they will all be automatically categorized as inherently safe devices.)

In summary, all non-critical device types (i.e. those that have no critical PM-related failure modes) are, by definition, inherently safe with respect to needing PM. In contrast, all PM-critical device types are potentially high-risk (potentially hazardous) devices unless certain manufacturer-model versions of those device types can be shown to have good reliability (i.e. a low likelihood that the PM-related failures will actually occur), in which case they can be categorized as lower risk devices. See Table 12. for more details on the tentative definitions of the various levels of device risk.

We will describe how to determine which devices are PM-critical/high risk devices in section 4.x of HTM ComDoc 4.

So, if the total failure or critical degradation of the device is highly unlikely to occur, the level of risk associated with using the device is correspondingly small. Devices that are classified in the tables as having potentially life-threatening severity (LOS 3) outcomes from total failure or from critical degradation should more properly be called potentially hazardous or potentially high-risk devices because the actual level of risk at each of the three levels of severity is, in fact, accurately represented by the probability that the device will actually fail, either totally, or by developing some significant degradation.

1.5 Measuring PM-related device reliability


A device or equipment system is considered to have failed when (a) it no longer performs the function or functions that the user wants it to perform, or (b) it functions as it should – but in an unsafe manner. It is a truism, similar to the impossibility embedded in the concept of perpetual motion, that there can be no such thing as an infallible device - one that cannot fail. All devices fail in some way or other, at some time or other. So there can be no absolutes in a scale of reliability, only relative measures. The simplest measure of a device’s reliability is its failure rate. However, a more intuitive way of expressing reliability is the inverse of the failure rate, which is the device’s mean time between failures or MTBF.
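The inverse relationship between failure rate and MTBF can be illustrated with a short calculation. This is a sketch only: the function names are hypothetical, the figures are invented for illustration, and it assumes that failures are counted over the pooled operating time (device-years) of a group of identical devices.

```python
import math


def failure_rate_per_year(failures: int, device_years: float) -> float:
    """Observed failures per device per year over the pooled operating time."""
    if device_years <= 0:
        raise ValueError("device_years must be positive")
    return failures / device_years


def mtbf_years(failures: int, device_years: float) -> float:
    """MTBF as the inverse of the failure rate.

    With zero observed failures the MTBF cannot be estimated from the
    data; infinity is returned here as a placeholder.
    """
    if device_years <= 0:
        raise ValueError("device_years must be positive")
    if failures == 0:
        return math.inf
    return device_years / failures


if __name__ == "__main__":
    # Illustrative numbers: 200 devices observed for 3 years, 8 failures.
    device_years = 200 * 3
    print(failure_rate_per_year(8, device_years))  # failures per device-year
    print(mtbf_years(8, device_years))             # years between failures
```

With these illustrative numbers the pooled operating time is 600 device-years, so the MTBF is 600 / 8 = 75 years.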

It is generally easier for lay persons to relate to an MTBF because it is in the form of a period of time - a simple, easily comprehended variable. Most people will have little difficulty in considering a device with an MTBF of just one month to have a relatively poor level of reliability and, conversely, considering a device with an MTBF of 50 years to be quite reliable. Since, ideally, we would like to separate various different kinds of devices into neat compartmentalized categories such as “safe” and “hazardous” we have to confront the difficulty of setting boundaries and consequent gray areas around those boundaries. For example, setting a threshold of, say, 75 years for the MTBF that should be considered safe creates the hard-to-answer question of how much less reliable (and therefore potentially less safe) is a device with an MTBF of 74 years than one with an MTBF of 75 years? There is, of course, no simple black and white answer to that question. It is all relative.

This discussion is made a little more complicated by the fact that devices fail for a number of different reasons. Lumping the failures from all of these causes into one overall failure rate, or corresponding MTBF, may not fairly describe either the reliability of the device itself or the effectiveness of the way we maintain it. The next section addresses the nature of these different kinds of causes of failure and how they can be categorized and used to develop a helpful and meaningful analysis.

1.6 What causes equipment systems to fail?


1.7 Hidden failures


There is one other kind of failure, in addition to a Category MR1 type of failure, that can be prevented by periodic, planned maintenance. In RCM terminology this kind of failure is called a hidden failure.

A hidden failure (HF) is said to have occurred when either:

  • the device delivers an output that is significantly out of specification, but sufficiently similar to the output that the user wants, that the failure is not immediately obvious to the user, or
  • the device is no longer in compliance with the relevant safety specifications for the device in question, but this deterioration is also not obvious to the user.

When this more subtle type of failure introduces a significant performance or safety degradation that can be detected only by some kind of performance verification or safety test it can constitute a serious safety threat. For example, a heart rate alarm that has malfunctioned so that it no longer goes off at the set limit will remain as a hidden but potentially hazardous failure until the alarm function is checked and the potentially dangerous degradation discovered. The potential seriousness (i.e. level of severity) of hidden failures will depend on the nature of the failure and on how far the performance or safety flaw is out of specification. For example, a significant reduction in the output of a defibrillator has to be considered life-threatening, but a small excess in the electrical leakage current of a laboratory centrifuge – while it should be noted in the test report - is unlikely to constitute a significant or imminent threat.

Hidden failures are discovered when the performance verification and safety testing tasks are performed during the PM. When they are found they should be described in a note on the PM work order or the PM report and it would be helpful if the description of the findings provided enough information to enable a judgment to be made as to the worst case potential level of severity (LOS 1, LOS 2, or LOS 3) of the adverse outcome that would have resulted if the hidden failure had not been discovered.

A particularly important type of hidden failure is one that disables the proper operation of an automatic protection mechanism (APM) - see section 1.9 below - that is included as a component of the device. An APM is usually included in the design to provide protection against another possible hidden failure that is itself considered to be capable of a serious or potentially life-threatening adverse consequence.

1.8 PM-related reliability and PM-related device safety


A device's PM-related reliability is determined by the more frequently encountered of the two kinds of failure that are PM-preventable: (a) premature deterioration of one or more of the device's non-durable (consumable) parts and (b) hidden performance deterioration or safety deterioration types of failure.

The device's PM-related reliability is the lesser (the one representing the lower level of reliability) of the following two MTBFs:

  • The MTBF based on the total of 1) any overt MR1 failures caused by inadequate device restoration (from the repair cause coding) and 2) any PM Code 9 findings (which are immediate precursors of the overt MR1 failures caused by inadequate restoration).
  • The MTBF based on the total of any hidden performance and safety degradations detected by the safety verification tasks (PM Code F findings)
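The "lesser of two MTBFs" rule above can be sketched as follows. The function and parameter names are hypothetical, the counts are invented for illustration, and the sketch assumes that all three counts were collected over the same pooled operating time (device-years).

```python
import math


def mtbf_years(event_count: int, device_years: float) -> float:
    """MTBF over pooled operating time; infinity if no events were observed."""
    if device_years <= 0:
        raise ValueError("device_years must be positive")
    return math.inf if event_count == 0 else device_years / event_count


def pm_related_mtbf(
    mr1_restoration_failures: int,  # overt MR1 failures from inadequate restoration
    pm_code_9_findings: int,        # precursors of those overt failures
    pm_code_f_findings: int,        # hidden degradations found by safety testing
    device_years: float,
) -> float:
    """Return the lesser (less reliable) of the two PM-related MTBFs."""
    restoration_mtbf = mtbf_years(
        mr1_restoration_failures + pm_code_9_findings, device_years
    )
    hidden_failure_mtbf = mtbf_years(pm_code_f_findings, device_years)
    return min(restoration_mtbf, hidden_failure_mtbf)


if __name__ == "__main__":
    # Illustrative counts over 1000 device-years:
    # 3 MR1 failures + 2 Code 9 findings, and 4 Code F findings.
    print(pm_related_mtbf(3, 2, 4, 1000.0))
```

In this illustration the restoration-related MTBF is 1000 / 5 = 200 years and the hidden-failure MTBF is 1000 / 4 = 250 years, so the device's PM-related reliability is the lesser of the two, 200 years.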

A device's PM-related level of safety and priority for PM is determined by combining the projected worst case severity of the outcome of a PM-related failure and the projected or demonstrated likelihood that the failure will actually occur. The Task Force has defined the following five levels for a device's priority for timely PM. (See Table 12.)


  • PM Priority 1. Devices with poor PM-related reliability (quite likely to have a PM-related failure) that could result in a serious, life-threatening injury (LOS 3)


  • PM Priority 2. Devices with good PM-related reliability (not very likely to have a PM-related failure) that could result in a serious, life-threatening injury (LOS 3), and
devices with poor PM-related reliability (quite likely to have a PM-related failure) that could result in a less serious, non life-threatening injury (LOS 2)


  • PM Priority 3. Devices with very good PM-related reliability (very unlikely to have a PM-related failure) that could result in a serious, life-threatening injury (LOS 3), and
devices with good PM-related reliability (not very likely to have a PM-related failure) that could result in a less serious, non life-threatening injury (LOS 2), and
devices with poor PM-related reliability (quite likely to have a PM-related failure) that could result in no injury, but a threat to patient care (LOS 1)


  • PM Priority 4. Devices with very good PM-related reliability (very unlikely to have a PM-related failure) that could result in a less serious, non life-threatening injury (LOS 2), and
devices with good PM-related reliability (not very likely to have a PM-related failure) that could result in no injury, but a threat to patient care (LOS 1)


  • PM Priority 5. Devices with very good PM-related reliability (very unlikely to have a PM-related failure) that could result in no injury, but a threat to patient care (LOS 1)
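The five priority levels listed above amount to a lookup on two inputs: the device's PM-related reliability category and the worst-case level of severity. A minimal sketch of that lookup is shown below. The names are hypothetical, and the MTBF thresholds that separate "poor", "good" and "very good" reliability are defined in Table 12. and are not reproduced here, so the category is taken as an input.

```python
# (reliability category, level of severity) -> PM priority, as listed above
PM_PRIORITY = {
    ("poor", 3): 1,
    ("good", 3): 2,
    ("poor", 2): 2,
    ("very good", 3): 3,
    ("good", 2): 3,
    ("poor", 1): 3,
    ("very good", 2): 4,
    ("good", 1): 4,
    ("very good", 1): 5,
}


def pm_priority(reliability: str, los: int) -> int:
    """Look up the PM priority (1 = highest) for a reliability/severity pair."""
    try:
        return PM_PRIORITY[(reliability, los)]
    except KeyError:
        raise ValueError(
            f"unknown combination: reliability={reliability!r}, LOS={los!r}"
        ) from None


if __name__ == "__main__":
    print(pm_priority("poor", 3))       # highest priority for timely PM
    print(pm_priority("very good", 1))  # lowest priority
```

Each step up in reliability, or each step down in severity, moves a device one priority level lower, which is why the nine combinations collapse into five levels.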

1.9 Reassuring equipment users about the level of patient safety associated with their equipment and what we are doing to ensure that it is maintained at an acceptable level

This material was originally published as an article titled "Final Word: Doing it by the numbers" (HTM ComRef 15.). This is a slightly modified version of the original document.

"When our clinical colleagues ask us: “How do I know that all of my equipment is safe to use on my patients”, and we respond by saying; “Well, all of our PMs are always done on time”; wouldn’t it be more reassuring if we could give them a response more like this:

Well, there are some pieces of equipment like the knee exerciser over there that are not very complicated and which couldn’t possibly injure a patient, even if it failed completely. These kinds of items are classified as non-critical and they only need very simple maintenance to keep them functional. A lot of the equipment on the hospital’s inventory falls into this non-critical category.
There are some other items, such as this critical care ventilator and this transport incubator that have non-durable parts that need periodic restoration as well as being critical in the sense that if they fail completely a patient could be injured. These kinds of devices are classified as PM Priority 1 and we take a number of steps to make sure that the possibility of them failing completely is very rare. For devices in this category we follow all of the manufacturer’s recommendations for periodic preventive maintenance and we carefully analyze every instance when any of these types of device fail. These kinds of device are specially designed to be very reliable. We also monitor the statistics from our national database to confirm that each specific make and model of the devices that we have classified as PM Priority 1 is demonstrating an average mean time between failures (MTBF) of at least several hundred years.
There are some other types of devices such as this apnea monitoring system and this infant incubator which can, in theory, cause some kind of patient injury if they fail in such a way that they are either misleading the clinical staff by providing incorrect or substantially inaccurate information, or in such a way that they are no longer meeting the relevant safety specifications. All of the different ways that these kinds of devices (which we also classify as PM Priority 1 devices) could fail in this way have been identified as potentially critical failures. These kinds of failures are called hidden failures because, usually, they can be discovered only by periodically performing some kind of performance verification or safety testing. These kinds of device are also designed to be very reliable and, oftentimes, to warn the operator of the onset of any hidden failures. We have been collecting and aggregating the results of these performance and safety tests in our nationwide database and, again, we have confirmed that each specific make and model that we have of these PM Priority 1 devices develop potentially critical failures only very, very infrequently. Our database shows that these kinds of hidden failures occur, on the average, no more frequently than once every 500 years.”

The statement above is, of course, meant only to illustrate the principle. The underlined statements will need to be customized to each specific facility and adjusted to reflect the levels of safety that can be substantiated by actual test data.

1.12 Progress Report (November 2017)

To quote from the third paragraph of the statement titled “Background” on the Introductory Materials page of the website created by the Maintenance Practices Task Force (MPTF), one of the primary motivations prompting this project, which AAMI began supporting in November 2015, is to address the huge problem created by the failure of the Healthcare Technology Management (HTM) community to establish a “… generally-agreed way of quantifying current levels of maintenance-related medical equipment safety …”.

Much has been written about medical technology, and virtually all of it states that the ultimate, overriding consideration must always be assuring the very highest levels of patient safety. Maximizing patient safety is, of course, a very worthy goal - with which there can be no quarrel - but to paraphrase one of the better maxims of the business world – if you can’t measure it, you can’t manage it. And since virtually all of the regulations and standards governing the HTM business include a requirement, either direct or indirect, to provide levels of patient safety that are “generally acceptable”, this current lack of an accepted metric for medical device safety – and maintenance-related medical equipment safety in particular - makes it impossible to prove how well (or not) we are satisfying this important obligation. This same lack of the proper tools also makes it very difficult to compare the levels of maintenance-related medical equipment safety achieved by different maintenance strategies.

A current manifestation of this quandary is the requirement in the recently amended medical equipment maintenance regulations of The Centers for Medicare & Medicaid Services (CMS) which implies very strongly that the use of any of the now-permitted alternate equipment management (AEM) strategies for maintaining the facility’s medical equipment must keep the equipment just as safe as it would be if the devices were being maintained according to the manufacturer’s recommendations. This is clearly a very reasonable requirement but it is creating practical difficulties for facilities trying to introduce more cost-effective maintenance practices, as well as for the various survey and inspection teams who are responsible for confirming that maintenance practices other than those recommended by the device manufacturer are not exposing patients to higher levels of risk.

Everyone familiar with the standard texts on risk management knows that safety itself is not directly measurable (see, for example, the third chapter in “Of Acceptable Risk: Science and the Determination of Safety” by William Lowrance). The only aspect of safety that is measurable is the actual level of risk created by some specified potential hazard. So when we say that something such as a medical device is safe, what we are really doing is making a judgment relative to some recognized standard that the risk created by one or more particular potential hazards (such as, in this case, the potential for an adverse patient outcome attributable to inadequate device maintenance) is generally acceptable. Devices that are deemed “safe” in this way are really only safe with respect to the specifically identified hazard, or hazards.

While all of the various participants in the HTM business - including the regulating authorities - have cited patient safety as the primary driver within their respective areas of responsibility, there has been a lack of meaningful efforts to establish a rational, scientific basis for making these judgment calls on the level of safety of the patient. This is certainly true of the regulatory framework that is intended to ensure the safety of medical devices in their working lifetime, subsequent to the device having passed through the FDA’s initial device approval process. It has already been pointed out in the just-published AEM Program Guide that some of the accreditation standards based on the CMS regulation (referenced above) contain sloppily incorrect or inconsistent terminology as well as a complete lack of direction on how conformance to what are allegedly the “generally acceptable” levels of patient risk should be demonstrated.

By adopting the widely used and very well respected scientific methodology embedded in reliability-centered maintenance (RCM), the Maintenance Practices Task Force (name shortened elsewhere in this report to “the Task Force”, “the MPTF” or just “the TF”) has made significant progress towards solving this fundamental problem. As described in HTM ComDoc 1 and several other related documents on the website, the Task Force has created a useful method for characterizing the level of the PM-related risk associated with the different manufacturer-model versions of the most PM-critical medical devices. Each of the identified levels of maintenance-related risk is a combination of two parameters: one representing an assessment of the worst-case level of severity of the adverse outcome of a PM-preventable failure of the device (the TF has selected three representative levels - either a life-threatening injury, a serious but less than life-threatening injury, or a less serious outcome such as a delayed diagnosis or delayed treatment) and a second parameter quantifying the likelihood of a PM-preventable failure actually occurring (represented by the device’s documented PM-related failure rate).

The Task Force has also proposed a practical method for establishing what level of PM-related risk should be considered acceptable – another notable step forward. In this particular context it seems logical to set the standard for acceptable maintenance-related safety at the typical level of PM-related risk achieved when the devices in question are maintained strictly according to the manufacturer’s recommendations. Just what this level is, can and will be determined (see project Objectives # 3 & 4) by conducting a statistically satisfactory number of tests to determine and document the actual PM-related failure rates demonstrated by a sample drawn from a number of the potentially most critical devices during a time when they are being maintained according to their manufacturer’s recommendations.

Patient safety as it relates to the maintenance of medical devices

Much has been written about medical technology and virtually all that is written cites maximizing patient safety as the ultimate, overriding consideration. This is, of course, a very worthy goal with which there can be no quarrel; it is the motherhood and apple pie of healthcare technology management (HTM) and a cherished icon that we all serve dutifully and enthusiastically. In addition to this, virtually all of the regulations and standards governing the HTM business include either a direct or indirect obligation to provide acceptable levels of patient safety. The rub comes however when we attempt to quantify how well our efforts are measuring up to this rather vague obligation to maximize patient safety.

A recent piece by …. on the debate over medical device service urging …. is a good example.

Safety itself is not measurable. The only aspect of safety that is measurable is the actual level of risk created by some specified potential hazard. So when we say something is safe, what we are really doing is making a judgment that the level of risk posed by one particular potential hazard is considered to be acceptable. The device is indeed safe but only with respect to this one particular hazard (cite Lowrance).

To illustrate this we will use an example from recent investigations (cite ?) into alternative equipment management (AEM) strategies that would make medical devices just as safe as they would be if the device were being maintained according to the manufacturer’s recommendations – something now permitted by recent revisions to the regulations of the Centers for Medicare & Medicaid Services (CMS) relating to medical equipment (cite ?). In this example the risk that we are concerned with is the risk that the device will fail from a PM-preventable cause.

PM-preventable failures.

The key to identifying which device failures can be attributed to a PM-preventable cause (i.e. could have been prevented by a more effective or more timely PM activity) is to examine each of the tasks listed in the manufacturer’s PM procedure. This will identify which of the device’s components need some kind of periodic restoration, such as a filter that needs cleaning or a battery that needs to be replaced. If a device is presented for repair and the only thing wrong with it can be traced to a component that is scheduled for some kind of restoration during PM, then it is quite likely that this failure can be considered to be a PM-preventable failure. Maybe the restoration performed during the last PM was ineffective or maybe the PM interval is too long. Similarly, the manufacturer’s PM procedure may include testing the performance of the device to detect deteriorations in either its functional performance or in its compliance with certain safety requirements that would not be obvious to the user – so-called hidden failures. While these deteriorations have not caused a complete failure, the diminished performance could be putting the patient at risk and these should be considered to be PM-preventable failures. A shorter PM interval would have reduced the length of time that the patient was exposed to some level of risk.

In order to gather reliable information on the frequency with which PM-preventable failures are encountered it is very important to standardize the techniques and criteria for diagnosing when a user-reported failure is legitimately attributable to inadequate or tardy PM. Similarly, we need to standardize the techniques and criteria for diagnosing failures encountered when the actual PM is performed. Obviously a PM finding that the device failed one or more of the critical performance or safety tests included in the PM procedure constitutes a PM-preventable failure (it is an indicator that the PM interval is too long). The Maintenance Practices Task Force has also proposed that discovering that a part scheduled for some kind of restoration during the PM has already deteriorated to the point where it could have been interfering with the proper operation of the device should be considered a PM-preventable failure. This is also an indicator that the PM interval is too long.

Unfortunately there is still some considerable variation in the kind of maintenance data collected throughout the field. While there have been recommendations for standardizing on these particular indicators that the failure was PM-preventable, they are not yet in widespread use.


So even though we are often required to characterize something such as a maintenance practice or maintenance “strategy” as safe or unsafe we generally fail to address the judgment call nature of this requirement. Although we champion data driven decisions – and this is an important and laudable step forward - we need to recognize that with respect to safety there are generally no prescribed boundaries separating acceptable (i.e. safe) levels of risk from unacceptable (i.e. unsafe) levels of risk.

The data driving the decisions are the levels of risk relevant to certain specific hazards.


All potentially PM-critical devices are not necessarily high-risk devices!

Just having one or more critical PM-related failure modes is not sufficient to make a device classifiable as a potentially unsafe "high-risk" device. According to modern reliability and risk management theory (HTM ComRef 1., HTM ComRef 2.), "risk" has two components:

  1. The severity of the outcome of the event (in this context a PM-preventable device failure); and
  2. The likelihood that the event (the PM-preventable deavice failure) will actually occur.

This required combination of two factors means that devices that have a manufacturer-recommended PM procedure with critical device restoration tasks or safety testing tasks will not necessarily become hazardous just because the manufacturer's procedure is not followed or even utilized at all. If the likelihood of any PM-related failures actually occurring (even if they are critical failures with high-severity outcomes) is very low - with a mean time between failures (MTBFs) of, say, 50-75 years or more - then the corresponding risk of harming the patient is reduced from high to moderate, to low, or even to very low. The actual level of risk at each of the three levels of severity is, in fact, accurately represented by the probability that the device will actually fail, either totally, or by developing some significant degradation. This is why traveling on a commercial airliner is considered to be safe. While there is a theoretical potential for a high-severity outcome if the plane should crash, the likelihood that this will actually happen is very low – so the level of risk when flying on a commercial airliner is also very low, relative to other ways of traveling.

In order to determine which devices have the theoretical potential to cause a patient injury (or some less severe adverse outcome) if the device should fail because its PM was not completed in a timely manner, we first need to be clear about what is achieved by performing the various tasks listed in the manufacturer’s recommended PM procedure.

In general, there are two kinds of tasks contained in a medical device’s PM procedure. The first kind is a task that restores the device to something close to its original, like-new condition. The Maintenance Practices Task Force calls these device restoration tasks. They are tasks in which components that are subject to deterioration during the useful lifetime of the device, such as batteries, cables, fasteners, gaskets and tubing, are periodically refurbished or replaced. The second kind is some sort of test to detect any hidden degradations in the functional performance or safety of the device that are sufficiently hazardous to require immediate correction. The Task Force calls these safety testing tasks.

It is entirely possible for some manufacturer-model versions of any of the PM-critical device types listed in Table 2. and Table 3. to be classified as low-risk devices if they can be shown to have good reliability (a demonstrated low probability of failing). Table 12. shows the Task Force's tentative definitions of what should be considered acceptable levels of reliability. We will discuss this in more detail in section 3.3 of HTM ComDoc 4.

(A very large number of medical devices can be used quite safely without any kind of periodic PM whatsoever because they have no high- or moderate-severity, PM-preventable failure modes. These devices are, by definition, non-critical. The evidence for this is one of two things: either the relevant manufacturer’s PM procedure lists no tasks that would prevent a potentially harmful failure or detect a potentially harmful hidden failure that has already developed, or there are no possible high- or moderate-severity outcomes from either total failure or serious degradation.)

(Device types that have no potential whatsoever to cause any kind of patient injury or any other significant adverse outcome when they fail, either completely, or by developing a hidden failure - such as a phototherapy light - will be classified as non-critical. And since, by definition, non-critical devices have no significant adverse outcome if they fail, they will all be automatically categorized as inherently safe devices.)

In summary, all non-critical device types (i.e. those that have no critical PM-related failure modes) are, by definition, inherently safe with respect to needing PM. In contrast, all PM-critical device types are potentially high-risk (potentially hazardous) devices unless certain manufacturer-model versions of those device types can be shown to have good reliability (i.e. a low likelihood that the PM-related failures will actually occur), in which case they can be categorized as lower-risk devices. See Table 12. for more details on the tentative definitions of the various levels of device risk.
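The decision logic in this summary can be sketched as a small function. This is only an illustration: the function name and the category strings are hypothetical, and what counts as "good reliability" is defined in Table 12, which is not reproduced here, so it is passed in as a yes/no input rather than computed.

```python
def classify_device(has_critical_pm_failure_mode: bool,
                    has_demonstrated_good_reliability: bool) -> str:
    """Sketch of the summary logic; labels are illustrative only."""
    if not has_critical_pm_failure_mode:
        # No critical PM-related failure modes: non-critical by definition.
        return "non-critical (inherently safe with respect to PM)"
    if has_demonstrated_good_reliability:
        # PM-critical device type, but this manufacturer-model version
        # has a demonstrated low likelihood of PM-related failure.
        return "PM-critical, lower risk"
    return "PM-critical, potentially high-risk"


print(classify_device(False, False))
print(classify_device(True, True))
print(classify_device(True, False))
```

Reliability only matters on the PM-critical branch: a non-critical device is treated as inherently safe with respect to PM regardless of how often it fails.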

We will describe how to determine which devices are PM-critical/high risk devices in section 4.x of HTM ComDoc 4.

So, if the total failure or critical degradation of the device is highly unlikely to occur, the level of risk associated with using the device is correspondingly small. Devices that are classified in the tables as having potentially life-threatening severity (LOS 3) outcomes from total failure or from critical degradation should more properly be called potentially hazardous or potentially high-risk devices because the actual level of risk at each of the three levels of severity is, in fact, accurately represented by the probability that the device will actually fail, either totally, or by developing some significant degradation.


Non-critical devices

As best we can estimate there are, in round numbers, between 750 and 1500 different types of healthcare-related devices in use in today’s healthcare facilities. An unknown number of these are non-clinical devices such as printers or other device accessories that do not even fall into the formal category of a medical device that is regulated by the FDA. These non-clinical devices are extremely unlikely to be PM-critical. At the other end of the scale there is a group of about 70 device types that are more likely to be PM-critical, either because of their complexity, or for some other reason that was captured in the original Fennigkoh-Smith criteria.

The Task Force believes that a large percentage of the estimated remaining balance of at least 700 device types will prove to be non-critical when they are analyzed. One example is a set of patient scales. When the HTMC generic PM procedure for a set of patient scales (PA.SC-01) is analyzed using the questionnaire process described in section 3.3 of HTM ComDoc 3., responses (1), (2) and (6) are all “no”, and so, according to our criteria, a set of patient scales should be classified as a non-critical device.
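The rule applied to the patient scales can be sketched as follows. The questionnaire itself is in section 3.3 of HTM ComDoc 3. and is not reproduced here; the sketch encodes only the condition stated above, that responses (1), (2) and (6) are all “no”. The function name and the dictionary layout are hypothetical.

```python
# Question numbers whose answers decide non-criticality, per the text above.
DECIDING_QUESTIONS = (1, 2, 6)


def is_non_critical(responses: dict[int, str]) -> bool:
    """Return True when responses (1), (2) and (6) are all 'no'.

    `responses` maps a question number to 'yes' or 'no'. A missing
    deciding question raises KeyError rather than being guessed.
    """
    return all(responses[q].strip().lower() == "no" for q in DECIDING_QUESTIONS)


# Responses reported in the text for patient scales (PA.SC-01).
patient_scales = {1: "no", 2: "no", 6: "no"}
print(is_non_critical(patient_scales))
```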

Based on the preliminary findings shown in Table 2. and Table 3. we believe that a large number of device types can be shown to be non-critical. This is a very important step because it provides a solid, rational argument for why a very large number of medical devices can be used quite safely without any kind of periodic PM whatsoever. They simply have no high-severity, PM-preventable failure modes and so, by definition, they are non-critical. The evidence for this is that the relevant manufacturer’s PM procedure lists no tasks that would either prevent a failure that could cause harm or detect a hidden failure that has already developed and could cause harm.

This leaves a list of about 70 device types, shown in Table 4., that are potentially PM-critical. However, as we will show in Part 2 of this article, by implementing Step 2 of this new risk analysis, which will draw on aggregated maintenance data from the new community-wide database, we will be able to determine which of these devices should actually be designated as PM-critical (high risk) devices and given periodic PM according to the manufacturer’s recommendations. The others are all more reliable, lower risk devices. We anticipate that, when fully implemented, the analysis in Step 2 will reveal devices with risk levels distributed across the full spectrum from high-risk to very low risk devices.

1.8.2 A new approach to PM prioritization using RCM-based risk criteria.

The material in Sections 1.3 and 1.4 (above) provides the logical foundation for this new risk assessment method, which we are calling PM prioritization. This logic can be summed up as follows. There are two ways in which a PM-related failure of a medical device can put the safety of a patient or device user at risk:

  • Some (life-supporting) devices, on which the patient's life may be totally dependent, can stop working completely if they are not given some kind of periodic restoration during periodic planned maintenance activities; and
  • Some devices can deteriorate in such a way that their performance or level of safety falls to such a degree that the device is potentially hazardous to the patient or user (these are called hidden failures because this deterioration is often not obvious to the user). These hazards are detected and corrected during periodic planned maintenance.


To maximize patient safety it is important to ensure that all devices whose failure can put the safety of the patient at risk receive appropriate attention. Restoring or replacing a device’s non-durable parts in a timely manner (using what we call device restoration or DR tasks) will reduce the device’s overall failure rate to some degree (but certainly not to zero). And periodic safety verification or SV tasks will uncover any potentially hazardous hidden failures, hopefully before they can cause a patient injury.

Based on certain combinations of these five risk criteria we are proposing a new approach to determining which medical devices are most likely to be potentially hazardous if they are not given periodic attention. These are the devices that should be given an appropriate level of priority for periodic planned maintenance. The term RCM-based risk criteria is appropriate because the logical basis for this questionnaire is the same as that embedded in the RCM approach. (See HTM ComRef 1. and HTM ComRef 26.)

It is important to point out here that not all possible hidden failures are listed in column 5 of Table 3. In many cases there may be a number of possible hidden failures, and the best way of identifying them is to review the test protocols listed in the performance verification and safety testing (PVST) section of the device's generic PM procedure. For example, by looking at this section of the generic PM procedure for a defibrillator-monitor (click on the PM Code, DEF-01, in the 3rd column of Table 3) you can see that Tasks S4 through S7 have been labelled as "Serious failure is potentially Life-threatening". The example cited in the fifth column of Table 3 is that the "hidden failure caused the unit to under-deliver", which would correspond to a PM finding in Task S7 that the delivered energy was significantly less than the energy level selected. According to the extent to which the device is found to be out-of-spec (OOS), the adverse outcome should be judged to be of either LOS 1, LOS 2, or LOS 3 level of severity. In both of these cases (an anticipated overt failure or a hidden failure) the analyses in the tables should include this additional judgment on the outcome and worst-case level of severity of each anticipated failure, entered in the sixth or seventh column of the respective table.


Table 2 and Table 3 illustrate how this concise risk characterization process works. We have used the compounded result of these risk assessments to filter and categorize the subset of the 70+ more complex device types (listed in Table 1) that we believe represent all or most of the device types that are likely to meet any of the Task Force's first three risk criteria. Although this particular subset represents only about 5-10% of the 700 to 1500 different types of medical equipment in modern hospitals, we believe that it represents all of the types of device that are likely to injure a patient, either if they stop working completely or if they develop some kind of significant hidden degradation.

The concise scenarios described in the fifth and sixth columns of Table 2 and Table 3 make the categorization process logical and quite transparent, since the judgments are made public on the Task Force's wiki website and they are there to be challenged. This new process should allow for much better community-wide consistency than the broad, potentially subjective generalizations of earlier methods. The new method introduces one or two new terms to characterize more precisely the nature of the device types that should be considered potentially hazardous, but these new terms are helpful in identifying which preventive strategies, including non-maintenance measures, will work best for maximizing patient safety (see HTM ComDoc 8.).

The Task Force has prepared a brief statement documenting why this PM Criticality questionnaire is consistent with established industry standards of practice.


From HTM ComDoc 7.

The maintenance entity must use some form of coding for repair calls that allows for a separate count of the failures that are attributable to inadequate PM (similar to the MR 1 described in HTM ComDoc ?). Because of its value in maximizing total equipment safety, we also recommend a coding of at least the three basic causes of total failure described in HTM ComDoc 1, namely IRFs or inherent reliability-related failures; MRFs or maintenance-related failures; and PRFs or process-related failures. Adopting the full 15-category classification and coding method described in HTM ComDoc 1 and HTM ComDoc 8. is highly desirable because of its value in diagnosing possible non-maintenance remedial actions.
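A minimal sketch of how repair calls coded with the three basic causes might be tallied is shown below. The record layout (a dictionary with a "cause" field) and the function name are hypothetical and are not taken from any HTM ComDoc; only the three cause codes come from the text above.

```python
from collections import Counter

# The three basic causes of total failure named in the text.
CAUSE_CODES = {
    "IRF": "inherent reliability-related failure",
    "MRF": "maintenance-related failure",
    "PRF": "process-related failure",
}


def tally_failure_causes(repair_calls: list[dict]) -> Counter:
    """Count repair calls per cause code, rejecting unknown codes."""
    counts = Counter()
    for call in repair_calls:
        code = call["cause"]  # hypothetical field name
        if code not in CAUSE_CODES:
            raise ValueError(f"unknown cause code: {code!r}")
        counts[code] += 1
    return counts


# Made-up records, for illustration only.
calls = [{"cause": "IRF"}, {"cause": "MRF"}, {"cause": "IRF"}, {"cause": "PRF"}]
for code, n in sorted(tally_failure_causes(calls).items()):
    print(f"{code} ({CAUSE_CODES[code]}): {n}")
```

Rejecting unknown codes, rather than silently counting them, keeps the separate counts trustworthy when they are later aggregated across facilities.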


This page was last modified 22:55, 29 September 2018.