HTM ComDoc 1

From HTMcommunityDB.org

Revision as of 19:10, 19 September 2018

Start here: PM Basics, key concepts and terminology (currently under revision)

(This document was last revised on 9-7-18)

1.1 Device failures and measures of reliability

A device or equipment system is considered to have failed when:

  • it no longer performs the function or functions that the user wants it to perform, or
  • it performs its intended function, but in an unsafe or otherwise unsatisfactory manner.

It is a truism, similar to the impossibility embedded in the concept of perpetual motion, that there is no such thing as an infallible device. All devices fail in one way or another, at some time or another. The simplest measure of a device’s reliability is its failure rate: the number of times that it fails to perform during a particular time period. Since failures are predominantly random, a device's failure performance (its reliability) is usually expressed as an average number of failures over a particular time period. However, a more intuitive way of expressing device reliability is in the form of the device's mean time between failures (MTBF), which is the inverse of the failure rate. For example, a device that has a failure rate of one failure every 75 years (on average) has a mean time between failures of 75 years.

1.1.1 Expressing reliability as a failure rate or as a mean time between failures

Mean time between failures (MTBF) is the inverse of the failure rate. For example, a device that has failed twice in nine years is demonstrating a failure rate of 2/9 = 0.22 failures per year and an MTBF of 4.5 years. Average failure rates can also be derived by dividing the total number of device failures occurring during the observation period by the number of device-years making up the total device experience. For example, if a batch of 10 devices experiences two failures during nine years, the total experience is 10 × 9 = 90 device-years, so the failure rate is 2/90 = 0.022 failures per device-year and the MTBF is 45 years. The larger the experience base (in device-years), i.e. the greater the number of devices in the sample and the longer the observation period, the closer the observed failure rate is likely to be to the device’s true failure rate.
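The arithmetic in the two examples above can be sketched in a few lines of Python. This is a minimal illustration, assuming the simple pooled estimate described in the text (total failures divided by total device-years); the function names are hypothetical.

```python
def failure_rate(failures: int, devices: int, years: float) -> float:
    """Average failures per device-year, pooled over the whole batch."""
    device_years = devices * years  # total experience base
    return failures / device_years


def mtbf(rate: float) -> float:
    """Mean time between failures is the inverse of the failure rate."""
    return 1.0 / rate


# One device, two failures in nine years.
single = failure_rate(failures=2, devices=1, years=9)
print(f"{single:.2f} failures/year, MTBF {mtbf(single):.1f} years")

# A batch of 10 devices, two failures in nine years (90 device-years).
batch = failure_rate(failures=2, devices=10, years=9)
print(f"{batch:.3f} failures/year, MTBF {mtbf(batch):.1f} years")
```

The first line prints `0.22 failures/year, MTBF 4.5 years` and the second `0.022 failures/year, MTBF 45.0 years`, matching the figures in the text.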

It is generally easier for lay persons to relate to an MTBF because it is expressed as a simple period of time, such as 3 years or 30 years, which is an easily comprehended metric. For example, most people will have little difficulty in considering a device with an MTBF of just one month to have a relatively poor level of reliability and, conversely, a device with an MTBF of 50 years to be quite reliable. But when the same two levels of reliability are expressed as failure rates, 12 failures per year (MTBF of 1 month) versus 0.02 failures per year (MTBF of 50 years), the contrast between them does not seem quite so striking.

Since, ideally, we would like to separate various kinds of devices into neat categories such as “safe” and “hazardous”, we have to confront the difficulty of setting boundaries, and the gray areas that inevitably surround those boundaries. For example, setting a threshold of, say, 75 years for the MTBF that should be considered safe raises the hard-to-answer question of how much less reliable (and thus less safe) a device with an MTBF of 74 years is than one with an MTBF of 75 years. There is, of course, no simple answer to that question. There are gray areas; it is all relative.

This discussion is made a little more complicated by the fact that there are several different reasons why devices fail. Lumping all of these failures into one overall failure rate, or corresponding MTBF, may well produce a figure that does not fairly describe either what we think of as the reliability of the device itself or the effectiveness of the way we maintain it. Section 1.4 below addresses the nature of these different causes of failure and how they can be categorized and used to develop a helpful and meaningful analysis.

1.2 What is maintenance?

There are several adequate dictionary definitions of maintenance but, in the context of maintaining equipment, it is best defined as "the process of keeping the equipment in proper working order, in good physical condition and acceptably safe". The definition used in the highly respected reliability-centered maintenance (RCM) approach to equipment maintenance is “keeping the equipment available for use”. For more about RCM, see HTM ComDoc 14, "An introduction to Reliability-centered Maintenance (RCM): The modern approach to Planned Maintenance".

A traditional equipment maintenance program has three parts:

  1. Corrective maintenance or, as it is more commonly called, repair, is the process of returning a device that is in a failed state (i.e. that is no longer doing what the user wants it to do) to a safe condition and proper working order. This includes correcting any significant hidden failures even though they do not usually disable the primary functions of the device.
  2. Cosmetic repair is the process of restoring a device that is damaged to a safe and cosmetically like-new condition. Cosmetic repairs are generally considered a lower priority because the device may still be functioning within the manufacturer’s functional specifications, but the damage may nonetheless make it unsafe. For example, a damaged cover may present a sharp edge that could be hazardous to either the patient or a user.
  3. Preventive maintenance. This third component has long been regarded as very important. From the very beginning, with the earliest machines developed during the industrial revolution, it was widely believed that restoring the device's non-durable parts, as needed, before the end of the device's anticipated lifetime would be beneficial because it would reduce the number of unexpected machine breakdowns. In return for these scheduled PM interventions to restore the device's non-durable parts, device users expect less disruption and loss of productivity, as well as some reduction in overall maintenance costs, because the device should experience fewer breakdowns.

Non-durable parts (NDPs), which are sometimes loosely called disposables or disposable parts, are components of the device that are subject to progressive wear or deterioration. They typically include moving parts, such as bearings, drive belts, pulleys, mechanical fasteners and cables, which require periodic cleaning and lubrication, as well as certain non-moving parts, such as electrical batteries, gaskets, flexible tubing and various kinds of filters, which may need to be cleaned, adjusted, refurbished or replaced at some point during the useful lifetime of the device. The parts that the device manufacturer considers to be non-durable can be identified by the presence of corresponding device restoration tasks in the manufacturer's recommended PM procedure.

As we describe more fully in HTM ComDoc 14. "An introduction to Reliability-centered Maintenance (RCM): The modern approach to Planned Maintenance" ............

Belief in this traditional device restoration approach to improving machine reliability continues to this day, particularly in certain relatively small industry sectors, even though the findings that started the revolutionary RCM approach to maintenance in the 1970s have prompted considerable rethinking about whether intrusive maintenance interventions really do improve a device's overall reliability. Certainly there are still quite a number of medical devices, such as ventilators, spirometers and traction machines, that are more mechanical than electronic and for which the manufacturers still recommend that certain parts be given some kind of periodic restoration (cleaning, refurbishment or replacement). However, we do not yet have good, independent evidence as to whether these manufacturer-recommended PMs, particularly those involving the more intrusive overhauls, are truly beneficial or cost-effective, because the data on the impact of these recommended interventions on the reliability of the more mechanical devices have not yet been gathered. That investigation is one of the goals that the Maintenance Practices Task Force (MPTF) has set for itself. We discuss this data gathering challenge in more detail in HTM ComDoc 4.

1.3 What exactly does the term "PM" mean in the context of medical equipment maintenance?

In the special case of maintaining medical equipment, there is a second very important reason, besides device restoration, for making periodic scheduled interventions: testing the device to detect critical degradation in its functional performance or in its condition with respect to safety. These deteriorations can be quite subtle, and in RCM jargon they are called hidden failures. The term is appropriate because these subtle changes do not completely disable the device's primary functions, so they will usually go unnoticed by the device users.

It is important to detect these subtle deteriorations (hidden failures) because there are certain kinds of medical devices that can cause a patient injury if their performance becomes significantly substandard or their level of safety falls below the relevant requirements. Elsewhere (see HTM ComDoc 3) we characterize the types of devices that have a theoretical potential to injure a patient if they deteriorate in this way as hidden failure-critical, or HF-critical, devices. These devices need to be subjected to periodic safety verification tasks. Appropriate safety verification tasks for checking out each particular type of device are typically included as a part of the device manufacturer's recommended PM procedure.

Similarly, we can characterize devices that have a theoretical potential to injure a patient if they simply stop working as life support devices (see HTM ComDoc 3). As the descriptor "life support" implies, it is important to minimize failures of these devices. If these devices have manufacturer-designated non-durable parts (NDPs), they are vulnerable to what the Task Force calls wear-out type failures, and they need to be subjected to appropriate device restoration (DR) tasks to prevent the device from failing. This will eliminate one (but only one) source of device failures. So, a life support device that has manufacturer-designated non-durable parts is vulnerable to wear-out type failures. The test for this is whether the device manufacturer's recommended PM procedure includes any device restoration tasks.

One of the recurring obstacles in our discussions of PM over the years has been the use of a number of imprecise and inconsistent terms. Unfortunately there is still no general consensus. So, in an attempt to establish a standardized and more consistent PM terminology, we are proposing (below) some new terms.

We believe that it would be quite difficult to get the entire population of engineers and technicians practicing in the medical equipment maintenance field to stop using the long-established traditional abbreviation “PM”. To accommodate this practical issue, we are proposing to introduce another term with the same abbreviation. The new term, "planned maintenance", will be used to denote the combination of the traditional device restoration tasks (what we have traditionally called “preventive maintenance”) and the performance and safety testing tasks that are more or less unique to the medical field. In this new formulation we are proposing to use the term “device restoration tasks” as a short label for the restoration of the device's non-durable parts. It is a simple and appropriately descriptive term.

We are suggesting this new terminology in full recognition of the fact that there are a number of other competing terms that have evolved over time. For example, the term “scheduled maintenance” has been proposed as an alternative to “preventive maintenance”, but it is not a very good fit semantically because it implies that the device restoration tasks are always performed according to some kind of clock: either conventional timing (e.g. every 6 or 12 months) or a time-of-use clock (e.g. every 1000 hours of use). There is, however, a more modern practice in which the deteriorating part is restored on a more efficient “just-in-time” basis by monitoring the actual condition of the part. In some cases the monitoring is performed by some kind of sensor, but more commonly in the medical equipment sector it is simply done by conducting periodic visual inspections. In the RCM approach this “just-in-time” restoration is called predictive maintenance. In addition, the tasks that we are proposing to call safety verification (SV) tasks have been given the collective name “inspections” by ECRI Institute and others. We prefer the more descriptive term “safety verification” tasks.

So, in summary, in the context of medical equipment maintenance, the abbreviation “PM” should be understood to mean “planned maintenance”, which is defined as a combination of two different types of tasks: one (device restoration tasks) aimed at preventing wear-out failures, and the other (safety verification tasks) aimed at detecting and then correcting hidden failures; i.e.


Planned maintenance (PM) procedure = Device restoration (DR) tasks + Safety verification (SV) tasks
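One way to make this definition concrete is to model a PM procedure as two lists of tasks. The sketch below is illustrative only: the class names, fields and example tasks are hypothetical and are not drawn from any manufacturer's procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    description: str


@dataclass
class PMProcedure:
    """Planned maintenance (PM) = device restoration (DR) + safety verification (SV)."""
    device_restoration: list[Task] = field(default_factory=list)   # aimed at wear-out failures
    safety_verification: list[Task] = field(default_factory=list)  # aimed at hidden failures

    def all_tasks(self) -> list[Task]:
        # The full procedure is simply both task lists taken together.
        return self.device_restoration + self.safety_verification

    def has_device_restoration(self) -> bool:
        # DR tasks indicate manufacturer-designated non-durable parts (NDPs).
        return len(self.device_restoration) > 0


# Hypothetical example procedure.
pm = PMProcedure(
    device_restoration=[Task("Replace air intake filter")],
    safety_verification=[Task("Verify alarm activates at set limit")],
)
print(len(pm.all_tasks()), pm.has_device_restoration())
```

Keeping the two task types in separate lists preserves the distinction the terminology is meant to draw, so a procedure with no DR tasks (a device with no non-durable parts) is immediately recognizable.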


1.4 What are the causes of medical device failures?

There are a number of different reasons (causes) why equipment systems fail and it is particularly important to recognize that not all of these failures can be prevented by some kind of planned maintenance. Consider, for example, the following list of possible causes of device failure:

  • The first set of causes can be classified as inherent reliability-related failures (IRFs) that are attributable to the design and construction of the device itself, including the inherent reliability of the components used in the device. They typically represent 45 - 55% of the repair calls. This type of failure can be reduced (but not to zero) only by redesigning the device or changing the way it was constructed.

Category IR1 Random failure. A device failure caused by the random failure or malfunction of a component part of the device, a result of the device’s inherent unreliability. IR1 calls typically represent between 46-52% of all repair calls.

Category IR2 Poor construction. A device failure attributable to poor fabrication or assembly of the device itself.

Category IR3 Poor design. A device failure attributable to poor design of the hardware or processes required to operate the device.


  • The second set of causes can be classified as process-related failures (PRFs). They typically represent 40 - 50% of the repair calls. Reducing or eliminating these types of failure typically requires some kind of redesign of the system’s processes, for example by using better methods to train the equipment users to operate the equipment (as intended by the manufacturer) or to treat the equipment more carefully. They are not failures that can be prevented by any kind of maintenance activities.

Category PR1 Use error. A device failure attributable to incorrect set-up or operation of the device by the user, i.e. the user has not set the device up correctly or does not know how to operate it. PR1 calls typically represent between 13-20% of all repair calls. (Note that although this type of “failure” does not represent a complete loss of function, it can have the same effect. For example, an incorrectly set defibrillator can result in a failure to resuscitate the patient.)

Category PR2 Physical damage. A device failure caused by subjecting the device to physical stress outside its design tolerances. PR2 calls typically represent between 6-25% of all repair calls.

Category PR3 Discharged battery. A device failure attributable to a failure to recharge a rechargeable battery. PR3 calls typically represent between 7-8% of all repair calls.

Category PR4 Accessory problem. A device failure caused by the use of a wrong or defective accessory. PR4 calls typically represent between 3-9% of all repair calls.

Category PR5 Environmental stress. A device failure caused by exposing the device to environmental stress outside its design tolerances. PR5 calls typically represent between 1-7% of all repair calls.

Category PR6 Tampering. A device failure caused by human interference with an internal control. PR6 calls typically represent <1% of all repair calls.

Category PR7 Network problem. A device system failure caused by an issue within a data network connected to the device’s output.


  • The third set of causes can be classified as maintenance-related failures (MRFs). They typically represent 2 - 4% of the repair calls. These types of failure can be prevented through some kind of maintenance strategy incorporated into the facility’s maintenance program.

Category MR1 PM-preventable failure. A device failure that could have been prevented by more timely restoration or replacement of a manufacturer-designated non-durable part, e.g. a battery failure, a clogged filter, or a build-up of dust. Failures due to trapped cables should not be coded this way. MR1 calls typically represent between 1-3% of all repair calls.

Category MR2 Poor set up. A device failure caused by poor or incomplete initial installation or set-up of the device. MR2 calls typically represent between 1-3% of all repair calls.

Category MR3 Needed recalibration. A device failure attributable to improper periodic calibration. MR3 calls typically represent <1% of all repair calls.

Category MR4 Re-repair. A device failure attributable to a poor quality previous repair of the device. MR4 calls typically represent <1% of all repair calls.

Category MR5 Intrusive PM. A device failure attributable to earlier intrusive maintenance. MR5 calls typically represent much less than 1% of all repair calls.


While the total number of repair calls, irrespective of what caused them, determines the device's effective (overall) reliability, it is the numbers of maintenance-related failures (MRFs) and inherent reliability-related failures (IRFs) that are of greatest interest to us, as maintainers, at this time. The level of MRFs provides a good measure of the effectiveness of the facility’s maintenance program, and the level of IRFs provides an equally good measure of the basic or inherent reliability of the devices in question.
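If each repair call is coded with one of the category codes listed above, the MRF and IRF levels can be computed as simple proportions of all repair calls. The sketch below assumes that the first two letters of each code identify its group (IR, PR or MR); the function name and the sample list of coded calls are hypothetical and are not representative of the typical percentages quoted above.

```python
from collections import Counter

# Group prefixes taken from the category codes in Section 1.4.
GROUPS = {
    "IR": "inherent reliability-related (IRF)",
    "PR": "process-related (PRF)",
    "MR": "maintenance-related (MRF)",
}


def group_shares(call_codes: list[str]) -> dict[str, float]:
    """Return the fraction of repair calls falling into each failure group."""
    if not call_codes:
        raise ValueError("no repair calls supplied")
    counts = Counter(code[:2] for code in call_codes)
    unknown = set(counts) - set(GROUPS)
    if unknown:
        raise ValueError(f"unrecognized category prefix: {sorted(unknown)}")
    total = len(call_codes)
    # Counter returns 0 for groups with no calls, so every group is reported.
    return {prefix: counts[prefix] / total for prefix in GROUPS}


# Hypothetical coded repair calls, for illustration only.
calls = ["IR1", "IR1", "PR1", "PR2", "PR3", "MR1", "IR2", "PR1", "IR1", "PR4"]
for prefix, share in group_shares(calls).items():
    print(f"{GROUPS[prefix]}: {share:.0%}")
```

Reporting the MR share alongside the IR share keeps the two measures the text singles out (maintenance effectiveness and inherent reliability) separate, rather than folding them into one overall failure rate.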

1.5 Which kinds of medical device failures can be hazardous?

There are four ways in which medical equipment failures can be hazardous. However, not all of those failures are PM-preventable failures.

  • If the device is damaged in such a way that it presents some kind of direct physical threat to the safety of patients or staff, such as an exposed sharp edge.

For example, the case or enclosure of a piece of equipment might be damaged, say as a result of the item being dropped, in such a way that the damaged casing poses a risk of injury to the patient or user, even though the item still works. Or the protective outer layer of the device's electrical cord might be damaged so that it exposes a live conductor, posing the risk of an electric shock. These could be hazardous to the patient, to the device user and possibly to others. It is to be expected that damage such as this would be noticed and repaired at the time of the device's periodic maintenance, so, to the extent that this kind of damage occurs and goes unreported, periodic PM contributes to the overall level of safety. These are not considered to be PM-preventable failures, but periodic PM may shorten the time that individuals are exposed to these potentially hazardous outcomes. Situations such as this appear to be encountered quite rarely.

  • If the failure is a sudden, total failure.

There are a number of devices that are life-supporting in the sense that a sudden, total failure while they are in use could put the patient’s life at risk. Examples include critical care ventilators, anesthesia units, heart lung machines, intra-aortic balloon pumps, external pacemakers, defibrillators, AEDs, cardiac resuscitators, infant incubators, neonatal monitors, apnea monitors - and in some circumstances - patient monitors, oxygen monitors and pressure cycled ventilators. In addition to spontaneous random failures it is possible that a device could suddenly stop working if a part that is recommended for periodic restoration fails prematurely. This could also occur if the maintenance interval has been set too long. The failure of any device that is attributable to the failure of a critical part that requires timely restoration is considered to be a PM-preventable failure. However, situations such as this appear to be encountered quite rarely.

  • If the device develops some kind of hidden failure.

There are some devices that have the potential to cause a patient injury if their functional performance falls below a certain critical point in such a way that the deterioration is not obvious to the user. Examples include a defibrillator whose delivered output energy is significantly lower than the level set by the user, or an infusion device that delivers medication at a significantly lower or higher rate than that set by the user. Similarly, there are some devices that have the potential to cause a patient injury if their compliance with a relevant safety specification falls below an acceptable point and this deterioration is not obvious to the user. Examples include an open ground connection in a device that has exposed metal that could conceivably become "live", and a malfunction of a critical alarm. While, strictly speaking, these failures are not prevented by periodic PM, the time that patients are exposed to these potentially hazardous outcomes is reduced. Elsewhere (ref ?) we have shown that the exposure of the patient to this possible hazard is reduced from 100% (as it would be with no PM) to a lesser percentage determined by the ratio of the frequency with which the PM testing is performed to the frequency with which the hidden failure occurs. With typical PM intervals in the range of 6 months to 5 years and mean times between failures of these random hidden failures in the range of 50 to 250 years, the patient's exposure will be reduced by 95 - 99%. Hazardous hidden failures appear to be encountered quite infrequently.

  • If the device is used improperly.

Almost all medical devices have the potential to injure patients if they are used improperly. However, this is a type of failure that cannot be prevented or mitigated by conventional planned maintenance, and such failures are not considered to be PM-preventable equipment failures. Accident statistics show that misuse of medical devices represents the most common reason for device-related patient injuries.

For more on this subject see HTM ComDoc 8, "Maximizing medical equipment safety".
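The reduction in exposure to hidden failures described under the third bullet above can be estimated with a simple model. The sketch below rests on our own assumptions, not necessarily those of the referenced analysis: hidden failures occur at random with a constant rate of 1/MTBF, any hidden failure is found and corrected at the next PM test, and the device is otherwise never checked. Under those assumptions the expected fraction of time spent in an undetected failed state is 1 - (1 - e^(-T/MTBF))·MTBF/T, which is approximately T/(2·MTBF) when the PM interval T is much shorter than the MTBF.

```python
import math


def undetected_fraction(pm_interval: float, mtbf: float) -> float:
    """Expected fraction of time spent in an undetected hidden-failure state.

    Assumes random (exponentially distributed) hidden failures with mean time
    between failures `mtbf`, each detected and corrected at the next PM test.
    `pm_interval` and `mtbf` must be in the same time units.
    """
    x = pm_interval / mtbf
    # Exact result for this model; for small x it is close to x / 2.
    return 1.0 - (1.0 - math.exp(-x)) / x


# Corners of the ranges quoted in the text (PM interval and MTBF in years).
for interval in (0.5, 5.0):
    for hf_mtbf in (50.0, 250.0):
        reduction = 1.0 - undetected_fraction(interval, hf_mtbf)
        print(f"PM every {interval} y, MTBF {hf_mtbf:.0f} y: "
              f"exposure reduced by {reduction:.1%}")
```

At the corners of the quoted ranges this model gives reductions of roughly 95% (5-year interval, 50-year MTBF) up to about 99.9% (6-month interval, 250-year MTBF), broadly in line with the 95 - 99% figure above.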

1.6 Hidden failures

A hidden failure (HF) is said to have occurred when either:

  • the device delivers an output that is significantly out of specification, but sufficiently similar to the output that the user wants, that the failure is not immediately obvious to the user, or
  • the device is no longer in compliance with the relevant safety specifications for the device in question, but this deterioration is also not obvious to the user.

These kinds of failures are usually the result of imperceptible random failures in the device's components or subsystems. They are detected through performance or safety tests made during the periodic PMs.

When this more subtle type of failure introduces a significant performance or safety degradation that can be detected only by some kind of performance or safety test, it can constitute a serious safety threat. For example, a heart rate alarm that has malfunctioned so that it no longer goes off at the set limit will remain a hidden but potentially hazardous failure until the alarm function is checked and the potentially dangerous degradation discovered. The potential seriousness (i.e. level of severity) of hidden failures will depend on the nature of the failure and on how far the performance or safety flaw is out of specification. For example, a significant reduction in the output of a defibrillator has to be considered life-threatening, but a small excess in the electrical leakage current of a laboratory centrifuge, while it should be noted in the test report, is unlikely to constitute a significant hazard or be considered an imminent threat.

Hidden failures are discovered when the performance verification and safety testing tasks are performed during the PM. When they are found they should be described in a note on the PM work order or the PM report and it would be helpful if the description of the findings provided enough information to enable a judgment to be made as to the worst case potential level of severity (LOS 3, LOS 2, LOS 1 or LOS 0 - see Section 1.7 below) of the adverse outcome that would have resulted if the hidden failure had not been discovered.

A particularly important type of hidden failure is one that disables the proper operation of an automatic protection mechanism (APM) that is included as a component of the device. An APM is usually included in the design to provide protection against another possible hidden failure that is itself considered to be capable of a serious or potentially life-threatening adverse consequence.

1.7 Possible adverse outcomes of medical device failures

There is a wide range of possible adverse outcomes from device failures. Some create potential physical harm to the patient (or to the device user). Others can result in additional direct or indirect costs to the facility and thus create an economic or business risk to the organization. We address these economic/business risks in greater detail in HTM ComDoc 9, "Medical devices that may benefit from PM from a business/economics viewpoint".

If there is a need to conduct some kind of risk analysis or risk assessment, it is helpful to define a hierarchy of three levels of severity (LOS) of possible physical harm to the patient or, in the case of economic harm to the facility, three levels of economic harm to the business. In each case a fourth level (0) denotes no discernible harm.

Outcomes resulting in possible physical harm

  • LOS 3 = Serious, life-threatening injury - The patient (or the user) may lose his or her life.
  • LOS 2 = Less serious, non-life-threatening injury - The patient (or the user) may sustain a direct or indirect injury ranging from minor to serious.
  • LOS 1 = No injury, but possible disruption of care - The incident may cause a temporary disruption of care, such as requiring one or more patients to be rescheduled, delaying treatment or delaying the acquisition of diagnostic information.
  • LOS 0 = No discernible injury and no discernible disruption of care.

Outcomes resulting in possible economic harm

  • Level 3 = Major economic impact - on the facility’s cost of doing business
  • Level 2 = Significant economic impact - on the facility’s cost of doing business
  • Level 1 = Relatively minor economic impact - on the facility’s cost of doing business
  • Level 0 = No discernible impact - on the facility’s cost of doing business

1.7.1 Life support devices

There are some devices, such as critical care ventilators and defibrillators, on which the patient's continued well-being may be totally dependent. These are sometimes called life support devices. Any type of failure that causes such a device to stop working completely or to stop working properly has the potential to result in an adverse outcome at the highest severity (LOS 3) level. If the device also happens to have one or more non-durable parts that need timely and competent periodic restoration, the device becomes critically vulnerable to a wear-out failure and should therefore be given a high priority for PM. The same is true if the device has a hidden failure that could cause a high severity outcome.
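The reasoning in this section can be expressed as a simple decision rule. This is a minimal sketch under our own assumptions: the field names are hypothetical, the presence of device restoration tasks in the manufacturer's PM procedure is used as the test for non-durable parts (as described in Section 1.3), and "high severity outcome" is taken to mean LOS 3 by default.

```python
from dataclasses import dataclass


@dataclass
class DeviceType:
    name: str
    life_support: bool             # a sudden stop could lead to an LOS 3 outcome
    has_dr_tasks: bool             # manufacturer's PM includes device restoration tasks
    worst_hidden_failure_los: int  # highest LOS of a possible hidden failure (0-3)


def high_pm_priority(device: DeviceType, hidden_failure_threshold: int = 3) -> bool:
    """Flag device types that should be given high priority for PM."""
    # Life support device with non-durable parts: vulnerable to wear-out failure.
    wear_out_vulnerable = device.life_support and device.has_dr_tasks
    # Hidden failure that could cause a high severity outcome.
    severe_hidden_failure = device.worst_hidden_failure_los >= hidden_failure_threshold
    return wear_out_vulnerable or severe_hidden_failure


# Hypothetical example device type.
example = DeviceType("Hypothetical device A", life_support=True,
                     has_dr_tasks=True, worst_hidden_failure_los=1)
print(high_pm_priority(example))
```

Making the severity threshold a parameter leaves room to treat LOS 2 hidden failures as high priority as well, if a facility's risk assessment calls for it.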

1.8 Which kinds of medical equipment failures are PM-preventable?

Of the many ways in which devices can fail (their possible failure modes) listed in Section 1.4 above, there are only two kinds that are PM-preventable:

1. Wear-out failures that could cause the device to stop working completely. These are failures that are caused by a non-durable part not receiving timely, competent restoration.
2. Hidden failures resulting from imperceptible failures of components within the device that do not cause the device to stop working completely but which might reduce the device's performance or safety below a critical level. These are failures that are discovered when performance and safety testing tasks are performed during PMs and although the PM testing does not totally prevent the possibility that a patient will be exposed to the device while it is in a defective state, the discovery and correction of these hidden failures does shorten the period during which patients are exposed to the failure. This benefit is addressed more completely in Sections 6.3 and 6.4 in HTM ComDoc 6.


1.9 The five basic questions about PM


The foregoing analysis puts us in a position to answer the first of the five basic questions about PM - some of which have been addressed previously in HTM ComDoc 15.

1.9.1 Question 1. How, and to what extent, does performing PM on medical equipment improve patient safety?


Generally speaking, PM does improve patient safety, but only to the extent that it detects and then corrects the two kinds of PM-preventable failures identified just above in Section 1.8 (wear-out failures and hidden failures). The extent of the improvement in patient safety varies from device to device according to the "level of risk" that the device would have presented if those potential failures had not been detected and then eliminated. According to modern risk management theory, the level of risk takes into account both the severity of the adverse outcome of the event and the likelihood that the event will actually occur.

In this case we are specifically concerned about the level of risk posed by PM-preventable failures, so the extent of the improvement in patient safety is determined by a combination of the potential severity of the outcome of the failure (with higher levels of outcome severity, such as LOS 3, being more serious than LOS 2, and so on) and the likelihood of the failure occurring. The proper measure of this likelihood is what the Task Force calls the device's PM-related reliability. We discuss this "likelihood of failing from a PM-preventable cause" further in HTM ComDoc 4 "Consideration of the device's PM-related reliability".
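As a rough illustration of how PM-related reliability can be expressed as a number, the sketch below estimates a PM-related MTBF as observed device-years divided by the number of PM-preventable failures recorded. The function name and its arguments are hypothetical, and this simple point estimate is an assumption made for illustration; it is not necessarily the exact method used in HTM ComDoc 4.

```python
import math


def pm_related_mtbf_years(device_count: int, years_observed: float,
                          pm_preventable_failures: int) -> float:
    """Point estimate of PM-related MTBF, in years (hypothetical helper).

    device-years = units in service x observation period (years).
    With no PM-preventable failures recorded the point estimate is unbounded,
    so math.inf is returned.
    """
    if device_count <= 0 or years_observed <= 0:
        raise ValueError("need a positive number of units and a positive period")
    if pm_preventable_failures < 0:
        raise ValueError("failure count cannot be negative")
    device_years = device_count * years_observed
    if pm_preventable_failures == 0:
        return math.inf
    return device_years / pm_preventable_failures


# Hypothetical fleet: 120 units observed for 5 years, 4 PM-preventable failures.
print(pm_related_mtbf_years(120, 5.0, 4))  # prints 150.0
```

A zero-failure result only says that no failures have been seen yet; for small fleets a statistical lower confidence bound would be a more conservative figure than this point estimate.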

The Task Force has investigated both of these factors. Table 4 ranks the various device types according to the severity of each device's potential PM-preventable failures. For more on this investigation, see HTM ComDoc 3 "Risk assessment: Determining which medical devices are made safer by PM". The device types at the top of the listing in Table 4 (rows 1 through 7) are judged to have potential PM-preventable failures with life-threatening outcomes. The PM-related reliability of each of the twenty highest-severity device types in Table 4 is currently being investigated, and as the results become available they will be posted to columns C8 and C9 of Table 13. For more on this investigation, see HTM ComDoc 4 "Consideration of the device's PM-related reliability".

The Task Force has set tentative thresholds for what should be considered an acceptable (safe) level of PM-related reliability for the devices in each of the three top levels of potential PM-related risk (labeled high, moderate and low in column C10 of Table 13). Once this table is completed, professionals in charge of medical equipment maintenance programs will be able to use it to identify which devices (by manufacturer and model) should continue to be maintained strictly according to their manufacturer's recommendations and, for the others, what level of PM-related reliability (which corresponds to PM-related safety when the severity category is taken into account) is typically achieved when the indicated PM interval and procedure are used. The Task Force has also suggested a way in which the level of PM-related patient safety can be monitored on a continuous basis (see Section 1.12 ?).

As can be seen from the summary below there are several other benefits from performing regular PM besides improving patient safety.

  • Improving patient safety. … Some devices - but only some - are made safer by performing appropriate PM. Not all failures have the potential to cause a serious injury, and not all failures are PM-preventable.
  • Regulatory compliance. … As we explain more fully in HTM ComDoc 11, the CMS regulation addressing PM for medical devices has traditionally required that all medical devices be maintained strictly according to the device manufacturers' recommendations. Even after the regulations were changed in 2013, there is still a requirement that certain devices be subjected to periodic PM. (For more on this, see HTM ComDoc 16.)
  • Better business economics. … As we explain more fully in HTM ComDoc 9, some devices - but only some - are made less costly to maintain by performing appropriate PM.
  • Customer courtesy and/or customer reassurance. … We may choose to perform PM on some devices because a user has asked us to do so, or because we believe that periodically inspecting and cleaning equipment used for patient care creates a reassuring "cared for" appearance that the user staff appreciate. While this is a qualitative rather than a quantitative benefit, it should not be underestimated. These periodic inspections may also lead to the discovery of unreported broken equipment.

......................................................................................................................................................................................................................................................

1.9.2 Question 2. What kind of PM program is called for in the current CMS regulations?


The original Medicare legislation in 1965 stated that: "... There must be a regular periodical maintenance and testing program for medical devices and equipment. A qualified individual such as a clinical or biomedical engineer, or other qualified maintenance person must monitor, test, calibrate and maintain the equipment periodically in accordance with the manufacturer's recommendations and Federal and State laws and regulations. ..." However, from 1989 until as recently as 2011, the corresponding Joint Commission standards allowed equipment that was not considered to present a significant physical risk to be excluded from any specific maintenance requirements, stating only that PM frequencies should be based on "criteria such as manufacturer's recommendations, risk levels, or current hospital experience." In effect, they endorsed the original Fennigkoh-Smith risk-based methodology.

This changed in 2011 when CMS issued revised regulations that narrowed the still-official CMS requirement to follow the manufacturer's maintenance recommendations from all equipment to just "all equipment critical to patient health and safety and any new equipment until a sufficient amount of maintenance history has been acquired." The "risk-based" option that TJC had been allowing was effectively rescinded. The revised CMS requirement specifically stated that, for what they were now calling equipment critical to patient health and safety, "Alternative equipment maintenance (AEM) methods are not permitted." However, there was no clear indication of which particular devices they intended to target with this definition of "critical." They seemed to be placing the responsibility for this onto the facility by stating that the "... hospital may adjust its maintenance, inspection, and testing frequency and activities for facility and medical equipment from what is recommended by the manufacturer, based on a risk‐based assessment by qualified personnel".

Faced with some push-back from members of the HTM community, CMS issued a "clarification" memo in 2013 (HTM ComRef 28) in which it tried to address the uncertainty about the precise meaning of the phrase "equipment critical to patient health and safety". The key language in the 2013 memo is quoted in Section 11.3 of HTM ComDoc 11. Suffice it to say that this new language does not sufficiently clarify what the agency means by the term "critical"; the Task Force's interpretation of the agency's intention is described in Section 11.4 of HTM ComDoc 11. The new regulatory language does, however, introduce a major concession by allowing devices that are not considered to be "critical" to be included in an Alternative Equipment Management (AEM) program, where they can be maintained other than as the manufacturer recommends. As also reported in HTM ComDoc 11, the Task Force summarizes its conclusions about the agency's intention in the form of the following two recommended AEM program inclusion criteria.

Recommended AEM Program Inclusion Criteria

After a careful analysis of the CMS memo, the Task Force believes that the agency intends to allow only those devices that meet one, or both, of the following criteria to be included in an AEM program:

  • The device is highly unlikely to cause a serious injury or death to a patient or staff person if it should fail in a way that could have been prevented by the device having been subjected to appropriate PM
  • The device is highly unlikely to fail from a PM-preventable cause

However, note that there is currently a regulatory exception (HTM ComRef 33) ....
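The two inclusion criteria amount to a logical OR. The sketch below encodes them under loudly stated assumptions: that "serious injury or death" corresponds to LOS 3, and that "highly unlikely to fail" means a demonstrated PM-related MTBF of at least 75 years (the figure used for the currently acceptable level in the recommendations later in this section). The DeviceProfile type and the device names are hypothetical, and the regulatory exception mentioned above is not modelled.

```python
from dataclasses import dataclass

# ASSUMED operational meanings of the two criteria (not CMS or Task Force text).
SERIOUS_HARM_LOS = 3               # treat LOS 3 as "serious injury or death"
HIGHLY_RELIABLE_MTBF_YEARS = 75.0  # reuses the 75-year acceptable-reliability figure


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical per-model summary used only for this illustration."""
    name: str
    worst_pm_preventable_los: int  # worst-case severity of a PM-preventable failure (0-3)
    pm_related_mtbf_years: float   # demonstrated PM-related reliability


def meets_aem_inclusion_criteria(device: DeviceProfile) -> bool:
    """True if the device meets either (or both) of the two inclusion criteria."""
    unlikely_to_cause_serious_harm = device.worst_pm_preventable_los < SERIOUS_HARM_LOS
    unlikely_to_fail = device.pm_related_mtbf_years >= HIGHLY_RELIABLE_MTBF_YEARS
    return unlikely_to_cause_serious_harm or unlikely_to_fail


for device in (
    DeviceProfile("hypothetical ventilator A", 3, 20.0),
    DeviceProfile("hypothetical ventilator B", 3, 150.0),
    DeviceProfile("hypothetical patient scale", 1, 5.0),
):
    print(device.name, meets_aem_inclusion_criteria(device))
```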

The Task Force's suggestions for implementing an efficient risk-based AEM program that complies with these two criteria are contained in a recently published two-part article in AAMI's BI&T journal (HTM ComRef 35 and HTM ComRef 36). Much of that material is also contained in HTM ComDoc 16 "Implementing a simple RCM-based Alternate Equipment Management (AEM) program."

......................................................................................................................................................................................................................................................

1.9.3 Question 3. How to maximize the efficiency of a planned maintenance (PM) program


HTM ComDoc 10, "Alternate Maintenance Strategies and Maintenance Program Optimization", identifies the following four maintenance strategies that are relevant to maintaining medical devices.

  1. Traditional fixed interval preventive maintenance (often combined with #3, periodic safety verification)
  2. Predictive maintenance
  3. Periodic safety verification
  4. Light maintenance (also known as run-to-failure maintenance)

The least efficient maintenance strategy in terms of using up scarce technical manpower is the traditional fixed interval preventive maintenance strategy (#1). Predictive maintenance (#2) is the next least efficient. It differs from strategy #1 primarily in that it effectively extends the interval between restorations or replacements of the device's non-durable parts by substituting a visual inspection for the original restoration task. The most efficient strategy is, of course, the light maintenance strategy (#4). The periodic safety verification strategy (#3) is neutral with respect to efficiency because it must be performed on all devices in which a hidden failure could have a high-severity (LOS 3) outcome. It may also be considered prudent to perform periodic safety verification on all devices in which a hidden failure is projected to have a less severe (LOS 2) outcome.

Starting with the least efficient situation, a program in which PM is currently being performed on all of the facility's equipment according to the manufacturer's recommendations, the suggested steps are:

  • Step 1. Identify which devices can be classified as non-critical devices (see Section 3.8 in HTM ComDoc 3) and change these immediately to a run-to-failure maintenance method (i.e., perform no scheduled PM).
  • Step 3. Look over the recommendations below, which are taken from Section 4.10 of HTM ComDoc 4 and HTM ComRef 36. Then make the changes that you feel comfortable with (see also .... and HTM ComRef 35).

Recommendations for improving the efficiency of a medical equipment maintenance program


These are potentially hazardous devices with either overt or hidden PM-preventable failures that could cause a life-threatening injury and that are demonstrating PM-related failure rates greater than the currently acceptable level (not more than one failure every 75 years). For these devices, it would be prudent to continue to follow the manufacturer-recommended PM procedure (for both the interval and the scope of the tasks) and to routinely monitor the levels of patient safety being achieved, as described in Section 3.10 of HTM ComDoc 3 and HTM ComRef 35. This should continue until acceptable evidence exists in the national database (Table 13) that some other procedure, with more efficient tasks and/or a longer interval, demonstrates the same or better level of PM-related reliability or a comparable level of patient safety.

These are potentially hazardous devices with hidden PM-preventable failures capable of causing a life-threatening injury that are demonstrating PM-related failure rates greater than the currently acceptable level (not more than one failure every 75 years). For these devices, for which the only “maintenance” that the manufacturer recommends is periodic safety verification, it would be prudent to continue to follow the manufacturer-recommended safety verification testing schedule and routinely monitor the levels of patient safety being achieved, as described in Section 3.10 of HTM ComDoc 3 and HTM ComRef 35, until evidence exists in the national database (Table 13) that testing at a longer interval results in the same or better level of PM-related reliability or a comparable level of patient safety.

When testing for possible hidden failures with potential high-severity outcomes, there is no optimum interval: shorter is always better. However, it has been shown (see Section 6.3 in HTM ComDoc 6) that for safety verification-related (hidden) failures with MTBF values greater than about 50 years, the increase in the time that the patient would be exposed to potentially hazardous hidden failures if the testing interval were increased from six months to as long as five years is very small.
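To put numbers on the exposure argument, the sketch below uses a standard reliability-engineering approximation for periodically tested equipment: with a constant failure rate, perfect tests and prompt repair, the average fraction of time a hidden failure is present but undetected is 1 - (1 - e^(-x))/x, with x = T/MTBF, which is close to T/(2 x MTBF) when the test interval T is much shorter than the MTBF. This is the same form as the average probability of failure on demand used in functional-safety work; it is an assumption here and not necessarily the model used in Section 6.3 of HTM ComDoc 6. The function name is hypothetical.

```python
import math


def mean_undetected_fraction(mtbf_years: float, test_interval_years: float) -> float:
    """Average fraction of time a hidden failure is present but not yet detected.

    Assumes a constant failure rate, perfect tests at fixed intervals and prompt
    repair: 1 - (1 - exp(-x)) / x with x = T / MTBF, which is about x / 2.
    """
    if mtbf_years <= 0 or test_interval_years <= 0:
        raise ValueError("MTBF and test interval must be positive")
    x = test_interval_years / mtbf_years
    return 1.0 + math.expm1(-x) / x  # expm1(-x) = exp(-x) - 1, accurate for small x


for interval in (0.5, 1.0, 2.0, 5.0):
    fraction = mean_undetected_fraction(50.0, interval)
    print(f"T = {interval} yr: {fraction:.2%} (~{fraction * 365:.1f} days per year)")
```

Under these assumptions, an MTBF of 50 years gives an average undetected-failure fraction of roughly 0.5% with a six-month interval and just under 5% with a five-year interval.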

These lower PM-risk devices qualify for inclusion in an AEM program either because of the lower level of severity of the outcomes of potential failures or because they have demonstrated an acceptable level of PM-related reliability. Therefore, they can be maintained using a maintenance procedure or strategy other than that recommended by the manufacturer. They can be transitioned immediately to less stringent PM strategies, such as the cost-efficient light maintenance (run-to-failure) strategy - which is mentioned in Appendix A of the CMS memo (HTM ComRef 28). At the very least, the manufacturer-recommended procedures can be modified (such as by omitting electrical safety checks that the facility has found to be nonproductive), or by extending the testing interval to make it coincide with a more convenient or more efficient routine.

The logical rule here is to explore the national database (Table 13) for evidence of more efficient maintenance procedures. For devices categorized as PM priority 2 (moderate PM-risk), it would be prudent to monitor the levels of patient safety being achieved by the current procedure (or by any of the more efficient procedures, if chosen), as described in Section 3.10 of HTM ComDoc 3 and HTM ComRef 35. Monitoring devices in the lower risk categories is much less important but can be undertaken if the facility chooses.

If these devices should fail, there is negligible or zero additional risk to patient safety. Therefore, in the absence of other regulatory mandates, and unless there is a convincing case that periodic PM can be justified through lower maintenance costs, these devices are excellent candidates for the very efficient light maintenance (run-to-failure) strategy. It was by adopting this run-to-failure maintenance strategy in the early 1960s that the civil aviation industry was able to reduce its maintenance costs by 50% while, unexpectedly, also improving the reliability and safety statistics for civilian aircraft by a factor of 200.
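The recommendations above can be summarized as a simple decision rule. The sketch below is one possible reading of them, assuming each device model has been given a worst-case severity level (0 meaning no additional patient risk, 3 meaning life-threatening), a demonstrated PM-related MTBF, and a flag saying whether the manufacturer's only recommended maintenance is periodic safety verification. The names are hypothetical, and the monitoring, cost and regulatory caveats in the text are not modelled.

```python
from enum import Enum

ACCEPTABLE_LOS3_MTBF_YEARS = 75.0  # "not more than one failure every 75 years"


class Strategy(Enum):
    MANUFACTURER_PM = "Follow manufacturer PM procedure (interval and tasks); monitor safety"
    MANUFACTURER_SV = "Follow manufacturer safety-verification schedule; monitor safety"
    AEM = "AEM candidate: modified or less stringent PM"
    LIGHT_MAINTENANCE = "Light maintenance (run-to-failure)"


def recommend_strategy(worst_los: int, mtbf_years: float, sv_only: bool) -> Strategy:
    """Map a (hypothetical) device profile to one of the recommended strategies."""
    if not 0 <= worst_los <= 3:
        raise ValueError("severity level must be between 0 and 3")
    if worst_los == 0:
        # Failure adds negligible or zero patient risk.
        return Strategy.LIGHT_MAINTENANCE
    if worst_los == 3 and mtbf_years < ACCEPTABLE_LOS3_MTBF_YEARS:
        # Life-threatening failure modes with unacceptable PM-related reliability.
        return Strategy.MANUFACTURER_SV if sv_only else Strategy.MANUFACTURER_PM
    # Lower severity, or acceptable demonstrated reliability.
    return Strategy.AEM


print(recommend_strategy(3, 20.0, sv_only=False).value)
print(recommend_strategy(3, 20.0, sv_only=True).value)
print(recommend_strategy(2, 10.0, sv_only=False).value)
```

Devices placed in the AEM branch as moderate-risk (PM priority 2) would still have their patient-safety levels monitored, as the text recommends; that step sits outside this function.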

......................................................................................................................................................................................................................................................

1.9.4 Question 4. How to maximize equipment-related reliability and safety


The opening paragraph from HTM ComDoc 8 "Maximizing medical equipment-related reliability and safety" reads as follows:

"To the best of our knowledge, all of the studies reported to date have shown that only a very small percentage of injuries resulting from failures of medical devices are attributable to poor maintenance (see, for example, HTM ComRef 12). As we describe in Section 1.4 of HTM ComDoc 1 ...the great majority of medical device failures can be attributed to one or another of a fairly wide range of other causes.... However, if the cause of each device failure is routinely documented in the manner suggested in that same section of HTM ComDoc 1, this information (on which of those causes is currently contributing the most to device failures in a particular facility) can be very helpful in managing device failure prevention activities other than PM, and in monitoring the effectiveness of those efforts."

So, if our overall goal is to reduce the number of medical device failures, it makes sense to investigate ways in which these other causes can be reduced or eliminated.

In HTM ComDoc 8 we also point out that, based on the general statistics on causes of device failures, the most effective strategy for reducing failures of the critical life support device types is, first, to give preference during device acquisition to those devices that are reported to have the highest level of inherent reliability, and then to implement additional measures to reduce failures from the list of causes presented immediately below. They are listed in descending order of anticipated effectiveness. The possible impact of the strategy of giving preference to devices with the highest levels of inherent reliability is unknown at this time, but current statistics indicate that the inherent unreliability of the devices themselves accounts for 45-55% of all failures.

  • User-related issues, such as controls or switches that have been set incorrectly. Although this type of failure may not always lead to a complete loss of function, it can have the same effect as an actual failure. For example, an incorrectly set defibrillator can jeopardize patient resuscitation. (These Category PR1 calls typically represent between 13% and 20% of all repair calls.)
  • Problems related to a poor rechargeable battery management program. (These Category PR3 calls typically represent between 7% and 8% of all repair calls.)
  • Physical damage, usually caused by a combination of poor design and user carelessness, such as dropping the device. (These Category PR2 calls typically represent between 6% and 25% of all repair calls.)
  • Problems with an accessory, such as patient cables and electrodes. (These Category PR4 calls typically represent between 3% and 9% of all repair calls.)
  • Problems resulting from an out-of-specification environmental condition, such as poor control of the ambient temperature. (These Category PR5 calls typically represent between 1% and 7% of all repair calls.)
  • Lack of timely PM (i.e., failing to restore [replace or refurbish] a part of the device that requires periodic attention). (These Category MR1 calls typically represent between 1% and 4% of all repair calls.)
  • Poor installation or poor initial set-up of the device. (These Category MR2 calls typically represent between 1% and 3% of all repair calls.)
  • Tampering with internal switches or other controls that are not intended to be user-accessible. (These Category PR6 calls typically represent less than 1% of all repair calls.)
  • Problems due to an issue with a data transmission network connected to the device’s output. (Category PR7 calls.)
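If each repair call is coded with one of the categories listed above, a facility's own distribution can be compared with the typical ranges quoted. The sketch below is a minimal tally; the short labels are paraphrased from the list, the function name is hypothetical, and the sample calls are invented for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable

# Category codes from the list above (PR = process-related, MR = maintenance-related).
CATEGORY_LABELS = {
    "PR1": "User-related (incorrect settings)",
    "PR2": "Physical damage",
    "PR3": "Rechargeable battery management",
    "PR4": "Accessory problems",
    "PR5": "Out-of-specification environment",
    "PR6": "Tampering with internal controls",
    "PR7": "Data network issues",
    "MR1": "Lack of timely PM",
    "MR2": "Poor installation or set-up",
}


def category_shares(call_codes: Iterable[str]) -> Dict[str, float]:
    """Percentage of repair calls per category code, most frequent first."""
    counts = Counter(call_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.most_common()}


# Hypothetical batch of coded repair calls from one facility.
calls = ["PR1", "PR1", "PR2", "PR3", "MR1", "PR1", "PR4", "PR2"]
for code, pct in category_shares(calls).items():
    print(f"{code} ({CATEGORY_LABELS.get(code, 'unrecognized code')}): {pct:.1f}%")
```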

We also note in HTM ComDoc 8 that the best way to reduce potentially critical hidden failures in the device types that are most vulnerable to them (i.e., the device types listed in the first 11 rows of Table 2) is, first, to select versions of the device that have built-in self-testing to verify that the device is functioning safely, and then to be diligent about following the manufacturer's recommendations for periodic safety verification testing. A second measure to consider is implementing pre-use inspections or testing to verify that the device is functioning safely immediately prior to use.


Enhanced Risk Management Program. A very worthy use for some or all of the resources made available by improving the efficiency of the facility's maintenance program would be an enhanced Risk Management Program that incorporated some or all of the additional measures described above.


......................................................................................................................................................................................................................................................

1.9.5 Question 5. What changes to current PM work practices would be beneficial?


There is no question that the most beneficial change to current PM work practices would be for the entire community to standardize the way we perform and report our maintenance activities (see Section 15.3 of HTM ComDoc 15, "Why we need to standardize the format of our maintenance reports").

There are three extremely important benefits that could be realized if a significant number of managers of the HTM community's maintenance programs could be persuaded to standardize on a common format for their maintenance activities and reporting.

  • Maintenance data could be aggregated in a single, community-wide database, which would then produce very helpful maintenance safety statistics on at least the more popular medical devices relatively quickly.
  • A standard coding system for characterizing the way devices fail could provide valuable information that would allow us to analyze the effectiveness of each facility's equipment safety strategies.
  • A standard format for documenting the findings of the PMs that are performed on all critical devices would enable us to optimize the PM intervals used for the various devices.
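To show what a standardized record might make possible, the sketch below assumes a hypothetical per-facility summary record (one row per manufacturer and model) and pools submissions from several facilities into a PM-related MTBF for each model. The field names, the manufacturer and model names, and the choice to count both MR1 wear-out failures and hidden failures found by safety testing as PM-preventable are all assumptions for illustration, not a published format.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class FacilityPmSummary:
    """Hypothetical standardized summary one facility submits per device model."""
    facility_id: str
    manufacturer: str
    model: str
    units: int                  # units of this model in service
    years_observed: float       # observation period, years
    mr1_failures: int           # wear-out failures attributed to lack of timely PM
    hidden_failures_found: int  # out-of-spec findings from safety testing


def pooled_pm_mtbf(reports: Iterable[FacilityPmSummary]) -> Dict[Tuple[str, str], float]:
    """Pool device-years and PM-preventable failures across facilities, per model."""
    device_years: Dict[Tuple[str, str], float] = defaultdict(float)
    failures: Dict[Tuple[str, str], int] = defaultdict(int)
    for r in reports:
        key = (r.manufacturer, r.model)
        device_years[key] += r.units * r.years_observed
        failures[key] += r.mr1_failures + r.hidden_failures_found
    return {
        key: (years / failures[key] if failures[key] else math.inf)
        for key, years in device_years.items()
    }


reports = [
    FacilityPmSummary("F1", "MakerA", "Model-X", 10, 5.0, 1, 0),
    FacilityPmSummary("F2", "MakerA", "Model-X", 20, 5.0, 0, 1),
    FacilityPmSummary("F1", "MakerB", "Model-Y", 4, 2.0, 0, 0),
]
print(pooled_pm_mtbf(reports))
```

Neither hypothetical facility alone has many device-years for Model-X, but pooling gives 150 device-years and 2 failures, which is the kind of figure a community-wide database could accumulate quickly.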


End of revised material ................................................................................................................................................................................................................................................

......................................................................................................................................................................................................................................................


From HTM ComDoc 7.

The maintenance entity must use some form of coding for repair calls that allows for a separate count of the failures that are attributable to inadequate PM (similar to the MR1 category described in HTM ComDoc ?). Because of its value in maximizing total equipment safety, we also recommend coding at least the three basic causes of total failure described in HTM ComDoc 1, namely IRFs (inherent reliability-related failures), MRFs (maintenance-related failures) and PRFs (process-related failures). Adopting the full 15-category classification and coding method described in HTM ComDoc 1 and HTM ComDoc 8 is highly desirable because of its value in diagnosing possible non-maintenance remedial actions.
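Once repair calls carry a detailed category code, rolling them up into the three basic causes is mechanical. The sketch below assumes that codes beginning with MR are maintenance-related (MRFs) and codes beginning with PR are process-related (PRFs), as in the category list in Section 1.9.4; the IR prefix used here for inherent reliability-related failures is a hypothetical placeholder, since this section does not name those codes. The function names are also hypothetical.

```python
from collections import Counter
from typing import Iterable


def basic_cause(code: str) -> str:
    """Map a detailed repair-call code to IRF, MRF or PRF by its prefix."""
    prefix = code.strip().upper()[:2]
    if prefix == "MR":
        return "MRF"
    if prefix == "PR":
        return "PRF"
    if prefix == "IR":  # ASSUMED prefix for inherent reliability-related codes
        return "IRF"
    raise ValueError(f"unrecognized failure code: {code!r}")


def basic_cause_counts(codes: Iterable[str]) -> Counter:
    """Count repair calls by basic cause (IRF / MRF / PRF)."""
    return Counter(basic_cause(c) for c in codes)


# Hypothetical coded calls; "IR1" is an invented inherent-reliability code.
print(basic_cause_counts(["PR1", "PR1", "MR1", "PR4", "IR1"]))
```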




1.8.2 A new approach to PM prioritization using RCM-based risk criteria

The material in Sections 1.3 and 1.4 (above) provides the logical foundation for this new risk assessment method, which we are calling PM prioritization. This logic can be summed up as follows. There are two ways in which a PM-related failure of a medical device can put the safety of a patient or device user at risk:

  • Some (life-supporting) devices, on which the patient's life may be totally dependent, can stop working completely if they are not given some kind of periodic restoration during periodic planned maintenance activities; and
  • Some devices can deteriorate in such a way that their performance or level of safety falls to such a degree that the device is potentially hazardous to the patient or user (these are called hidden failures because this deterioration is often not obvious to the user). These hazards are detected and corrected during periodic planned maintenance.


To maximize patient safety it is important to ensure that all devices whose failure can put the safety of the patient at risk receive appropriate attention. Restoring or replacing a device’s non-durable parts in a timely manner (using what we call device restoration or DR tasks) will reduce the device’s overall failure rate to some degree (but certainly not to zero). And periodic safety verification or SV tasks will uncover any potentially hazardous hidden failures, hopefully before they can cause a patient injury.

Based on certain combinations of these five risk criteria, we are proposing a new approach to determining which medical devices are most likely to be potentially hazardous if they are not given periodic attention. These are the devices that should be given an appropriate level of priority for periodic planned maintenance. The term RCM-based risk criteria is appropriate because the logical basis for this questionnaire is the same as that embedded in the RCM approach (see HTM ComRef 1 and HTM ComRef 26).

It is important to point out here that not all possible hidden failures are listed in column 5 of Table 3. In many cases there may be a number of possible hidden failures, and the best way of identifying them is to review the test protocols listed in the performance verification and safety testing (PVST) section of the device's generic PM procedure. For example, in this section of the generic PM procedure for a defibrillator-monitor (PM Code DEF-01 in the third column of Table 3), Tasks S4 through S7 have been labeled "Serious failure is potentially Life-threatening". The example cited in the fifth column of Table 3 is that the "hidden failure caused the unit to under-deliver", which would correspond to a PM finding in Task S7 that the delivered energy was significantly less than the energy level selected. Depending on the extent to which the device is found to be out of specification (OOS), the adverse outcome should be judged to be of LOS 1, LOS 2, or LOS 3 severity. In both cases (an anticipated overt failure or a hidden failure), the analyses in the tables should include this additional judgment on the outcome and worst-case level of severity of each anticipated failure, entered in the sixth or seventh column of the respective table.
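The Task S7 check described above can be sketched as a simple comparison of delivered and selected energy. The tolerance used here (delivered energy within ±15% or ±3 J of the selected value, whichever is greater) is an assumption based on the commonly cited defibrillator accuracy requirement in IEC 60601-2-4; the limit in the applicable manufacturer's PM procedure should take precedence. The function name is hypothetical, and it deliberately stops short of assigning LOS 1, 2 or 3, since that remains a clinical judgment.

```python
from typing import Tuple


def delivered_energy_check(selected_j: float, delivered_j: float) -> Tuple[bool, float]:
    """Compare delivered with selected defibrillator energy (Task S7-style test).

    Returns (out_of_spec, shortfall_fraction). The tolerance of +/-15 % or
    +/-3 J, whichever is greater, is an ASSUMPTION; use the manufacturer's limit.
    """
    if selected_j <= 0:
        raise ValueError("selected energy must be positive")
    tolerance_j = max(3.0, 0.15 * selected_j)
    deviation_j = delivered_j - selected_j
    out_of_spec = abs(deviation_j) > tolerance_j
    shortfall_fraction = max(0.0, -deviation_j) / selected_j  # 0 if over-delivering
    return out_of_spec, shortfall_fraction


# Hypothetical reading: 200 J selected, 150 J measured into the test load.
print(delivered_energy_check(200.0, 150.0))  # prints (True, 0.25)
```

The shortfall fraction is reported alongside the pass/fail flag because, as the text notes, the severity judgment depends on how far out of specification the device is, not merely on whether it failed.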


Table 2 and Table 3 illustrate how this concise risk characterization process works. We have used the compounded result of these risk assessments to filter and categorize the subset of the 70+ more complex device types (listed in Table 1) that we believe represent all or most of the device types that are likely to meet either of the Task Force's first three risk criteria. Although this particular subset represents only about 5-10% of the 700 to 1500 different types of medical equipment in modern hospitals, we believe that it represents all of the types of device that are likely to injure a patient, either if they stop working completely or if they develop some kind of significant hidden degradation.

The concise scenarios described in the fifth and sixth columns of Table 2 and Table 3 make the categorization process logical and quite transparent, since the judgments are made public on the Task Force's wiki website, where they are open to challenge. This new process should allow for much better community-wide consistency than the broad, potentially subjective generalizations of earlier methods. The new method introduces one or two new terms to characterize more precisely the nature of the device types that should be considered potentially hazardous, and these new terms are helpful in identifying which preventive strategies, including non-maintenance measures, will work best for maximizing patient safety (see HTM ComDoc 8).

The Task Force has prepared a brief statement documenting why this PM Criticality questionnaire is consistent with established industry standards of practice.

Non-critical devices

As best we can estimate there are, in round numbers, between 750 and 1500 different types of healthcare-related devices in use in today’s healthcare facilities. An unknown number of these are non-clinical devices such as printers or other device accessories that do not even fall into the formal category of a medical device that is regulated by the FDA. These non-clinical devices are extremely unlikely to be PM-critical. At the other end of the scale there is a group of about 70 device types that are more likely to be PM-critical, either because of their complexity, or for some other reason that was captured in the original Fennigkoh-Smith criteria.

The Task Force believes that a large percentage of the estimated remaining balance of at least 700 device types will prove to be non-critical when they are analyzed. One example is a set of patient scales. When the HTMC generic PM procedure for patient scales (PA.SC-01) is analyzed using the questionnaire process described in Section 3.3 of HTM ComDoc 3, responses (1), (2) and (6) are all “no”, and so, according to our criteria, a set of patient scales should be classified as a non-critical device.
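The scales example suggests a simple rule: if questionnaire items (1), (2) and (6) are all answered "no", the device type is non-critical. The sketch below encodes that rule as an assumption generalized from this single example; the actual questionnaire items are in Section 3.3 of HTM ComDoc 3, and the function name is hypothetical. Unanswered items are treated as "yes" so that an incomplete questionnaire never classifies a device as non-critical.

```python
from typing import Dict

# Items whose "no" answers made patient scales non-critical in the example above.
DECIDING_ITEMS = (1, 2, 6)


def is_non_critical(responses: Dict[int, bool]) -> bool:
    """Return True if every deciding item is answered "no" (False).

    responses maps questionnaire item number -> True for "yes", False for "no".
    Missing items default to "yes", so incomplete answers stay conservative.
    """
    return not any(responses.get(item, True) for item in DECIDING_ITEMS)


patient_scales = {1: False, 2: False, 6: False}
print(is_non_critical(patient_scales))  # prints True
```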

Based on the preliminary findings shown in Table 2 and Table 3, we believe that a large number of device types can be shown to be non-critical. This is a very important step because it provides a solid, rational argument for why a very large number of medical devices can be used quite safely without any kind of periodic PM whatsoever. They simply have no high-severity, PM-preventable failure modes and so, by definition, they are non-critical. The evidence for this is that the relevant manufacturer’s PM procedure lists no tasks that would either prevent a potentially harmful failure of the device or detect a potentially harmful hidden failure that had already developed.

This leaves a list of about 70 device types, shown in Table 4, that are potentially PM-critical. However, as we will show in Part 2 of this article, by implementing Step 2 of this new risk analysis, which will draw on aggregated maintenance data from the new community-wide database, we will be able to determine which of these devices should actually be designated as PM-critical (high-risk) devices and given periodic PM according to the manufacturer’s recommendations. The others are all more reliable, lower-risk devices. We anticipate that, when fully implemented, the analysis in Step 2 will reveal devices with risk levels distributed across the full spectrum from high-risk to very low-risk devices.

Not all potentially PM-critical devices are high-risk devices!

Just having one or more critical PM-related failure modes is not sufficient to make a device classifiable as a potentially unsafe "high-risk" device. According to modern reliability and risk management theory (HTM ComRef 1, HTM ComRef 2), "risk" has two components:

  1. The severity of the outcome of the event (in this context a PM-preventable device failure); and
  2. The likelihood that the event (the PM-preventable device failure) will actually occur.

This required combination of two factors means that a device whose manufacturer-recommended PM procedure includes critical device restoration tasks or safety testing tasks will not necessarily become hazardous just because that procedure is not followed, or even not used at all. If the likelihood of any PM-related failure actually occurring (even a critical failure with a high-severity outcome) is very low, with a mean time between failures (MTBF) of, say, 50-75 years or more, then the corresponding risk of harming the patient is reduced from high to moderate, to low, or even to very low. The actual level of risk at each of the three levels of severity is, in fact, accurately represented by the probability that the device will actually fail, either totally or by developing some significant degradation. This is why traveling on a commercial airliner is considered to be safe. While there is a theoretical potential for a high-severity outcome if the plane should crash, the likelihood that this will actually happen is very low, so the level of risk when flying on a commercial airliner is also very low relative to other ways of traveling.
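To see why a long MTBF pushes risk down, it helps to convert an MTBF into a yearly probability of failure. The sketch below assumes a constant failure rate (an exponential model), which is a simplifying assumption rather than something the Task Force states; the helper name is hypothetical.

```python
import math


def annual_failure_probability(mtbf_years: float) -> float:
    """Probability of at least one PM-preventable failure in one year (exponential model)."""
    if mtbf_years <= 0:
        raise ValueError("MTBF must be positive")
    return -math.expm1(-1.0 / mtbf_years)  # equals 1 - exp(-1/MTBF)


for mtbf in (5.0, 50.0, 75.0):
    print(f"MTBF {mtbf:>4.0f} yr -> {annual_failure_probability(mtbf):.2%} per unit per year")
```

Under this model, an MTBF of 75 years gives each unit about a 1.3% chance of a PM-preventable failure in a given year, although a facility with 100 such units should still expect roughly 1.3 such failures per year across the whole fleet.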

To determine which devices have the theoretical potential to cause a patient injury (or some less severe adverse outcome) if they fail because their PM was not completed in a timely manner, we first need to be clear about what is achieved by performing the various tasks listed in the manufacturer’s recommended PM procedure.

In general, there are two kinds of tasks contained in a medical device’s PM procedure. The first kind is a task that restores the device to something close to its original, like-new condition. The Maintenance Practices Task Force calls these device restoration tasks. They are tasks in which components that are subject to deterioration during the useful lifetime of the device, such as batteries, cables, fasteners, gaskets and tubing, are periodically refurbished or replaced. The second kind is some sort of test to detect any hidden degradations in the functional performance or safety of the device that are sufficiently hazardous to require immediate correction. The Task Force calls these safety testing tasks.

It is entirely possible for some manufacturer-model versions of any of the PM-critical device types listed in Table 2. and Table 3. to be classified as low-risk devices if they can be shown to have good reliability (a demonstrated low probability of failing). Table 12. shows the Task Force's tentative definitions of what should be considered acceptable levels of reliability. We will discuss this in more detail in section 3.3 of HTM ComDoc 4.

(There are a very large number of medical devices that can be used quite safely without any kind of periodic PM whatsoever because they have no high- or moderate-severity, PM-preventable failure modes. These devices are, by definition, non-critical. The evidence for this is either that the relevant manufacturer’s PM procedure lists no tasks that would prevent a potentially harmful failure, or detect a potentially harmful hidden failure that has already developed, or that neither total failure nor serious degradation of the device has any possible high- or moderate-severity outcome.)
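The criticality rule described in the preceding paragraphs can be sketched in Python. Each task in a manufacturer’s PM procedure is tagged as a device restoration task or a safety testing task, together with the worst-case severity of the failure it guards against; a device type is treated as PM-critical if at least one task guards against a high- or moderate-severity outcome, and as non-critical otherwise (including when the procedure lists no tasks at all). The class names, severity labels and example procedures are hypothetical illustrations, not the Task Force’s data model.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    RESTORATION = "device restoration"  # refurbish or replace a deteriorating part
    SAFETY_TEST = "safety testing"      # detect a hidden degradation


class Severity(Enum):
    # Larger value = worse worst-case outcome of the failure the task guards against.
    NONE = 0
    LOW = 1        # e.g. delayed diagnosis or delayed treatment
    MODERATE = 2   # serious but less than life-threatening injury
    HIGH = 3       # life-threatening injury


@dataclass(frozen=True)
class PMTask:
    description: str
    kind: TaskKind
    worst_case_severity: Severity


def is_pm_critical(procedure: list[PMTask]) -> bool:
    """True if any task prevents or detects a high- or moderate-severity failure."""
    return any(
        task.worst_case_severity.value >= Severity.MODERATE.value
        for task in procedure
    )


# Hypothetical example procedures.
device_a = [
    PMTask("replace internal battery", TaskKind.RESTORATION, Severity.HIGH),
    PMTask("verify alarm function", TaskKind.SAFETY_TEST, Severity.HIGH),
]
device_b = [PMTask("clean housing", TaskKind.RESTORATION, Severity.NONE)]

print(is_pm_critical(device_a))  # True
print(is_pm_critical(device_b))  # False
```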

(Device types such as a phototherapy light, which have no potential whatsoever to cause any kind of patient injury or any other significant adverse outcome when they fail, either completely or by developing a hidden failure, will be classified as non-critical. And since, by definition, non-critical devices have no significant adverse outcome if they fail, they will all be automatically categorized as inherently safe devices.)

In summary, all non-critical device types (i.e. those that have no critical PM-related failure modes) are, by definition, inherently safe with respect to needing PM. By contrast, all PM-critical device types are potentially high-risk (potentially hazardous) devices unless certain manufacturer-model versions of those device types can be shown to have good reliability (i.e. a low likelihood that the PM-related failures will actually occur), in which case they can be categorized as lower-risk devices. See Table 12. for more details on the tentative definitions of the various levels of device risk.
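The decision rule in this summary can be expressed as a small Python function. Because the Task Force’s tentative reliability definitions in Table 12. are not reproduced here, the 50-year MTBF cutoff below is only a placeholder, borrowed from the illustrative figure used earlier in this section; the function and category names are likewise hypothetical.

```python
from enum import Enum
from typing import Optional


class RiskCategory(Enum):
    INHERENTLY_SAFE = "inherently safe (non-critical)"
    LOWER_RISK = "lower risk (PM-critical, good reliability)"
    POTENTIALLY_HIGH_RISK = "potentially high risk (PM-critical)"


# Placeholder only: stands in for the tentative definitions in Table 12.
GOOD_RELIABILITY_MIN_MTBF_YEARS = 50.0


def categorize(pm_critical: bool, pm_related_mtbf_years: Optional[float]) -> RiskCategory:
    """Categorize one manufacturer-model version of a device type.

    pm_related_mtbf_years is None when no reliability data are available,
    in which case a PM-critical device remains potentially high risk.
    """
    if not pm_critical:
        return RiskCategory.INHERENTLY_SAFE
    if (pm_related_mtbf_years is not None
            and pm_related_mtbf_years >= GOOD_RELIABILITY_MIN_MTBF_YEARS):
        return RiskCategory.LOWER_RISK
    return RiskCategory.POTENTIALLY_HIGH_RISK


for args in [(False, None), (True, 75.0), (True, 8.0), (True, None)]:
    print(args, "->", categorize(*args).value)
```

Returning "potentially high risk" when data are missing mirrors the text: a PM-critical device is presumed potentially hazardous until good reliability has actually been shown.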

We will describe how to determine which devices are PM-critical/high risk devices in section 4.x of HTM ComDoc 4.

So, if the total failure or critical degradation of the device is highly unlikely to occur, the level of risk associated with using the device is correspondingly small. Devices that are classified in the tables as having potentially life-threatening severity (LOS 3) outcomes from total failure or from critical degradation should more properly be called potentially hazardous or potentially high-risk devices because the actual level of risk at each of the three levels of severity is, in fact, accurately represented by the probability that the device will actually fail, either totally, or by developing some significant degradation.

1.12 Progress Report (November 2017)

To quote from the third paragraph of the statement titled “Background” on the Introductory Materials page of the website created by the Maintenance Practices Task Force (MPTF), one of the primary motivations prompting this project, which AAMI began supporting in November 2015, is to address the huge problem created by the failure of the Healthcare Technology Management (HTM) community to establish a “… generally-agreed way of quantifying current levels of maintenance-related medical equipment safety …”.

Much has been written about medical technology, and virtually all of it states that the ultimate, overriding consideration must always be assuring the very highest levels of patient safety. Maximizing patient safety is, of course, a very worthy goal - with which there can be no quarrel - but to paraphrase one of the better maxims of the business world – if you can’t measure it, you can’t manage it. And since virtually all of the regulations and standards governing the HTM business include a requirement, either direct or indirect, to provide levels of patient safety that are “generally acceptable”, this current lack of an accepted metric for medical device safety – and maintenance-related medical equipment safety in particular - makes it impossible to prove how well (or not) we are satisfying this important obligation. This same lack of the proper tools also makes it very difficult to compare the levels of maintenance-related medical equipment safety achieved by different maintenance strategies.

A current manifestation of this quandary is the requirement in the recently amended medical equipment maintenance regulations of the Centers for Medicare & Medicaid Services (CMS), which implies very strongly that the use of any of the now-permitted alternate equipment management (AEM) strategies for maintaining the facility’s medical equipment must keep the equipment just as safe as it would be if the devices were being maintained according to the manufacturer’s recommendations. This is clearly a very reasonable requirement, but it is creating practical difficulties for facilities trying to introduce more cost-effective maintenance practices, as well as for the various survey and inspection teams who are responsible for confirming that maintenance practices other than those recommended by the device manufacturer are not exposing patients to higher levels of risk.

Everyone familiar with the standard texts on risk management knows that safety itself is not directly measurable (see, for example, the third chapter in “Of Acceptable Risk: Science and the Determination of Safety” by William Lowrance). The only aspect of safety that is measurable is the actual level of risk created by some specified potential hazard. So when we say that something such as a medical device is safe, what we are really doing is making a judgment relative to some recognized standard that the risk created by one or more particular potential hazards (such as, in this case, the potential for an adverse patient outcome attributable to inadequate device maintenance) is generally acceptable. Devices that are deemed “safe” in this way are really only safe with respect to the specifically identified hazard, or hazards.

While all of the various participants in the HTM business, including the regulating authorities, have cited patient safety as the primary driver within their respective areas of responsibility, there has been a lack of meaningful efforts to establish a rational, scientific basis for making these judgment calls on the level of safety of the patient. This is certainly true of the regulatory framework that is intended to ensure the safety of medical devices in their working lifetime, after the device has passed through the FDA’s initial device approval process. It has already been pointed out in the just-published AEM Program Guide that some of the accreditation standards based on the CMS regulation (referenced above) contain sloppy, incorrect or inconsistent terminology, as well as a complete lack of direction on how conformance to what are allegedly the “generally acceptable” levels of patient risk should be demonstrated.

By adopting the widely used and very well respected scientific methodology embedded in reliability-centered maintenance (RCM), the Maintenance Practices Task Force (name shortened elsewhere in this report to “the Task Force”, “the MPTF” or just “the TF”) has made significant progress towards solving this fundamental problem. As described in HTM ComDoc 1 and several other related documents on the website, the Task Force has created a useful method for characterizing the level of PM-related risk associated with the different manufacturer-model versions of the most PM-critical medical devices. Each of the identified levels of maintenance-related risk is a combination of two parameters: one representing an assessment of the worst-case severity of the adverse outcome of a PM-preventable failure of the device (the TF has selected three representative levels: a life-threatening injury, a serious but less than life-threatening injury, or a less serious outcome such as a delayed diagnosis or delayed treatment), and a second quantifying the likelihood of a PM-preventable failure actually occurring (represented by the device’s documented PM-related failure rate).
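As a sketch of how the two parameters might be assembled, the Python class below estimates the documented PM-related failure rate from pooled maintenance records as PM-preventable failures per device-year of use, and pairs it with the worst-case severity level. The record layout and field names are assumptions for illustration only; the community database’s actual schema is not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelHistory:
    """Pooled maintenance history for one manufacturer-model version (hypothetical layout)."""
    model: str                 # hypothetical manufacturer-model identifier
    worst_case_severity: int   # 1 = less serious, 2 = serious, 3 = life-threatening
    devices_in_service: int
    years_observed: float
    pm_preventable_failures: int

    def __post_init__(self) -> None:
        if self.devices_in_service <= 0 or self.years_observed <= 0:
            raise ValueError("need a positive number of devices and years")
        if self.worst_case_severity not in (1, 2, 3):
            raise ValueError("severity must be 1, 2 or 3")

    @property
    def device_years(self) -> float:
        return self.devices_in_service * self.years_observed

    @property
    def failure_rate(self) -> float:
        """Documented PM-preventable failures per device-year."""
        return self.pm_preventable_failures / self.device_years

    @property
    def mtbf_years(self) -> float:
        """Device-years of use per PM-preventable failure (infinite if none observed)."""
        if self.pm_preventable_failures == 0:
            return float("inf")
        return self.device_years / self.pm_preventable_failures

    def risk_level(self) -> tuple[int, float]:
        """The two-parameter risk level: (worst-case severity, failure rate)."""
        return (self.worst_case_severity, self.failure_rate)


history = ModelHistory("Model X", 3, devices_in_service=120,
                       years_observed=2.5, pm_preventable_failures=4)
print(history.device_years, history.mtbf_years, history.risk_level())
```

In this made-up example, 120 devices observed for 2.5 years give 300 device-years; 4 PM-preventable failures then correspond to about 0.013 failures per device-year, or an MTBF of 75 years. A count of zero gives an infinite point estimate, which is one reason the size of the sample matters.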

The Task Force has also proposed a practical method for establishing what level of PM-related risk should be considered acceptable – another notable step forward. In this particular context, it seems logical to set the standard for acceptable maintenance-related safety at the typical level of PM-related risk achieved when the devices in question are maintained strictly according to the manufacturer’s recommendations. Just what this level is can and will be determined (see project Objectives #3 and #4) by conducting a statistically satisfactory number of tests to determine and document the actual PM-related failure rates demonstrated by a sample of the potentially most critical devices while they are being maintained according to their manufacturers’ recommendations.
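One way to judge whether a sample is large enough is to look at how tightly it pins down the benchmark failure rate. The Python sketch below assumes that PM-preventable failures follow a Poisson process (our assumption, not a method stated by the Task Force) and computes an exact one-sided upper confidence bound on the failure rate by bisection, using only the standard library. The function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for a Poisson variable with mean mu, summed in log space to avoid underflow."""
    if mu == 0:
        return 1.0
    log_mu = math.log(mu)
    total = sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(total, 1.0)


def upper_rate_bound(failures: int, device_years: float, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on the failure rate (per device-year).

    Finds the Poisson mean mu at which observing `failures` or fewer has
    probability 1 - confidence, then divides by the exposure in device-years.
    """
    if failures < 0 or device_years <= 0 or not 0 < confidence < 1:
        raise ValueError("invalid arguments")
    alpha = 1.0 - confidence
    lo, hi = 0.0, 10.0 * (failures + 1)
    while poisson_cdf(failures, hi) > alpha:  # widen until the root is bracketed
        hi *= 2
    # poisson_cdf(failures, mu) decreases steadily as mu grows, so bisection converges.
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(failures, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi / device_years


for failures, exposure in [(0, 100), (0, 1000), (1, 100)]:
    bound = upper_rate_bound(failures, exposure)
    print(f"{failures} failure(s) in {exposure} device-years: "
          f"rate below {bound:.4f} per device-year (95% confidence)")
```

Zero failures in 100 device-years only shows that the rate is below about 0.03 per device-year (an MTBF of roughly 33 years); ten times the exposure narrows that to about 0.003. This is the sense in which the number of device-years observed, not just the number of failures, determines how useful a benchmark is.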

Patient safety as it relates to the maintenance of medical devices

Much has been written about medical technology, and virtually all that is written cites maximizing patient safety as the ultimate, overriding consideration. This is, of course, a very worthy goal with which there can be no quarrel; it is the motherhood and apple pie of healthcare technology management (HTM) and a cherished icon that we all serve dutifully and enthusiastically. In addition, virtually all of the regulations and standards governing the HTM business include either a direct or indirect obligation to provide acceptable levels of patient safety. The rub comes, however, when we attempt to quantify how well our efforts are measuring up to this rather vague obligation to maximize patient safety.

A recent piece by …. on the debate over medical device service urging …. is a good example.

Safety itself is not measurable. The only aspect of safety that is measurable is the actual level of risk created by some specified potential hazard. So when we say something is safe, what we are really doing is making a judgment that the level of risk posed by one particular potential hazard is considered to be acceptable. The device is indeed safe but only with respect to this one particular hazard (cite Lowrance).

To illustrate this, we will use an example from recent investigations (cite ?) into alternate equipment management (AEM) strategies that would make medical devices just as safe as they would be if they were being maintained according to the manufacturer’s recommendations, something now permitted by recent revisions to the regulations of the Centers for Medicare & Medicaid Services (CMS) relating to medical equipment (cite ?). In this example, the risk that we are concerned with is the risk that the device will fail from a PM-preventable cause.

PM-preventable failures

The key to identifying which device failures can be attributed to a PM-preventable cause (one that could have been prevented by a more effective or more timely PM activity) is to examine each of the tasks listed in the manufacturer’s PM procedure. This will identify which of the device’s components need some kind of periodic restoration, such as a filter that needs cleaning or a battery that needs to be replaced. If a device is presented for repair and the only thing wrong with it can be traced to a component that is scheduled for some kind of restoration during PM, then this failure can quite likely be considered a PM-preventable failure. Maybe the restoration performed during the last PM was ineffective, or maybe the PM interval is too long. Similarly, the manufacturer’s PM procedure may include testing the performance of the device to detect deteriorations in either its functional performance or its compliance with certain safety requirements that would not be obvious to the user (so-called hidden failures). While these deteriorations have not caused a complete failure, the diminished performance could be putting the patient at risk, so they should also be considered PM-preventable failures. A shorter PM interval would have reduced the length of time that the patient was exposed to some level of risk.
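The triage reasoning in this paragraph can be sketched as a Python function that compares what was found on a device presented for repair against the tasks in its manufacturer’s PM procedure. The inputs (a set of components restored during PM and a set of parameters checked by the PM tests) and all names are hypothetical; real criteria would have to come from the standardized diagnostic techniques the field has yet to adopt.

```python
from dataclasses import dataclass, field


@dataclass
class PMProcedure:
    # Hypothetical summary of a manufacturer's PM procedure.
    restored_components: set[str] = field(default_factory=set)  # e.g. battery, filter
    tested_parameters: set[str] = field(default_factory=set)    # e.g. leakage current


@dataclass
class RepairFinding:
    faulty_components: set[str]                                  # parts found defective
    hidden_degradations: set[str] = field(default_factory=set)   # parameters out of spec


def is_pm_preventable(finding: RepairFinding, procedure: PMProcedure) -> bool:
    """Likely PM-preventable if the problem is one the PM procedure addresses."""
    # A hidden degradation in something the PM tests check for counts.
    if finding.hidden_degradations & procedure.tested_parameters:
        return True
    # Otherwise, every faulty part must be one that PM restores.
    return bool(finding.faulty_components) and (
        finding.faulty_components <= procedure.restored_components
    )


proc = PMProcedure({"battery", "air filter"}, {"alarm volume", "leakage current"})
print(is_pm_preventable(RepairFinding({"battery"}), proc))                   # True
print(is_pm_preventable(RepairFinding({"battery", "display board"}), proc))  # False
print(is_pm_preventable(RepairFinding(set(), {"leakage current"}), proc))    # True
```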

To gather reliable information on the frequency with which PM-preventable failures are encountered, it is very important to standardize the techniques and criteria for diagnosing when a user-reported failure is legitimately attributable to inadequate or tardy PM. Similarly, we need to standardize the techniques and criteria for diagnosing failures encountered when the actual PM is performed. Obviously, a PM finding that the device failed one or more critical performance or safety tests included in the PM procedure constitutes a PM-preventable failure (it is an indicator that the PM interval is too long). The Maintenance Practices Task Force has also proposed treating as a PM-preventable failure the discovery that a part scheduled for some kind of restoration during the PM has already deteriorated to the point where it could have been interfering with the proper operation of the device. This, too, is an indicator that the PM interval is too long.
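Applied to PM records, the proposed criteria yield a simple per-model indicator: the fraction of scheduled PMs at which a critical test failed or a restoration part had already deteriorated. The Python sketch below flags the PM interval as possibly too long when that fraction exceeds a threshold; the record fields and the 5% threshold are hypothetical placeholders, not Task Force values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PMRecord:
    failed_critical_test: bool                   # a critical performance/safety test failed
    restoration_part_already_deteriorated: bool  # a part due for restoration was already degraded

    @property
    def pm_preventable_failure(self) -> bool:
        return self.failed_critical_test or self.restoration_part_already_deteriorated


def interval_assessment(records: list[PMRecord], threshold: float = 0.05) -> tuple[float, bool]:
    """Return (fraction of PMs with a PM-preventable finding, interval-too-long flag)."""
    if not records:
        raise ValueError("no PM records")
    found = sum(r.pm_preventable_failure for r in records)
    fraction = found / len(records)
    return fraction, fraction > threshold


# Hypothetical history: 40 PMs, 3 of which uncovered a PM-preventable failure.
records = [PMRecord(False, False)] * 37 + [
    PMRecord(True, False), PMRecord(False, True), PMRecord(True, True)]
fraction, too_long = interval_assessment(records)
print(f"{fraction:.1%} of PMs found a PM-preventable failure; interval too long: {too_long}")
```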

Unfortunately, there is still considerable variation in the kind of maintenance data collected throughout the field. While there have been recommendations to standardize on these particular indicators that a failure was PM-preventable, they are not yet in widespread use.


So, even though we are often required to characterize something such as a maintenance practice or maintenance “strategy” as safe or unsafe, we generally fail to address the judgment-call nature of this requirement. Although we champion data-driven decisions, and this is an important and laudable step forward, we need to recognize that, with respect to safety, there are generally no prescribed boundaries separating acceptable (i.e. safe) levels of risk from unacceptable (i.e. unsafe) levels of risk.

The data driving the decisions are the levels of risk relevant to certain specific hazards.




In summary, on Question 2: How much of an impact can PM have on the safety of medical devices?


While the device’s overall (effective) reliability corresponds directly to the total number of repair calls, irrespective of what caused them, it is the numbers of maintenance-related failures (MRFs) and inherent reliability-related failures (IRFs) that are of greatest interest to us, as maintainers, at this time. The level of MRFs provides a good measure of the effectiveness of the facility’s maintenance program, and the level of IRFs provides an equally good measure of the basic or inherent reliability of the devices in question.
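The distinction drawn here can be illustrated by tallying repair calls by cause: the total rate reflects effective reliability, the MRF rate reflects the effectiveness of the maintenance program, and the IRF rate reflects inherent reliability. In the Python sketch below the cause codes are hypothetical, and lumping every call that is neither an MRF nor an IRF under "OTHER" is a simplification for illustration.

```python
from collections import Counter


def failure_rates(cause_codes: list[str], device_years: float) -> dict[str, float]:
    """Repair calls per device-year, overall ("ALL") and broken down by cause code.

    cause_codes holds one hypothetical code per repair call, e.g. "MRF",
    "IRF" or "OTHER"; the "ALL" entry is the effective-reliability measure.
    """
    if device_years <= 0:
        raise ValueError("device_years must be positive")
    rates = {cause: n / device_years for cause, n in Counter(cause_codes).items()}
    rates["ALL"] = len(cause_codes) / device_years
    return rates


calls = ["IRF"] * 5 + ["MRF"] * 2 + ["OTHER"] * 3  # 10 hypothetical repair calls
print(failure_rates(calls, device_years=50))
```

With these made-up numbers, 50 device-years of use produce 0.2 repair calls per device-year overall, of which 0.04 per device-year are MRFs and 0.1 per device-year are IRFs.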


