
HTM ComDoc 1

From HTMcommunityDB.org

==<font color = red>Start here</font>: PM Basics, key concepts and terminology ==
<font color =green>''(This document was last revised on 12-8-18)''</font><br>
* Back to [[Main Page]], [[Explanatory articles supporting the Tables]], or on to [[HTM ComDoc 2]] ''"Important definitions"''.
===1.1 Device failures and measures of reliability===
A device or equipment system is considered to have failed when:
* it no longer performs the function or functions that the user wants it to perform (these are called overt failures), or
* when it functions as it should, but in an unsafe or otherwise unsatisfactory manner (these are called hidden failures).
It is a truism, similar to the impossibility embedded in the concept of perpetual motion, that there is no such thing as an infallible device. All devices fail in one way or another, at some time or other.
The simplest measure of a device’s reliability is its failure rate - the number of times that it failed to perform, or failed to perform satisfactorily, during a particular time period. Since failures are predominantly random, a device's failure performance (its reliability) is usually expressed as an average number of failures over a particular time period. However, a more intuitive way of expressing device reliability is in the form of the device's [[mean time between failures]] or [[MTBF]], which is the inverse of its failure rate over a particular period of time. For example, a device that has demonstrated an average failure rate of one failure every 75 years is demonstrating a mean time between failures of 75 years.
====1.1.1 Expressing reliability as a failure rate or as a mean time between failures (MTBF)====
Mean time between failures ([[MTBF]]) is the inverse of the failure rate. For example, a device that has failed twice in nine years is demonstrating a failure rate of 0.22 failures per year and an MTBF of 4.5 years. Average failure rates can also be derived by dividing the total number of device failures occurring during the observation period by the number of device-years making up the total device experience. For example, if a batch of 10 devices experiences two failures during nine years, then the failure rate is 0.022 failures per device-year and the MTBF is 45 years. The larger the experience base (in device-years), i.e. the greater the number of devices in the sample and the longer the observation period, the closer the observed failure rate will be to the device’s true failure rate.
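As a quick illustration of the arithmetic above, the short Python sketch below (ours, not part of the Task Force material; the function name is purely illustrative) reproduces both worked examples from the observed failure counts and device-years.
<pre>
# Illustrative sketch only: computing an observed failure rate and MTBF
# from a sample's failure count and total experience base (device-years).

def failure_rate_and_mtbf(failures, devices, years):
    """Return (failures per device-year, MTBF in years) for an observed sample."""
    device_years = devices * years            # total experience base
    rate = failures / device_years            # average failures per device-year
    mtbf = device_years / failures            # MTBF is the inverse of the failure rate
    return rate, mtbf

# One device, two failures in nine years -> 0.22 failures/year, MTBF = 4.5 years
print(failure_rate_and_mtbf(failures=2, devices=1, years=9))

# Ten devices, two failures in nine years -> 0.022 failures/device-year, MTBF = 45 years
print(failure_rate_and_mtbf(failures=2, devices=10, years=9))
</pre>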
====1.4.1 Coding repair work orders====
The Task Force recommends <font color = red>very strongly</font> that all repair work orders be provided with a field for coding what is judged to be the primary reason (cause) why the device failed. As will be described later, the statistics obtained from this coding are very useful for managing the various failure prevention measures. The recommended format for this coding follows the classification arrangement described immediately above in <u>Section 1.4</u>. For example, a failure that is judged to have been caused by the device having been dropped would be coded as a <u>PR2 failure</u>, and a failure that is judged to have no obvious cause would be coded as an <u>IR1 failure</u>.
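As a purely illustrative sketch (the field names and sample records below are ours, not a Task Force specification), a repair work order record carrying a single cause-code field can be tallied like this to produce the failure-cause statistics described above:
<pre>
# Illustrative sketch: tallying the primary failure-cause codes recorded on
# repair work orders (codes follow the Section 1.4 classification, e.g.
# "PR2" = process-related failure such as a dropped device,
# "IR1" = inherent failure with no obvious cause,
# "MR1" = maintenance-related wear-out failure).
from collections import Counter

work_orders = [
    {"id": "WO-1001", "device": "infusion pump", "cause_code": "PR2"},
    {"id": "WO-1002", "device": "defibrillator", "cause_code": "IR1"},
    {"id": "WO-1003", "device": "infusion pump", "cause_code": "MR1"},
]

cause_counts = Counter(wo["cause_code"] for wo in work_orders)
print(cause_counts)  # e.g. Counter({'PR2': 1, 'IR1': 1, 'MR1': 1})
</pre>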
====1.4.2 Different measures of device reliability====
*If the device develops some kind of <u>hidden failure</u>.
There are some devices that have the potential to cause a patient injury if their functional performance falls below a certain critical point in such a way that the deterioration is not obvious to the user. Examples include a defibrillator whose delivered output energy is significantly lower than the level set by the user, or an infusion device that delivers medication at a significantly lower or higher rate than that set by the user. Similarly, there are some devices that have the potential to cause a patient injury if their compliance with a relevant safety specification falls below an acceptable point and this deterioration is not obvious to the user. Examples include an open ground connection in a device that has exposed metal that could conceivably become "live", and a malfunction in devices that have critical alarms. While, strictly speaking, these failures are <font color = red>not totally prevented by periodic PM</font>, the time that patients are exposed to these potentially hazardous outcomes is reduced. For more on this see Section 4.7 of [[HTM ComDoc 4]]. Elsewhere (see Sections 6.3 and 6.4 of [[HTM ComDoc 6]]) we have shown that the exposure of the patient to this possible hazard is reduced from 100% (as it would be with no PM) to a lesser percentage determined by the ratio of the frequency with which the PM testing is performed to the frequency with which the hidden failure occurs. With typical PM intervals in the range of 6 months to 5 years and mean times between failures of these random hidden failures in the range of 50 to 250 years, the patient's exposure is reduced by 95 - 99% (see the illustrative sketch at the end of this section). <u>Hazardous hidden failures appear to be encountered quite infrequently</u>.<br>
*If the device is <u>used improperly</u>.
<br>
For more on this subject see [[HTM ComDoc 8]] ''"Maximizing medical equipment-related reliability and safety"''.
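The size of the exposure reduction described under the ''hidden failure'' bullet above can be illustrated with a short calculation. The sketch below is ours and rests on a simplifying assumption: a hidden failure arises at a random time and remains undetected until the next scheduled PM, so the average undetected period is roughly half of one PM interval, and the long-run exposure is that period divided by the failure's MTBF. The results are roughly consistent with the 95 - 99% range cited above.
<pre>
# Illustrative sketch (simplified model, our assumption): estimated reduction in
# the time a patient is exposed to an undetected hidden failure, relative to
# performing no periodic safety verification at all.

def exposure_reduction(pm_interval_years, mtbf_years):
    """Percent reduction in exposure time compared with performing no PM."""
    avg_undetected = pm_interval_years / 2.0        # average years a hidden failure goes undetected
    exposed_fraction = avg_undetected / mtbf_years  # long-run fraction of time in that state
    return 100.0 * (1.0 - exposed_fraction)

# Ranges cited above: PM every 6 months to 5 years, hidden-failure MTBF 50 to 250 years
print(round(exposure_reduction(pm_interval_years=5.0, mtbf_years=50.0), 1))   # ~95.0%
print(round(exposure_reduction(pm_interval_years=0.5, mtbf_years=250.0), 2))  # ~99.9%
</pre>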
===1.6 Hidden failures ===
<br>
In the recommended format of each HTMC generic PM procedure (see <u>column 7</u> of [[Table 4]]), a reporting section is added at the bottom of the procedure asking the service person to indicate, by circling one of three letters (<font color=red>'''A, B or F'''</font>), whether or not performing the so-called SV or safety verification tasks to evaluate the performance and safety of the device revealed any significant degradation (latent MR1 failures) or any hidden failures.
:: <font color=red>'''<u>PM Code A</u> = nominal'''</font>. The letter A should be circled when the results of all of the performance and safety tests were in compliance with the relevant specifications, and any other functions tested were within expectations.<br>
:: <font color=red>'''<u>PM Code B</u> = minor OOS condition(s) found'''</font>. The letter B should be circled when one or more conditions were found that were slightly out-of-spec (OOS) or slightly outside expectations. The purpose of this B rating is to create a watch list to monitor for future adverse trends in particular performance or safety features, even though the discrepancy is not considered to be significant at this time. An example of this would be an electrical leakage reading of 310 microamps which is within 5% of the 300 microamp limit. A B rating should be considered a passing grade.<br>
If the PM findings are systematically documented each time a PM is performed, then aggregated into a <u>PM Findings database</u>, it will be possible to (a short illustrative sketch follows this list):

:* get a measure of the <font color=red>PM Yield</font> (the ratio/percentage of <u>problems found during PM</u> to the <u>number of PMs performed</u>), and
:* get an indication of the <font color=red>mean time between failures (MTBFs) of any hidden failures</font>, and
:* get an indication of <font color=red>how well the PM interval matches the optimum</font> - which would be when the part being restored has deteriorated, but only to the point just before the deterioration begins affecting the functioning of the device.
::* A preponderance of <u>PM Code 1</u> findings would indicate that the interval is <font color=red>too short</font>; and
::* A preponderance of <u>PM Code 9</u> findings would indicate that the interval is <font color=red>too long</font>.
::* A preponderance of <u>PM Code 5</u> findings would indicate that <font color=red>the PM interval is just about right</font>.
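The following short sketch (ours; the record layout, the definition of a "problem found" and the verdict logic are illustrative assumptions rather than Task Force specifications) shows how a PM Findings database could be summarized to produce the PM Yield and a crude check on the PM interval from the mix of PM Code 1, 5 and 9 findings.
<pre>
# Illustrative sketch: summarizing an aggregated PM Findings database.
from collections import Counter

# One record per completed PM: the circled safety-verification code (A, B or F)
# and the condition code (1, 5 or 9) for the non-durable part found during restoration.
pm_findings = [
    {"sv": "A", "restoration": 5},
    {"sv": "A", "restoration": 9},
    {"sv": "B", "restoration": 5},
    {"sv": "A", "restoration": 5},
    {"sv": "F", "restoration": 5},
]

# PM Yield: problems found during PM divided by the number of PMs performed.
# (Counting Code B, Code F and Code 9 findings as "problems" is our assumption.)
problems_found = sum(1 for r in pm_findings if r["sv"] in ("B", "F") or r["restoration"] == 9)
print(f"PM Yield: {problems_found / len(pm_findings):.0%}")

# Crude interval check from the preponderance of restoration findings.
restoration_counts = Counter(r["restoration"] for r in pm_findings)
dominant = max((1, 5, 9), key=lambda code: restoration_counts[code])
verdict = {1: "interval may be too short", 5: "interval looks about right", 9: "interval may be too long"}
print(verdict[dominant])
</pre>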
-------------------------------------------------------------------------------------------------------------------------
<br>
===1.9 Which kinds of medical equipment failures are PM-related failures?===
Of the many ways in which devices can fail (their possible failure modes) listed in <u>Section 1.4</u> above, there are two kinds that can be described as PM-related, though only one kind (MR1 failures) is PM-preventable:
::1. <u>[[Category MR1 (wear out) failure]]s</u> that have caused the device to stop working completely. These are failures that are caused by a [[non-durable part]] not receiving timely, competent restoration.
::2. <u>[[PM Code F (hidden) failure]]s</u> resulting from imperceptible failures of components within the device that do not cause the device to stop working completely but which have reduced the device's performance or safety below a critical level. These are failures that are discovered when safety verification (SV) tasks are performed during PMs and, <font color = red>although this testing does not totally preclude the possibility that a patient will be exposed to the device while it is in a defective state</font>, the discovery and correction of these hidden failures does shorten the period during which patients are exposed to this potentially hazardous condition. This benefit is addressed more completely in <u>Sections 6.3 and 6.4</u> in [[HTM ComDoc 6.]]
<br>
===1.10 <font color = red>'''The five basic questions at the heart of the great PM debate'''</font>===
<br>
The foregoing analysis puts us in a position to answer the first of the five basic questions about PM - some of which have been addressed previously in [[HTM ComDoc 15]].
<br>
*'''Improving patient safety'''. … Some devices <font color =red>- but only some</font> - can be made safer (but only a little safer) by performing appropriate PM. Not all failures have the potential to cause a serious injury, and not all failures are PM-preventable.
*'''Regulatory compliance'''. … As we explain more fully in [[HTM ComDoc 11]], the CMS regulation addressing PM for medical devices has traditionally been that all medical devices must be maintained strictly according to the device manufacturers' recommendations. Even after the regulations were changed in 2013, there is still a requirement that certain devices be subjected to periodic PM. (For more on this see [[HTM ComDoc 16]]).
*'''Better business economics'''. … As we explain more fully in [[HTM ComDoc 9.]] some devices <font color =red>- but only some</font> - are made less costly to maintain by performing appropriate PM.
All equipment critical to patient health and safety</font> and any new equipment until a sufficient amount of maintenance history has been acquired."'' The "risk-based" option that TJC had been allowing was effectively rescinded. The revised CMS requirement specifically stated that for what they were now calling equipment critical to patient health and safety ''" <font color = red>Alternative equipment maintenance (AEM) methods are not permitted</font>."'' However, there was no clear indication of which particular devices they intended to target with this definition of ''"critical."'' They seemed to be placing the responsibility for this onto the facility by stating that the ''"... hospital may adjust its maintenance, inspection, and testing frequency and activities for facility and medical equipment from what is recommended by the manufacturer, <font color = red>based on a risk‐based assessment by qualified personnel</font>"''.
Faced with some push-back from members of the HTM community, CMS issued a "clarification" memo in 2013 ([[HTM ComRef 28]]) in which they tried to address the uncertainty about the precise meaning of the phrase ''"equipment critical to patient health and safety"''. The key language in the 2013 memo is quoted in <u>Section 11.3</u> of [[HTM ComDoc 11]]. Suffice it to say that this new language does not clarify sufficiently what the agency intends by the term "critical", and the Task Force's interpretation of their intention is described in <u>Section 11.4</u> of [[HTM ComDoc 11]]. The new regulatory language does, however, introduce a major concession by allowing devices that are not considered to be "critical" to be included in an <font color = red>Alternative Equipment Management (AEM) program</font> where they can be maintained other than as the manufacturer recommends. As reported also in [[HTM ComDoc 11]], the Task Force summarizes its conclusions about the agency's intention in the form of the following two recommended AEM program inclusion criteria.
'''Recommended AEM Program <font color = red>Inclusion Criteria</font>'''
Identification of the four specific categories of devices that cannot currently be included can be found by consulting [[HTM ComRef 33]].
The Task Force's suggestions for implementing an efficient risk-based AEM program that will be compliant with these two criteria are contained in a recently-published two-part article in AAMI's BI&T journal ([[HTM ComRef 35]] and [[HTM ComRef 36]]). Much of that material is also contained in [[HTM ComDoc 16]] ''"Implementing a simple CMS-compliant Alternate Equipment Management (AEM) program."''
<br>
==='''1.10.3 <u>Question 3</u>. How to maximize the efficiency of a planned maintenance (PM) program'''===
<br>
[[HTM ComDoc 10]] ''"Alternate Maintenance Strategies and Maintenance Program Optimization"'' identifies the following four maintenance strategies that are relevant to maintaining medical devices.
#Traditional fixed interval preventive maintenance (often combined with #3, periodic safety verification)
#Predictive maintenance
The least efficient maintenance strategy in terms of using up scarce technical manpower is (#1) the traditional fixed interval preventive maintenance strategy. Predictive maintenance (#2) is the next least efficient. It differs from strategy #1 primarily in effectively extending the interval between restorations or replacement of the device's non-durable parts by substituting a visual inspection for the original restoration task. The most efficient strategy is, of course, the light maintenance strategy (#4). The periodic safety verification strategy is neutral with respect to efficiency because it must be performed on all devices that have a potential high severity (LOS 3) outcome to a hidden failure. It may also be considered prudent to perform periodic safety verification on all devices that are projected to have a less severe potential (LOS 2) outcome to a hidden failure.
Starting with the least efficient situation - a program in which PM is currently being performed on all of the facility's equipment according to the manufacturer's recommendations - implement the following steps:
*<u>'''Step 1'''</u> Identify which devices can be classified as <font color = red>non-critical devices</font> (see <u>Section 3.8.1</u> in [[HTM ComDoc 3]]), and change these immediately to a <font color = red>run-to-failure maintenance method</font> (i.e. perform no scheduled PM).
*<u>'''Step 2'''</u> Determine the [[potential PM priority level]]s of the devices in the facility's medical equipment inventory by consulting the <u>[[media:Figure_16.1.pdf|AEM eligibility based on outcome severity of failure]]</u> graphic (see [[HTM ComDoc 3]]).
*<u>'''Step 3'''</u> Look over the recommendations below that are taken from <u>Section 4.10</u> of [[HTM ComDoc 4]] and [[HTM ComRef 36]]. Then make the changes that you feel comfortable with (see also .... and [[HTM ComRef 35]]).
......................................................................................................................................................................................................................................................
==='''1.10.5 <u>Question 5</u>. What changes to current PM work practices would be most beneficial?'''===
<br>
Also, as was noted in <u>Section 4.9</u> of [[HTM ComDoc 4]], the summary proof tables ([[Table 5]]) are the most valuable part of the community database. In <u>Section 1.9.4</u>, above, we described how the statistics in [[Table 5]] can be used to identify the most common causes of equipment failures.
====1.10.5.2 The hard evidence showing which PM intervals are optimum for different kinds of PM-critical equipment and what levels of PM-related reliability are achieved at those intervals====
<br>
As we describe in <u>Section 4.7</u> of [[HTM ComDoc 4]], adopting a coding system for PM findings similar to that described in that section and systematically documenting these findings each time a PM is performed, then aggregating that data, will make it possible to obtain two very important pieces of information:
* Back to [[Main Page]] or on to [[HTM ComDoc 2]] ''"Important definitions"'', (old page archived at [[HTM ComDoc 1.]])

Current revision

Contents

Start here: PM Basics, key concepts and terminology

(This document was last revised on 12-8-18)

1.1 Device failures and measures of reliability

A device or equipment system is considered to have failed when:

  • it no longer performs the function or functions that the user wants it to perform (these are called overt failures), or
  • it continues to perform its primary functions, but does so in an unsafe or otherwise unsatisfactory manner that is not obvious to the user (these are called hidden failures).

It is a truism, similar to the impossibility embedded in the concept of perpetual motion, that there is no such thing as an infallible device. All devices fail in one way or another, at some time or another. The simplest measure of a device’s reliability is its failure rate - the number of times that it failed to perform, or failed to perform satisfactorily, during a particular time period. Since failures are predominantly random, a device's failure performance (its reliability) is usually expressed as an average number of failures over a particular time period. However, a more intuitive way of expressing device reliability is in the form of the device's mean time between failures or MTBF, which is the inverse of its failure rate over a particular period of time. For example, a device that has demonstrated an average failure rate of one failure every 75 years is demonstrating a mean time between failures of 75 years.

1.1.1 Expressing reliability as a failure rate or as a mean time between failures (MTBF)

Mean time between failures (MTBF) is the inverse of the failure rate. For example, a device that has failed twice in nine years is demonstrating a failure rate of 0.22 failures per year and an MTBF of 4.5 years. Average failure rates can also be derived by dividing the total number of device failures occurring during the observation period by the number of device-years making up the total device experience. For example, if a batch of 10 devices experiences two failures during nine years, then the failure rate is 0.022 failures per year and the MTBF is 45 years. The larger the experience base (in device-years), i.e. the greater the number of devices in the sample and the longer the observation period, the closer the observed failure rate will be to the device’s true failure rate.

It is generally easier for lay persons to relate to an MTBF because it is expressed in units of time periods, such as 3 years or 30 years - a simple, easily comprehended metric. For example, most people will have little difficulty in considering a device with an MTBF of just one month to have a relatively poor level of reliability and, conversely, considering a device with an MTBF of 50 years to be quite reliable. But when the same values are expressed as equivalent failure rates - an MTBF of 1 month equals 12 failures per year, and an MTBF of 50 years equals 0.02 failures per year - the contrast between the two levels of reliability (12 versus 0.02) does not seem quite so striking.
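For readers who prefer to see the arithmetic spelled out, the short Python sketch below reproduces the two worked examples above; the function names are ours, chosen purely for illustration.

  # Illustrative only: computing a failure rate and the corresponding MTBF
  # from a count of failures and a total of device-years of experience.

  def failure_rate(failures, device_years):
      """Average number of failures per device-year over the observation period."""
      return failures / device_years

  def mtbf(failures, device_years):
      """Mean time between failures, in years - the inverse of the failure rate."""
      return device_years / failures

  # One device, 2 failures in 9 years:
  print(failure_rate(2, 9), mtbf(2, 9))            # 0.22 failures/year, MTBF 4.5 years

  # Ten devices observed for 9 years = 90 device-years, 2 failures:
  print(failure_rate(2, 10 * 9), mtbf(2, 10 * 9))  # 0.022 failures/year, MTBF 45 years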

Since, ideally, we would like to separate various different kinds of devices into neat, compartmentalized categories such as “safe” and “hazardous”, we have to confront the difficulty of setting boundaries and the consequent gray areas around those boundaries. For example, setting a threshold of, say, 75 years for the MTBF that should be considered safe creates the hard-to-answer question of how much less reliable (and thus less safe) is a device with an MTBF of 74 years than one with an MTBF of 75 years? There is, of course, no simple answer to that question. There are gray areas. It is all relative.

This discussion is made a little more complicated by the fact that there are several different reasons why devices fail, and lumping all of the failures from these different causes into one overall failure rate, or corresponding MTBF, might well raise the concern that this total failure rate does not fairly describe what we think of as either the reliability of the device itself, or the effectiveness of the way we maintain it. Section 1.4 below addresses the nature of these different causes of failure and how they can be categorized and used to develop a helpful and meaningful analysis.

1.2 What is maintenance?

There are several adequate dictionary definitions of maintenance but, in the context of maintaining equipment, it is best defined as "the process of keeping the equipment in proper working order, in good physical condition and acceptably safe". The definition used in the highly respected RCM approach to equipment maintenance is “keeping the equipment available for use”. For more about RCM, see HTM ComDoc 14. "An introduction to Reliability-centered Maintenance (RCM): The modern approach to Planned Maintenance".

A traditional equipment maintenance program has three parts:

  1. Corrective maintenance or, as it is more commonly called, repair, is the process of returning a device that is in a failed state (i.e. that is no longer doing what the user wants it to do) to a safe condition and proper working order. This includes correcting any significant hidden failures even though they do not usually affect the primary functions of the device.
  2. Cosmetic repair is the process of restoring a device that is damaged to a safe and cosmetically like-new condition. While cosmetic repairs are generally considered a lower priority because the device may still be functioning within the manufacturer’s functional specifications, the device may nevertheless be damaged in such a way that it is unsafe. For example, a damaged cover may be presenting a sharp edge that could be hazardous to either the patient or to a user.
  3. Preventive maintenance. This third component is very important because, from the very beginning, with the earliest machines developed during the industrial revolution, it was widely believed that restoring the device's non-durable parts, as needed, before the end of their anticipated lifetimes would be beneficial because it would reduce the number of unexpected machine breakdowns. In return for these scheduled PM interventions to restore the device's non-durable parts, the device users expect a lower level of disruption and loss of productivity, as well as some reduction in overall maintenance costs, because the device should experience fewer breakdowns.

Non-durable parts (NDPs)- which are sometimes loosely called disposables or disposable parts - are components of the device that are subject to progressive wear or deterioration. They typically include moving parts, such as bearings, drive belts, pulleys, mechanical fasteners and cables, which require periodic cleaning and lubrication as well as certain non-moving parts such as electrical batteries, gaskets, flexible tubing and various kinds of filters which may need to be cleaned, adjusted, refurbished or replaced sometime during the useful lifetime of the device. Which particular parts the device manufacturer considers to be non-durables is identified by the presence of corresponding device restoration tasks in the manufacturer's recommended PM procedure.

As we describe more fully in HTM ComDoc 14. "An introduction to Reliability-centered Maintenance (RCM): The modern approach to Planned Maintenance" ............

Belief in this traditional device restoration approach to improving machine reliability continues to this day, particularly in certain relatively small industry sectors, even though the findings that started the revolutionary RCM approach to maintenance in the 1970s have caused a considerable amount of rethinking about whether or not intrusive maintenance interventions really do improve the device's overall reliability. Certainly there are still quite a number of medical devices such as ventilators, spirometers and traction machines that are more mechanical than electronic, where the manufacturers still recommend that certain parts be given some kind of periodic restoration (cleaning, refurbishment or replacement). However, we don’t yet have good, independent evidence as to whether or not these manufacturer-recommended PMs, particularly those involving the more intrusive overhauls, are truly beneficial or cost-effective. We have not yet gathered the data on the impact of these recommended interventions on the reliability of these more mechanical devices. That investigation is one of the goals that the Maintenance Practices Task Force (MPTF) has set for itself. We discuss this data gathering challenge in more detail in HTM ComDoc 4.

1.3 What exactly does the term "PM" mean in the context of medical equipment maintenance?

In the special case of maintaining medical equipment, there is a second very important reason besides device restoration for making periodic scheduled interventions. And that is testing the device to detect critical degradation in the functional performance of the device or in its condition with respect to safety. These deteriorations can be quite subtle, and in RCM jargon these degradations are called hidden failures. The term is appropriate because these subtle changes do not completely disable the device's primary functions and so they will usually go unnoticed by the device users.

It is important to detect these subtle deteriorations (hidden failures) because there are certain kinds of medical devices that can cause a patient injury if their performance becomes significantly substandard or their level of safety falls below the relevant requirements. Elsewhere (see HTM ComDoc 3) we characterize the types of devices that have a theoretical potential to injure a patient if they deteriorate in this way as hidden failure-critical or HF-critical devices. These devices need to be subjected to periodic safety verification tasks. Appropriate safety verification tasks for checking out each particular type of device are typically included as a part of the device manufacturer's recommended PM procedure.

Similarly, we can characterize devices that have a theoretical potential to injure a patient if they simply stop working as life support devices (see Section 1.8.1 below). As the descriptor (life support) implies, it is important to minimize failures of these devices. If these devices have manufacturer-designated non-durable parts (NDPs) they are vulnerable to what the Task Force calls wear-out type failures, and they need to be subjected to appropriate device restoration (DR) tasks to prevent the device from failing. This will eliminate one (but only one) source of device failures. The test for whether a device is vulnerable in this way is whether or not the device manufacturer's recommended PM procedure includes any device restoration tasks.

One of the recurring obstacles in our discussions of PM over the years has been the use of a number of imprecise and inconsistent terms. Unfortunately there is still no general consensus. So, in an attempt to establish a standardized and more consistent PM terminology, we are proposing (below) some new terms.

We believe that it would be quite difficult to get the entire population of engineers and technicians practicing in the medical equipment maintenance field to change from using the long-established traditional diminutive “PM”. To accommodate this practical issue we are proposing to introduce another term with the same diminutive. The newer term, "planned maintenance", will be used to define the combination of the traditional device restoration tasks (what we have traditionally called “preventive maintenance”) and the safety-oriented performance/safety testing tasks that are more or less unique to the medical field. In this new formulation we are proposing to use the term “device restoration tasks" as a short label for the restoration of the device's non-durable parts. It is a simple and appropriately descriptive term.

We are suggesting this new terminology in full recognition of the fact that there are a number of other competing terms that have evolved over time. For example the term “scheduled maintenance” has been proposed as an alternative to “preventive maintenance” but it is not a very good fit semantically because it implies that the device restoration tasks are always performed according to some kind of clock; either by conventional timing (e.g. every 6 or 12 months) or by a time-of-use clock (e.g. every 1000 hours of use). There is, however, a more modern practice in which the deteriorating part is restored on a more efficient “just-in-time” basis by monitoring the actual condition of the part. In some cases the monitoring is performed by some kind of sensor but more commonly in the medical equipment sector it is simply done by conducting periodic visual inspections. In the RCM approach this “just-in-time” restoration is called predictive maintenance. In addition to this, what we are proposing to call safety verification (SV) tasks have been given the collective name “inspections” by ECRI Institute and others. We prefer the more descriptive term “safety verification” tasks.

So, in summary, in the context of medical equipment maintenance, the contraction “PM” should be understood to mean “planned maintenance” which is defined as a combination of two different types of tasks; one (device restoration tasks) aimed at preventing wear-out failures, and the other (safety verification tasks) aimed at detecting then correcting hidden failures; i.e.


Planned maintenance (PM) procedure = Device restoration (DR) tasks + Safety verification (SV) tasks


1.4 Classifying and coding the causes of (overt) medical device failures

There are a number of different reasons (causes) why equipment systems fail and it is particularly important to recognize that not all of these failures can be prevented by some kind of planned maintenance. Consider, for example, the following list of possible causes of device failure:

  • The first set of causes can be classified as inherent reliability-related failures (IRFs) that are attributable to the design and construction of the device itself, including the inherent reliability of the components used in the device. They typically represent 45 - 55% of the repair calls. This type of failure can be reduced (but not to zero) only by redesigning the device or changing the way it was constructed.

Category IR1 Random failure. A device failure caused by the random failure or malfunction of a component part of the device. A result of the device’s inherent unreliability. IR1 calls typically represent between 40-55% of all repair calls.

Category IR2 Poor construction. A device failure attributable to poor fabrication or assembly of the device itself.

Category IR3 Poor design. A device failure attributable to poor design of the hardware or processes required to operate the device.


  • The second set of causes can be classified as process-related failures (PRFs). They typically represent 40 - 50% of the repair calls. Reducing or eliminating these types of failure typically requires some kind of redesign of the system’s processes - for example, by using better methods to train the equipment users to operate the equipment (as intended by the manufacturer) or to train them to treat the equipment more carefully. They are not failures that can be prevented by any kind of maintenance activities.

Category PR1 Use error. A device failure attributable to incorrect set-up or operation of the device by the user. The user has not set the device up correctly or does not know how to operate it. Typically PR1 calls represent between 13-20% of all repair calls. (Note that although this type of “failure” does not represent a complete loss of function, it can have the same effect. For example, an incorrectly set defibrillator can result in a failure to resuscitate the patient).

Category PR2 Physical damage. A device failure caused by subjecting the device to physical stress outside its design tolerances. PR2 calls typically represent between 6-25% of all repair calls.

Category PR3 Discharged battery. A device failure attributable to a failure to recharge a rechargeable battery. PR3 calls typically represent between 7-8% of all repair calls.

Category PR4 Accessory problem. A device failure caused by the use of a wrong or defective accessory. PR4 calls typically represent between 3-9% of all repair calls.

Category PR5 Environmental stress. A device failure caused by exposing the device to environmental stress outside its design tolerances. PR5 calls typically represent between 1-7% of all repair calls.

Category PR6 Tampering. A device failure caused by human interference with an internal control. PR6 calls typically represent <1% of all calls.

Category PR7 Network problem. A device system failure caused by an issue within a data network connected to the device’s output.


  • The third set of causes can be classified as maintenance-related failures (MRFs). They typically represent 2 - 4% of the repair calls. These types of failure can be prevented through some kind of maintenance strategy incorporated into the facility’s maintenance program.

Category MR1 PM-preventable failure. A device failure that could have been prevented by more timely restoration or replacement of a manufacturer-designated non-durable part, e.g. a battery failure, a clogged filter, or a build-up of dust. Failures due to trapped cables should not be coded this way. MR1 calls typically represent between 1-3% of all repair calls.

Category MR2 Poor set up. A device failure caused by poor or incomplete initial installation or set-up of the device. MR2 calls typically represent between 1-3% of all repair calls.

Category MR3 Needed recalibration. A device failure attributable to improper periodic calibration. MR3 calls typically represent <1% of all repair calls.

Category MR4 Re-repair. A device failure attributable to a poor quality previous repair of the device. MR4 calls typically represent <1% of all repair calls.

Category MR5 Intrusive PM. A device failure attributable to earlier intrusive maintenance. MR5 calls typically represent well under 1% of all repair calls.


1.4.1 Coding repair work orders

The Task Force recommends very strongly that all repair work orders be provided with a field for coding what is judged to be the primary reason (cause) why the device failed. As will be described later, the statistics obtained from this coding are very useful for managing the various different failure prevention measures. The recommended format for this coding follows the classification arrangement described immediately above in Section 1.4. For example, a failure that is judged to have been caused by the device having been dropped would be coded as a PR2 failure, and a failure that is judged to have no obvious cause would be coded as an IR1 failure.
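As a simple illustration of how such a cause-coding field can be put to work, the hypothetical Python sketch below tallies a list of coded repair work orders into the three cause groups defined in Section 1.4. The code lists and the sample data are ours and are intended only to show the idea, not to prescribe an implementation.

  # Illustrative only: tallying failure-cause codes from repair work orders.
  from collections import Counter

  IR_CODES = {"IR1", "IR2", "IR3"}                              # inherent reliability-related
  PR_CODES = {"PR1", "PR2", "PR3", "PR4", "PR5", "PR6", "PR7"}  # process-related
  MR_CODES = {"MR1", "MR2", "MR3", "MR4", "MR5"}                # maintenance-related

  def summarize(cause_codes):
      """Return the percentage of repair calls falling into each cause group."""
      counts = Counter(cause_codes)
      total = sum(counts.values())
      def pct(codes):
          return 100.0 * sum(counts[c] for c in codes) / total
      return {"IRF %": pct(IR_CODES), "PRF %": pct(PR_CODES), "MRF %": pct(MR_CODES)}

  # Hypothetical work-order data: one dropped device (PR2), two failures with
  # no obvious cause (IR1) and one clogged-filter failure (MR1).
  print(summarize(["PR2", "IR1", "IR1", "MR1"]))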

1.4.2 Different measures of device reliability

While the device’s overall reliability (which corresponds directly to the total number of the repair calls - irrespective of what caused them) determines the device's effective reliability, it is the numbers of maintenance-related failures (MRFs) and inherent reliability-related failures (IRFs) that are of greatest interest to us, as maintainers, at this time. The level of MRFs provides a good measure of the effectiveness of the facility’s maintenance program, and the level of IRFs provides an equally good measure of the basic or inherent reliability of the devices in question.

1.5 Which kinds of medical device failures can be hazardous?

There are four ways in which medical equipment failures can be hazardous. However, not all of those failures are PM-preventable failures.

  • If the device is damaged in such a way that it is presenting some kind of direct physical threat to the safety of patients or staff, such as an exposed sharp edge.

For example, the case or enclosure of a piece of equipment might be damaged, say as a result of the item being dropped, in such a way that the damaged casing poses a risk of injury to the patient or user, even though the item still works. Or the protective outer layer of the device's electrical cord might be damaged so that it exposes a live conductor posing the risk of an electric shock. These could be hazardous to the patient, to the device user and possibly others. It is to be expected that damage such as this would be noticed and repaired at the time of its periodic maintenance - so, to the extent that this kind of damage occurs and goes unreported, periodic PM contributes to the levels of overall safety. These are not considered to be PM-preventable failures but periodic PM may shorten the time that individuals are exposed to these potentially hazardous outcomes. Situations such as this appear to be encountered quite rarely.

  • If the failure is a sudden, total failure.

There are a number of devices that are life-supporting in the sense that a sudden, total failure while they are in use could put the patient’s life at risk. Examples include critical care ventilators, anesthesia units, heart lung machines, intra-aortic balloon pumps, external pacemakers, defibrillators, AEDs, cardiac resuscitators, infant incubators, neonatal monitors, apnea monitors - and in some circumstances - patient monitors, oxygen monitors and pressure cycled ventilators. In addition to spontaneous random failures it is possible that a device could suddenly stop working if a part that is recommended for periodic restoration fails prematurely. This could also occur if the maintenance interval has been set too long. The failure of any device that is attributable to the failure of a critical part that requires timely restoration is considered to be a PM-preventable failure. However, situations such as this appear to be encountered quite rarely.

  • If the device develops some kind of hidden failure.

There are some devices that have the potential to cause a patient injury if their functional performance falls below a certain critical point in such a way that the deterioration is not obvious to the user. Examples include a defibrillator whose delivered output energy is significantly lower than the level set by the user, or an infusion device that delivers medication at a significantly lower or higher rate than that set by the user. Similarly there are some devices that have the potential to cause a patient injury if their compliance with a relevant safety specification falls below an acceptable point and this deterioration is not obvious to the user. Examples include an open ground connection in a device that has exposed metal that could conceivably become "live", and a malfunction in devices that have critical alarms. While, strictly speaking, these failures are not totally prevented by periodic PM, the time that patients are exposed to these potentially hazardous outcomes is reduced. For more on this - see Section 4.7 of HTM ComDoc 4. Elsewhere - see Sections 6.3 and 6.4 of HTM ComDoc 6 - we have shown that the exposure of the patient to this possible hazard is reduced from 100% (as it would be with no PM) to a much smaller percentage determined by the ratio of the frequency with which the hidden failures occur to the frequency with which the PM testing is performed. With typical PM intervals in the range of 6 months to 5 years and mean times between failures of these random hidden failures in the range of 50 to 250 years, the exposure of the patient will be reduced by roughly 95 - 99% (a rough illustrative calculation is sketched after this list). Hazardous hidden failures appear to be encountered quite infrequently.

  • If the device is used improperly.

Almost all medical devices have the potential to injure patients if they are used improperly. However, this is a type of failure that cannot be prevented or mitigated by conventional planned maintenance, and such failures are not considered to be PM-preventable equipment failures. Accident statistics show that misuse of medical devices represents the most common reason for device-related patient injuries.
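To make the exposure argument in the third bullet above (hidden failures) a little more concrete, the rough Python sketch below estimates the fraction of time a device would spend in an undetected hidden-failure state. It assumes - our simplifying assumption, broadly consistent with the figures quoted above - that a randomly occurring hidden failure goes undetected for about half of one PM interval on average; the full treatment is in Sections 6.3 and 6.4 of HTM ComDoc 6.

  # Rough illustration only: fraction of time a device carries an undetected
  # hidden failure, assuming the failure waits on average half a PM interval
  # before the next safety verification test finds it.

  def exposure_fraction(pm_interval_years, hidden_failure_mtbf_years):
      return (pm_interval_years / 2.0) / hidden_failure_mtbf_years

  for interval, mtbf in [(0.5, 250), (0.5, 50), (5, 250), (5, 50)]:
      frac = exposure_fraction(interval, mtbf)
      print(f"PM every {interval} yr, hidden-failure MTBF {mtbf} yr: "
            f"exposure ~{frac:.1%}, i.e. a ~{1 - frac:.1%} reduction versus no testing")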

For more on this subject see HTM ComDoc 8 "Maximizing medical equipment-related reliability and safety"

1.6 Hidden failures

A hidden failure (HF) is said to have occurred when either:

  • the device is performing in a way that is significantly out of specification, but sufficiently similar to the performance that the user wants, that the failure is not immediately obvious to the user, or
  • the device is no longer in compliance with one or more of the safety specifications applicable to the device in question, but this deterioration is also not obvious to the user. These hidden failures are usually the result of imperceptible random failures in the device's components or subsystems. They are detected through performance or safety tests specified in the manufacturer's recommended maintenance procedure and made during the periodic PMs.

When this more subtle type of failure introduces a significant performance or safety degradation that can be detected only by some kind of performance or safety test, it can constitute a serious safety threat. For example, a heart rate alarm that has malfunctioned so that it no longer goes off at the set limit will remain as a hidden but potentially hazardous failure until the alarm function is checked and the potentially dangerous degradation discovered. The potential level of severity of the outcome of hidden failures will depend on the nature of the failure and on how far the performance or safety flaw is out of specification. For example, a significant reduction in the output of a defibrillator has to be considered life-threatening, but a small excess in the electrical leakage current of a laboratory centrifuge - while it should be noted in the test report - is unlikely to constitute a significant hazard, or be considered an imminent threat.

Hidden failures are discovered when the performance verification and safety testing tasks are performed during the PM. When they are found they should be described in a note on the PM work order or the PM report and it would be helpful if the description of the findings provided enough information to enable a judgment to be made as to the worst case potential level of severity (LOS 3, LOS 2, LOS 1 or LOS 0 - see Section 1.8 below) of the adverse outcome that would have resulted if the hidden failure had not been discovered.

A particularly important type of hidden failure is one that disables the proper operation of an automatic protection mechanism (APM) that is included as a component of the device. An APM is usually incorporated in the device to provide protection against another possible hidden failure that is itself considered to be capable of a serious, potentially life-threatening outcome.

1.7 Classifying and coding PM Findings


In the recommended format of each HTMC generic PM procedure (see column 7 of Table 4) a reporting section is added at the bottom of the procedure asking the service person to indicate, by circling one of three letters (A, B or F), whether or not performing the so-called SV or safety verification tasks to evaluate the performance and safety of the device revealed any significant degradation (latent MR1 failures) or any hidden failures.

PM Code A = nominal. The letter A should be circled when the results of all of the performance and safety tests were in compliance with the relevant specifications, and any other functions tested were within expectations.
PM Code B = minor OOS condition(s) found. The letter B should be circled when one or more conditions were found that were slightly out-of-spec (OOS) or slightly outside expectations. The purpose of this B rating is to create a watch list to monitor for future adverse trends in particular performance or safety features, even though the discrepancy is not considered to be significant at this time. An example of this would be an electrical leakage reading of 310 microamps, which exceeds the 300 microamp limit by less than 5%. A B rating should be considered a passing grade.
PM Code F = serious OOS condition(s) found. The letter F should be circled when one or more performance or safety features are found to be significantly out-of-spec (OOS). This is a failing grade and, if it is a high-risk device, the device should be removed from service immediately.

The service person is also asked to indicate by circling one of four numbers (1, 5, 9 or 0) the physical condition in which the device parts that were rejuvenated by the traditional PM tasks were found. The numerical ratings should be circled to indicate one of the following findings.

PM Code 1 = better than expected. There was very little or no deterioration; i.e. the physical condition of the restored part was found to be still good.
PM Code 5 = nominal. There was some minor deterioration but no apparent adverse effect on the device’s function; i.e. the physical condition of the restored part was found to be about as expected.
PM Code 9 = serious physical deterioration. The restored part was already worn out and probably having an adverse effect on device function; i.e. the physical condition was found to be considerably worse than expected.
PM Code 0 = no physical restoration required. The device has no parts needing any kind of physical restoration.

If the PM findings are systematically documented each time a PM is performed, then aggregated into a PM Findings database, it will be possible to:

  • get a measure of the PM Yield (the ratio/percentage of problems found during PM to the number of PMs performed), and
  • get an indication of the mean time between failures (MTBFs) of any hidden failures, and
  • get an indication of how well the PM interval matches the optimum - which would be when the part being restored has deteriorated, but only to just before the point where the deterioration begins affecting the functioning of the device.
  • A preponderance of PM Code 1 findings would indicate that the interval is too short; and
  • A preponderance of PM Code 9 findings would indicate that the interval is too long.
  • A preponderance of PM Code 5 findings would indicate that the PM interval is just about right.
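As a sketch of the kind of analysis an aggregated PM Findings database would support, the hypothetical Python fragment below computes a PM Yield from the letter codes and applies the interval heuristics listed above to the physical-condition codes. The data layout and the sample values are invented for illustration only.

  # Illustrative only: PM Yield and a PM-interval indication from aggregated PM findings.
  from collections import Counter

  def pm_yield(letter_codes):
      """Fraction of PMs whose safety verification tasks found a problem (codes B or F)."""
      return sum(1 for c in letter_codes if c in ("B", "F")) / len(letter_codes)

  def interval_indication(number_codes):
      """Crude reading of the physical-condition codes (1/5/9) per Section 1.7."""
      commonest = Counter(number_codes).most_common(1)[0][0]
      return {1: "interval may be too short",
              5: "interval is about right",
              9: "interval may be too long"}.get(commonest, "no restorable parts")

  # Hypothetical findings for one manufacturer-model combination:
  letters = ["A", "A", "B", "A", "F", "A", "A", "A"]
  numbers = [5, 5, 1, 5, 9, 5, 5, 5]
  print(f"PM Yield: {pm_yield(letters):.0%}")   # 25% of PMs found something
  print(interval_indication(numbers))           # "interval is about right"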

1.8 Possible adverse outcomes of medical device failures

There is a wide range of possible adverse outcomes from device failures. Some create potential physical harm to the patient (or to the device user). Others can result in additional direct or indirect costs to the facility and thus create an economic or business risk to the organization. We address these economic/business risks in greater detail in HTM ComDoc 9. "Medical devices that may benefit from PM from a business/ economics viewpoint"

In the case of outcomes creating the possibility of physical harm, it is helpful, whenever there is a need to conduct some kind of risk analysis or risk assessment, to define a hierarchy of three levels of severity (LOS) of possible physical harm to the patient or - in the case of economic harm to the facility - three levels of economic harm to the business.

Outcomes resulting in possible physical harm

  • LOS 3 = Serious, life-threatening injury - The patient (or the user) may lose his or her life.
  • LOS 2 = Less serious, non life-threatening injury - The patient (or the user) may sustain a direct or indirect injury ranging from minor to serious.
  • LOS 1 = No injury, but possible disruption of care - The incident may cause a temporary disruption of care, such as requiring one or more patients to be rescheduled, delaying treatment or delaying the acquisition of diagnostic information.
  • LOS 0 = No discernible injury or possible disruption of care.

Outcomes resulting in possible economic harm

  • Level 3 = Major economic impact - on the facility’s cost of doing business
  • Level 2 = Significant economic impact - on the facility’s cost of doing business
  • Level 1 = Relatively minor economic impact - on the facility’s cost of doing business
  • Level 0 = No discernible impact - on the facility’s cost of doing business

1.8.1 Life support devices

There are some devices, such as critical care ventilators and defibrillators, on which the patient's continued well-being may be totally dependent. These are sometimes called life support devices. Any type of failure that causes such a device to stop working completely or to stop working properly has the potential to result in an adverse outcome at the highest severity (LOS 3) level. If the device also happens to have one or more non-durable parts that need timely and competent periodic restoration, the device becomes critically vulnerable to a wear-out failure and it therefore becomes a device that should be given a high priority for PM. The same is true if the device is vulnerable to a hidden failure that could cause a high severity outcome.

1.9 Which kinds of medical equipment failures are PM-related failures?

Of the many ways in which devices can fail (their possible failure modes) listed in Section 1.4 above, there are two kinds that can be described as PM-related, though only one kind (MR1 failures) is PM-preventable:

1. Category MR1 (wear out) failures that have caused the device to stop working completely. These are failures that are caused by a non-durable part not receiving timely, competent restoration.
2. PM Code F (hidden) failures resulting from imperceptible failures of components within the device that do not cause the device to stop working completely but which have reduced the device's performance or safety below a critical level. These are failures that are discovered when safety verification (SV) tasks are performed during PMs and although this testing does not totally preclude the possibility that a patient will be exposed to the device while it is in a defective state, the discovery and correction of these hidden failures does shorten the period during which patients are exposed to this potentially hazardous condition. This benefit is addressed more completely in Sections 6.3 and 6.4 in HTM ComDoc 6.


1.10 The five basic questions at the heart of the great PM debate


The foregoing analysis puts us in a position to answer the first of the five basic questions about PM - some of which have been addressed previously in HTM ComDoc 15.

1.10.1 Question 1. To what extent does performing PM on medical equipment improve patient safety?


Generally speaking, PM does improve patient safety, but only to the extent that it prevents or corrects the two kinds of PM-related failures that were identified just above in Section 1.9 (wear-out failures and hidden failures). And the extent of the improvement in patient safety varies for different devices according to the "level of risk" that the device would have presented if those potential failures had not been detected, and then eliminated. According to the modern theories of risk management, the level of risk takes into account both the level of the severity of the adverse outcome of the event and the likelihood that the event will actually occur.

In this case we are specifically concerned about the level of risk posed by PM-preventable failures, so the extent of the improvement in patient safety is determined by a combination of the potential severity of the outcome of the failure (with the higher levels of outcome severity - such as LOS 3 - being more serious than LOS 2, etc), and the likelihood of the failure occurring. The proper measure of this likelihood of the failure occurring is what the Task Force calls the device's PM-related reliability. We discuss this "likelihood of failing from a PM-preventable cause" more in HTM ComDoc 4 "Consideration of the device's PM-related reliability".
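The small Python sketch below illustrates the general idea of combining the severity of the potential outcome with the likelihood of a PM-preventable failure to produce a rough ranking. The weighting scheme and the sample devices are purely illustrative; this is not the Task Force's actual ranking method, which is described in HTM ComDoc 3 and HTM ComDoc 4.

  # Illustrative only: combining outcome severity (LOS) with PM-related failure
  # likelihood to get a rough PM-related risk ranking. Not the Task Force method.

  def pm_related_risk(los, pm_related_mtbf_years):
      """Higher score = higher PM-related risk (severity times annual likelihood)."""
      annual_failure_probability = 1.0 / pm_related_mtbf_years
      return los * annual_failure_probability

  # Hypothetical devices: (name, worst-case LOS, demonstrated PM-related MTBF in years)
  devices = [("defibrillator model X", 3, 75),
             ("infusion pump model Y", 2, 40),
             ("exam light model Z", 1, 200)]

  for name, los, mtbf in sorted(devices, key=lambda d: -pm_related_risk(d[1], d[2])):
      print(f"{name}: risk score {pm_related_risk(los, mtbf):.3f}")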

The Task Force has investigated both of these factors. Table 4 provides a ranking of the various device types according to the severity of each device's potential PM-preventable failures. For more on this investigation, see HTM ComDoc 3 "Risk assessment: Determining which medical devices can be made safer (but only a little safer) by PM". The device types at the top of the listing in Table 4 (rows 1 through 7) are judged to have potential PM-preventable failures with life-threatening outcomes. The PM-related reliability of each of the top twenty highest severity device types in Table 4 is currently being investigated and, as the results become available, they will be posted to columns C8 and C9 of Table 13. For more on this investigation, see HTM ComDoc 4 "Consideration of the device's PM-related reliability".

The Task Force has set tentative thresholds for what should be considered an acceptable (safe) level of PM-related reliability for the devices in each of the three top levels of potential PM-related risk categories (namely those labeled high, moderate and low in column C10 of Table 13). From this table, once it is completed, professionals in charge of medical equipment maintenance programs will be able to identify which devices (by manufacturer and model) should continue to be maintained strictly according to their manufacturer's recommendations and, for the others, what level of PM-related reliability (which corresponds to PM-related safety when the category of severity is taken into account) is typically achieved when the indicated PM interval and procedure are used. The Task Force has also suggested a way in which the level of PM-related patient safety can be monitored on a continuous basis (see Section 3.10 of HTM ComDoc 3).

As can be seen from the summary below there are several other benefits from performing regular PM besides improving patient safety.

  • Improving patient safety. … Some devices - but only some - can be made safer (but only a little safer) by performing appropriate PM. Not all failures have the potential to cause a serious injury, and not all failures are PM-preventable.
  • Regulatory compliance. … As we explain more fully in HTM ComDoc 11 the CMS regulation addressing PM for medical devices has traditionally been that all medical devices must be maintained strictly according to the device manufacturers' recommendations. Even after the regulations were changed in 2013 there is still a requirement that certain devices be subjected to periodic PM. (For more on this see HTM ComDoc 16).
  • Better business economics. … As we explain more fully in HTM ComDoc 9. some devices - but only some - are made less costly to maintain by performing appropriate PM
  • Customer courtesy and/ or customer reassurance. … We may choose to perform PM on some devices because a user has asked us to do so, or because we believe that periodically inspecting and cleaning equipment used for patient care creates a reassuring "cared for" appearance that the user staff appreciates. While this is a qualitative rather than a quantitative benefit it should not be underestimated. These periodic inspections may also be useful by leading to the discovery of unreported broken equipment. The Task Force has issued a cautionary note about the possibility of undervaluing this last factor (see Section 16.11 of HTM ComDoc 16)

......................................................................................................................................................................................................................................................

1.10.2 Question 2. What kind of PM program is required by the current CMS regulation?


The original Medicare legislation in 1965 stated that: "... There must be a regular periodical maintenance and testing program for medical devices and equipment. A qualified individual such as a clinical or biomedical engineer, or other qualified maintenance person must monitor, test, calibrate and maintain the equipment periodically in accordance with the manufacturer's recommendations and Federal and State laws and regulations. ..." But beginning in 1989 and as recently as 2011 the corresponding standards of the Joint Commission allowed equipment that was not considered to present a significant physical risk to be excluded from any specific maintenance requirements stating only that PM frequencies should be based on "criteria such as manufacturer's recommendations, risk levels, or current hospital experience," and they, in effect, endorsed the original Fennigkoh-Smith risk-based methodology.

This changed in 2011 when CMS issued revised regulations that narrowed the still official CMS requirement to use the manufacturer's maintenance recommendations from all equipment to just "equipment critical to patient health and safety and any new equipment until a sufficient amount of maintenance history has been acquired." The "risk-based" option that TJC had been allowing was effectively rescinded. The revised CMS requirement specifically stated that for what they were now calling equipment critical to patient health and safety "Alternative equipment maintenance (AEM) methods are not permitted." However, there was no clear indication of which particular devices they intended to target with this definition of "critical." They seemed to be placing the responsibility for this onto the facility by stating that the "... hospital may adjust its maintenance, inspection, and testing frequency and activities for facility and medical equipment from what is recommended by the manufacturer, based on a risk‐based assessment by qualified personnel".

Faced with some push-back from members of the HTM community, CMS issued a "clarification" memo in 2013 (HTM ComRef 28) in which they tried to address the uncertainty about the precise meaning of the phrase "equipment critical to patient health and safety". The key language in the 2013 memo is quoted in Section 11.3 of HTM ComDoc 11. Suffice it to say that this new language does not clarify sufficiently what the agency intends by the term "critical", and the Task Force's interpretation of their intention is described in Section 11.4 of HTM ComDoc 11. The new regulatory language does, however, introduce a major concession by allowing devices that are not considered to be "critical" to be included in an Alternative Equipment Management (AEM) program where they can be maintained other than as the manufacturer recommends. As reported also in HTM ComDoc 11, the Task Force summarizes its conclusions about the agency's intention in the form of the following two recommended AEM program inclusion criteria.

Recommended AEM Program Inclusion Criteria

After a careful analysis of the CMS memo the Task Force believes that, except for four specific categories of devices, the agency intends to allow to be included in an AEM program only those devices that meet one, or both, of the following criteria:

  • The device is highly unlikely to cause a serious injury or death to a patient or staff person if it should fail in a way that could have been prevented by the device having been subjected to appropriate PM
  • The device is highly unlikely to fail from a PM-preventable cause

Identification of the four specific categories of devices that cannot currently be included can be found by consulting HTM ComRef 33.

The Task Force's suggestions for implementing an efficient risk-based AEM program that will be compliant with these two criteria are contained in a recently-published two-part article in AAMI's BI&T journal (HTM ComRef 35 and HTM ComRef 36). Much of that material is also contained in HTM ComDoc 16 "Implementing a simple CMS-compliant Alternate Equipment Management (AEM) program."

......................................................................................................................................................................................................................................................

1.10.3 Question 3. How to maximize the efficiency of a planned maintenance (PM) program


HTM ComDoc 10 "Alternate Maintenance Strategies and Maintenance Program Optimization" identifies the following four maintenance strategies that are relevant to maintaining medical devices.

  1. Traditional fixed interval preventive maintenance (often combined with #3, periodic safety verification)
  2. Predictive maintenance
  3. Periodic safety verification
  4. Light maintenance (also known as run-to-failure maintenance)

The least efficient maintenance strategy in terms of using up scarce technical manpower is (#1) the traditional fixed interval preventive maintenance strategy. Predictive maintenance (#2) is the next least efficient. It differs from strategy #1 primarily in effectively extending the interval between restorations or replacement of the device's non-durable parts by substituting a visual inspection for the original restoration task. The most efficient strategy is, of course, the light maintenance strategy (#4). The periodic safety verification strategy is neutral with respect to efficiency because it must be performed on all devices that have a potential high severity (LOS 3) outcome to a hidden failure. It may also be considered prudent to perform periodic safety verification on all devices that are projected to have a less severe potential (LOS 2) outcome to a hidden failure.

Starting with the least efficient situation - a program in which PM is currently being performed on all of the facility's equipment according to the manufacturer's recommendations - implement the following steps:

  • Step 1 Identify which devices can be classified as non-critical devices (see Section 3.8.1 in HTM ComDoc 3), and change these immediately to a run-to-failure maintenance method (i.e. perform no scheduled PM).
  • Step 3 Look over the recommendations below that are taken from Section 4.10 of HTM ComDoc 4 and HTM ComRef 36. Then make the changes that you feel comfortable with (see also .... and HTM ComRef 35).

Recommendations for improving the efficiency of a medical equipment maintenance program


These are potentially hazardous devices with either overt or hidden PM-preventable failures that could cause a life-threatening injury and that are demonstrating PM-related failure rates greater than the currently acceptable level (not more than one failure every 75 years). For these devices, it would be prudent to continue to follow the manufacturer-recommended PM procedure (for both the interval and the scope of the tasks) and to routinely monitor the levels of patient safety being achieved, as described in Section 3.10 of HTM ComDoc 3 and HTM ComRef 35. This should be continued until acceptable evidence exists in the national database (Table 13) that some other procedure with more efficient tasks and/or a longer interval is found to demonstrate the same or better level of PM-related reliability or a comparable level of patient safety.

These are potentially hazardous devices with hidden PM-preventable failures capable of causing a life-threatening injury that are demonstrating PM-related failure rates greater than the currently acceptable level (not more than one failure every 75 years). For these devices, for which the only “maintenance” that the manufacturer recommends is periodic safety verification, it would be prudent to continue to follow the manufacturer-recommended safety verification testing schedule and routinely monitor the levels of patient safety being achieved, as described in Section 3.10 of HTM ComDoc 3 and HTM ComRef 35, until evidence exists in the national database (Table 13) that testing at a longer interval results in the same or better level of PM-related reliability or a comparable level of patient safety.

When testing for possible hidden failures with potential high-severity outcomes, there is no optimum interval — shorter is always better. However, it has been shown (see Section 6.3 in HTM ComDoc 6.) that for safety verification–related (hidden) failures with MTBF values greater than about 50 years, the increase in the time that the patient would be exposed to potentially hazardous hidden failures if the testing interval was increased from six months to as long as five years is very small.

These lower PM-risk devices qualify for inclusion in an AEM program either because of the lower level of severity of the outcomes of potential failures or because they have demonstrated an acceptable level of PM-related reliability. Therefore, they can be maintained using a maintenance procedure or strategy other than that recommended by the manufacturer. They can be transitioned immediately to less stringent PM strategies, such as the cost-efficient light maintenance (run-to-failure) strategy - which is mentioned in Appendix A of the CMS memo (HTM ComRef 28). At the very least, the manufacturer-recommended procedures can be modified (such as by omitting electrical safety checks that the facility has found to be nonproductive), or by extending the testing interval to make it coincide with a more convenient or more efficient routine.

The logical rule here is to explore the national database (Table 13) for evidence of more efficient maintenance procedures. It would be prudent to monitor the levels of patient safety (as described in Section 3.10 of HTM ComDoc 3 and HTM ComRef 35) being achieved by the current procedure (or any of the more efficient procedures, if chosen) for devices categorized as PM priority 2 (moderate PM-risk) devices. Monitoring those in the lower risk categories is much less important but can be undertaken if the facility chooses.

If these devices should fail, there is a negligible or zero additional risk to patient safety. Therefore, in the absence of other regulatory mandates, unless there is a convincing case that periodic PM can be justified through lower maintenance costs, these devices are excellent candidates for the very efficient light maintenance (run-to-failure) strategy. It was by adopting this run-to-failure maintenance strategy in the early 1960s that the civil aviation industry was able to reduce its maintenance costs by 50% while, unexpectedly, also improving the reliability and safety statistics for civilian aircraft by a factor of 200 times.

......................................................................................................................................................................................................................................................

1.10.4 Question 4. How to maximize equipment-related reliability and safety (by using an Enhanced Risk Management Program)


The opening paragraph from HTM ComDoc 8 "Maximizing medical equipment-related reliability and safety" reads as follows:

"To the best of our knowledge, all of the studies reported to date have shown that only a very small percentage of injuries resulting from failures of medical devices are attributable to poor maintenance. See,for example, reference HTM ComRef 12). And, as we describe in Section 1.4 of HTM ComDoc 1, ...the great majority of medical device failures can be attributed to one or other of a fairly wide range of other causes.... However, if the cause of each device failure is routinely documented in the manner suggested in that same section of HTM ComDoc 1, this information (on which of those causes is currently contributing the most to device failures in a particular facility) can be very helpful in managing device failure prevention activities other than PM, and in monitoring the effectiveness of those efforts.

So, if our overall goal is to reduce the number of medical device failures, it makes sense to investigate ways in which these other causes can be reduced or eliminated. In HTM ComDoc 8 we point out that, based on the general statistics on causes of device failures, the most effective strategy for reducing failures of the critical life support device types is to:

  1. Give preference during device acquisition to those devices that are reported to have the highest level of inherent reliability. The possible impact of this strategy is unknown at this time but current statistics indicate that the inherent unreliability of the devices themselves accounts for 45-55% of all failures.
  2. Implement additional measures to reduce failures from the list of causes presented immediately below. They are listed in descending order of anticipated effectiveness.
13-20% - User-related issues such as controls or switches that have been set incorrectly. Although this type of failure may not always lead to a complete loss of function, it can have the same effect as actual failure. For example, an incorrectly set defibrillator can jeopardize patient resuscitation. (These Category PR1 calls typically represent between 13-20% of all of the repair calls).
7-8% - Problems related to a poor rechargeable battery management program. (These Category PR3 calls typically represent between 7-8% of all of the repair calls)
6-25% - Physical damage usually caused by a combination of poor design and user carelessness, such as dropping the device. (These Category PR2 calls typically represent between 6-25% of all of the repair calls).
3-9% - Problems with an accessory, such as patient cables and electrodes. (These Category PR4 calls typically represent between 3-9% of all of the repair calls).
1-7% - Problems resulting from an out-of-specification environmental condition, such as poor control of the ambient temperature. (These Category PR5 calls typically represent between 1-7% of all of the repair calls).
1-4% - Lack of timely PM (i.e. failing to restore [replace or refurbish] a part of the device that requires periodic attention). (These Category MR1 calls typically represent between 1-4% of all of the repair calls).
1-3% - Poor installation or poor initial set-up of the device. (These Category MR2 calls typically represent between 1-3% of all of the repair calls).
<1% - Tampering with internal switches or other controls that are not intended to be user-accessible. (These Category PR6 calls typically represent <1% of all of the repair calls).
<1% - Problems due to an issue with a data transmission network connected to the device’s output. (Category PR7 calls)


We also note in HTM ComDoc 8 that the best way to reduce potentially critical hidden failures in those device types that are most vulnerable to those kinds of failures (i.e. the device types listed in the first 11 rows of Table 2) is to:

  1. Select versions of the device that have built-in self-testing to verify that the device is functioning safely,
  2. Be diligent about following the manufacturer's recommendations for periodic safety verification testing, and
  3. Consider implementing pre-use inspections or testing to verify that the device is functioning safely immediately prior to use.


*Enhanced Risk Management Program. A very beneficial use for the resources freed up by improving the efficiency of the facility's maintenance program would be to implement an enhanced Risk Management Program incorporating some or all of the additional measures described above.


----

===='''1.10.5 <u>Question 5</u>. What changes to current PM work practices would be most beneficial?'''====


As we state in Section 15.3 of HTM ComDoc 15, there is absolutely no question that the most beneficial change would be to standardize the way we perform and report our maintenance activities.

There are three extremely important benefits that can be realized if the managers of the HTM community's maintenance programs can be persuaded to standardize on a common format for their maintenance reporting (a sketch of what such a common record format might contain follows the list below).

  1. Maintenance findings could be aggregated into a single, community-wide database, which would then quickly produce very helpful safety statistics on at least the more popular medical devices.
  2. A comprehensive coding system for documenting the way devices fail would provide the data we need to optimize the effectiveness of the facility's maintenance and non-maintenance equipment safety strategies.
  3. By analyzing the findings of the PMs we perform on critical equipment we could select the right intervals to use for critical PMs.
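
What such a common reporting record might contain is sketched below. This is a minimal, hypothetical Python illustration, not an HTM ComDoc specification: the field names are assumptions made for the example, while the repair-cause codes (PR1-PR7, MR1, MR2) and the PM finding codes are those discussed in this document and in HTM ComDoc 4.

<pre>
# Hypothetical sketch of a standardized maintenance-reporting record. Field names
# are illustrative assumptions only; the cause and PM codes follow this document.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class MaintenanceRecord:
    facility_id: str           # anonymized identifier for the reporting facility
    device_type: str           # e.g. "defibrillator"
    manufacturer: str
    model: str
    work_type: str             # "REPAIR" or "PM"
    event_date: date
    cause_code: Optional[str]  # PR1-PR7, MR1, MR2 for repair calls; None for PMs
    pm_code: Optional[str]     # PM finding code (e.g. "1", "5", "9", "F"); None for repairs

# Records kept in one common format by many facilities could be pooled directly
# into a community-wide database for the analyses described in the next sections.
example = MaintenanceRecord(
    facility_id="FAC-001", device_type="defibrillator", manufacturer="ExampleCo",
    model="D-100", work_type="PM", event_date=date(2018, 9, 25),
    cause_code=None, pm_code="5",
)
print(example)
</pre>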

=====1.10.5.1 Helpful safety statistics from the community-wide maintenance database=====

As was noted in Section 4.5 of HTM ComDoc 4, collecting the amount of data needed for an evidence-based approach to an equipment maintenance strategy will be problematic for most individual healthcare facilities. In many cases they will be unable to collect sufficient data in a reasonable period of time to make their failure rate statistics credible. However, data collected in a consistent, common format can be aggregated into a single database.

This statistical complication arises from three factors.

  • First, because they are designed and constructed differently, different manufacturer-model versions of a given device type, such as defibrillators, can be expected to show different levels of reliability. So each different manufacturer-model combination has to be analyzed and characterized separately.
  • Second, most individual healthcare facilities will probably have only a small number of the individual device types that are PM-critical at the highest severity level. (See rows 1-20 of Table 4).
  • Third, devices that are potentially PM-critical are likely to have been carefully designed and fabricated to have a relatively low failure rate, so their failures will be correspondingly rare.

The result of these complicating factors is that individual facilities will probably not generate enough data to get a good indication of each device's true PM-related reliability and PM-related level of safety. To get accurate estimates of the reliability of high-reliability devices, it will be necessary to pool maintenance statistics for each manufacturer-model version of each device from a number of institutions.

For example, suppose a facility has only three similar (same manufacturer, same model) heart-lung units and only three years of maintenance history for each unit. Since the facility has a total of only 9 device-years of experience, it is unlikely – if the actual MTBF of the units is, say, 50 years or more – that the facility will have experienced even a single failure during the three-year observation period. In that case it would have to report its finding with respect to the devices' indicated reliability (zero failures over 9 device-years) as undetermined. If, however, the facility did experience one or more failures of one of these devices during this relatively short period, then the indicated MTBF would appear to be unacceptably short for a critical device. In that situation it would be prudent for the facility to consult the findings on the reliability of this specific manufacturer-model version in the national database to see whether its experience was typical (and this type of device is, in fact, not sufficiently reliable) or atypical. The Task Force has set a tentative level of 50 device-years as the minimum experience base for reasonable credibility (see Section 4.5 of HTM ComDoc 4).
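
The arithmetic in this example can be summarized in a short sketch. The following hypothetical Python fragment (all numbers invented) computes the indicated MTBF as pooled device-years divided by observed failures, reports a zero-failure experience as undetermined rather than as an MTBF, and flags whether the experience base meets the Task Force's tentative 50 device-year credibility threshold.

<pre>
# Hypothetical sketch of the device-years / failures arithmetic described above.
MIN_CREDIBLE_DEVICE_YEARS = 50   # tentative threshold from Section 4.5 of HTM ComDoc 4

def indicated_mtbf(device_years, failures):
    """Return (MTBF in years, or None if undetermined, plus a credibility flag)."""
    credible = device_years >= MIN_CREDIBLE_DEVICE_YEARS
    if failures == 0:
        return None, credible    # zero failures: reliability is undetermined, not infinite
    return device_years / failures, credible

# Single facility: 3 heart-lung units x 3 years of history, no failures observed.
print(indicated_mtbf(3 * 3, 0))  # -> (None, False): undetermined, below the threshold

# Pooled experience from several facilities for the same manufacturer-model version.
print(indicated_mtbf(150, 2))    # -> (75.0, True): indicated MTBF of 75 years
</pre>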

Also, as was noted in Section 4.9 of HTM ComDoc 4, the summary proof tables (Table 5) are the most valuable part of the community database. In Section 1.9.4, above, we described how the statistics in Table 5 can be used to identify the most common causes of equipment failures.

=====1.10.5.2 The hard evidence showing which PM intervals are optimum for different kinds of PM-critical equipment and what levels of PM-related reliability are achieved at those intervals=====


As we describe in Section 4.7 of HTM ComDoc 4, adopting a coding system for PM findings similar to the one described in that section, systematically documenting those findings each time a PM is performed, and then aggregating the data will make it possible to obtain two very important pieces of information (a calculation sketch follows the list below):

1) An indication of how well the PM interval matches the optimum. The interval is optimal when the parts being restored have deteriorated, but not to the point where the deterioration has started to affect the functioning of the device. The indicators of how close the interval is to this optimum are as follows. A preponderance of:
  • PM code 1 findings (still very good) is an indicator that the interval is too short.
  • PM code 5 findings (about as expected) is an indicator that the interval is about right.
  • PM code 9 findings (already worn out) is an indicator that the interval is too long.
2) A numerical MTBF indicating the device’s level of PM-related reliability. This indicator is the lesser of the following MTBF values (representing the lower level of PM-related reliability):
  • The MTBF based on the total of any overt failures caused by inadequate device restoration (MR1 calls from the repair cause coding) and any PM code 9 findings (which are immediate precursors of the overt failures caused by inadequate restoration).
  • The MTBF based on the total of any hidden performance and safety degradations detected by the safety verification tasks (PM code F findings).
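
Both calculations described in the list above can be sketched together as follows. This is a hypothetical Python illustration rather than an HTM ComDoc procedure; the PM codes (1 = still very good, 5 = about as expected, 9 = already worn out, F = hidden degradation found by safety verification testing) and the MR1 repair-cause code follow the coding discussed above and in Section 4.7 of HTM ComDoc 4, and all input numbers are invented.

<pre>
# Hypothetical sketch: (1) judge the PM interval from the distribution of PM findings,
# (2) compute PM-related MTBF as the lesser of the two MTBF estimates described above.
from collections import Counter

def interval_indicator(pm_codes):
    """Suggest whether the PM interval looks too short, about right, or too long."""
    counts = Counter(pm_codes)
    dominant = max(("1", "5", "9"), key=lambda c: counts.get(c, 0))
    return {"1": "interval probably too short",
            "5": "interval about right",
            "9": "interval probably too long"}[dominant]

def pm_related_mtbf(device_years, mr1_failures, code9_findings, codeF_findings):
    """Return the lesser (more conservative) of the two MTBF estimates, or None."""
    restoration_events = mr1_failures + code9_findings  # overt failures plus their precursors
    hidden_events = codeF_findings                      # hidden degradations found by testing
    mtbfs = [device_years / n for n in (restoration_events, hidden_events) if n > 0]
    return min(mtbfs) if mtbfs else None                # None = undetermined (no events yet)

pm_findings = ["5", "5", "1", "5", "9", "5"]
print(interval_indicator(pm_findings))                  # -> "interval about right"
print(pm_related_mtbf(device_years=120, mr1_failures=1, code9_findings=1, codeF_findings=3))
# -> 40.0 years, driven here by the hidden-degradation findings (120 / 3)
</pre>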

