
Draft BI&T article 4-8-16

From HTMcommunityDB.org


Reliability-Centered Maintenance: A Tool for Optimizing Medical Device Maintenance (4-8-16 draft)

Malcolm Ridgway PhD, CCE; now retired; Woodland Hills, CA; malcolmr36@gmail.com
Matthew Clark, MBA, CBET; Clinical Engineer, Advocate Health, Downers Grove, IL; matthew.clark@advocatehealth.com
Cheryl Bettinardi, BMET; Advocate Health, Downers Grove, IL; cheryl.bettinardi@advocatehealth.com

At last, after many years of inconclusive discussion about the role of planned maintenance (PM) in keeping medical equipment safe, in October 2015 AAMI announced its support for a new project "to begin exploring whether an approach known as reliability-centered maintenance (RCM) should be adopted on a wider scale throughout the field of healthcare technology management (HTM)." RCM was pioneered in the 1950s by the civil aviation industry as a way of reducing the cost of maintaining its aircraft. It quickly succeeded, playing a key role in making it economically feasible to bring the first jumbo jets into successful commercial service. And while the original motivation for this new approach was business economics, it was soon discovered that the new maintenance strategies had a significant and unexpectedly positive effect on aircraft reliability (reference 1). Consequently, during the latter part of the last century, virtually all of the nation's high-reliability industries, including the military, the aerospace industry, the nuclear submarine fleet and the nuclear power industry, among others, switched over to these very effective and highly efficient practices. Although healthcare technology is now regarded as part of the high-reliability sector, the embryonic HTM community of that era did not adopt any of these changes. It is extremely disappointing to report that we will almost certainly be the last of the high-reliability technology sectors to adopt RCM's modern scientific methods. It is well past time for us to move on and abandon the outdated maintenance practices of the last century!

Getting started

An excellent place to begin this exploration of RCM is to draw on its basic principles to examine our current rationale for performing PM. There has been a long and contentious debate about the shortcomings, in both efficiency and effectiveness, of the traditional methods still being used to maintain medical devices. After many years in the business of providing medical equipment maintenance services to hospitals, one of the authors (MR) is convinced that at least half of all the PM work currently being performed in the name of regulatory compliance provides little or no value and does virtually nothing to improve the overall safety of the equipment. At a time of ever-increasing desire to apply modern technology to healthcare, combined with a very limited pool of technical resources, there are endless opportunities to redeploy the technical manpower currently being wasted performing valueless PM on the nation's medical equipment. There are many areas of health technology management facing technical challenges where these scarce resources could be put to much better use.

A volunteer Healthcare Technology Management Community (HTMC) Maintenance Practices Task Force has been formed and the project is already underway. It has three broad objectives.

1. Disseminate as widely as possible a concise body of information describing very clearly why routine PM fails to improve the safety of a significant majority of medical devices.

2. Develop guidelines and tools to facilitate a rational optimization of our PM programs.

3. Create a community-wide database to provide a substantial body of quantitative evidence to support this rational optimization.

There are only two ways in which periodic PM can prevent medical equipment failures. This becomes immediately obvious when we examine the PM procedure of a typical medical device. Figure 1 is a good example; it is the generic PM procedure for a critical care ventilator. Like all of the PM procedures in use throughout the industry, it contains only two kinds of tasks.

  • Device restoration (DR) tasks … restore the physical condition of the device to something close to its original, like-new condition by reconditioning or replacing parts such as batteries, cables, fasteners, gaskets and tubing, which are not usually intended to last for the entire working lifetime of the device. DR tasks improve the device's reliability (usually only to a minor degree) by preventing failures that would otherwise result from the deterioration of these so-called non-durable components. Sometimes the parts are reconditioned; sometimes they are replaced. But this improved reliability increases the device's level of safety only if a complete failure of the device is likely to result in an adverse outcome that could harm the patient or a member of the staff treating the patient. In the generic procedure, DR tasks involving parts whose deterioration could result in an adverse outcome are labeled according to the potential level of severity of that outcome.
  • Safety verification (SV) tasks … consist of visual inspections or tests that confirm the device is still performing within its original functional and safety specifications. In the generic procedure, SV tasks whose failure could result in an adverse outcome are labeled according to the potential level of severity of that outcome.

Figure 1 PM Procedure C.VEN-01 Critical Care Ventilator

See attached file/ Figure 1 C VEN 01 4-8-16.docx

In a case where the generally accepted PM procedure for the device contains no DR tasks (because the device has no parts needing periodic restoration), and no SV tasks with potentially significant adverse outcomes (because the device has no potential to deteriorate in a way that could conceivably cause some kind of patient injury), there is no way that performing the procedure can make the device any safer.

And in cases where the PM procedure does have one or more DR tasks (that would prevent the device from failing), but none of those tasks is associated with a potential level of severity that could involve a patient injury, periodic PM will improve the device's reliability (usually by only a small amount), but it will not make the device any safer.

Devices that fit into either of these two categories are considered to be non PM-critical or, more simply, non-critical devices. There is no way they can be made any safer by any kind of periodic PM. Figure 2 provides another good example. It is the generic PM procedure for a patient scale, which is a non-critical device.

Figure 2 PM Procedure PA.SC-01 Patient Scale

See attached file/ Figure 2 PA SC 01 4-8-16.docx

Contrary to general belief within the healthcare industry, and in particular among the authors of the current regulations governing equipment maintenance in the nation's hospitals, promulgated by the Centers for Medicare & Medicaid Services (CMS), the great majority of medical devices in use in today's healthcare facilities are non-critical devices that cannot be made any safer by periodic PM.

Direct evidence from the field also seems to confirm that PM plays a very minor role both in preventing equipment failures and in improving equipment safety. In spite of its traditional and potentially misleading name - preventive maintenance - planned maintenance (PM) currently has a very minor impact on medical equipment failures. A study reported in 2009 (reference 2) showed that maintenance issues are the root cause of less than 4% of all medical device failures, and that less than 3% of all equipment system failures are PM-preventable. The overwhelming balance of equipment failures (96-97%) is attributable, in about equal parts, to inherent device problems, such as random failures or malfunctions of a component part of the device, and to process-related failures, such as incorrect set-up or operation of the device by the user. If PM-related device failures are so rare, then patient injuries resulting from PM-related failures must be rarer still. Statistics from at least one nationwide database (reference 3) appear to confirm that this is the case.

The simple questionnaire (shown in Sidebar 1) provides a quick and easy way of identifying which of the hospital’s devices are non-critical.

Sidebar 1 Questionnaire for determining which devices are non-critical.

Q1. Is it reasonably possible that there could be some kind of adverse patient outcome if this device, without reasonable warning, stops working while being used on a patient? Response (1): yes or no.

Q2. Is it reasonably possible that the device will stop working if one (or more) of the device restoration tasks included in the manufacturer’s PM procedure (or the corresponding HTMC PM procedure*) is not completed in a timely manner? Response (2): yes or no.

Q3. If the responses to Q1 and Q2 are both “yes”, briefly describe the nature of the worst-case possible adverse outcome. Response (3): ____________(See examples in column 6 of the website’s Table 2) ________________

Q4. Identify any possible mitigating factors that might reduce the severity of the expected outcome. Response (4): ____________(See examples in column 8 of the website’s Table 2) ________________

Q5. After considering the possible mitigating factors listed in response to Q4 above, project the worst-case Level of Severity (LOS) of the outcome of the failure, where LOS 3 represents a potentially life-threatening situation, LOS 2 represents the possibility of a non life-threatening patient injury, and LOS 1 represents a possible disruption of patient care, such as a significant delay in obtaining diagnostic information, a significant delay in treating the patient, or an increase in the patient's length of stay in some other way.

Response (5): LOS 1, LOS 2 or LOS 3. _________

Q6. If the manufacturer’s recommended PM procedure (or the corresponding HTMC PM procedure*) includes any functional performance or safety tests, is it reasonably possible that there could be some kind of adverse patient outcome if the device falls out of spec and fails one or more of those tests? Response (6): yes or no.

Q7. If the response to Q6 above is “yes”, describe briefly the nature of the worst-case outcome. Response (7): ____________(See examples in column 5 of the website’s Table 3) ________________

Q8. Identify any possible mitigating factors that might reduce the severity of the expected outcome. Response (8): ____________(See examples in column 7 of the website’s Table 3) ________________

Q9. On the same scale of 1 to 3 described in Q5 above, project the worst-case Level of Severity (LOS) of the anticipated adverse outcome. Response (9): LOS 1, LOS 2 or LOS 3. __________

Non-critical device.

If the analysis results in Response (6) being “no”, and one or both of Responses (1) and (2) is/are also “no”, then this type of device should be classified as not potentially PM-critical or, more simply, as a non-critical device.

Device is potentially PM Priority 1.

If the analysis results in any other combination of responses then this type of device should be classified as potentially PM Priority 1 at a Level of Severity (LOS) representing the combined LOS levels of the DR-related and SV-related failure modes. (see Table 1).

  • The HTMC’s Maintenance Practices Task Force is currently creating a set of standardized generic PM procedures for each separate type of medical device. These procedures are designed to be functionally equivalent to each of the manufacturer-recommended PM procedures and, for the purpose of this analysis, they can be used instead of the manufacturer’s recommended procedure.

The proposed format for these standardized generic procedures is shown in Figures 1 and 2.

To view some of the Task Force’s other model generic PM procedures, visit the website at www.HTMCommunitydB.org (see Sidebar 2), select “Page 2”, then “Table 4”, and click on one of the active links in column 6.

The key parts of the questionnaire are questions 1, 2, and 6. The other questions help categorize the worst-case severity of the patient harm that could result (in the case of a potentially PM Priority 1 device) if the PM is not performed, and whether or not there are any mitigating factors that could reduce or change this.
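The decision logic driven by questions 1, 2 and 6 can be sketched as a small function. This is a hypothetical illustration; the function name and return labels are ours, not part of the questionnaire itself:

```python
def classify_device(q1: str, q2: str, q6: str) -> str:
    """Classify a device type from questionnaire responses Q1, Q2 and Q6.

    Each response is "yes" or "no". A device is non-critical when the
    safety-verification question (Q6) is "no" and one or both of the
    device-restoration questions (Q1, Q2) is also "no"; any other
    combination makes the device potentially PM Priority 1.
    """
    if q6.lower() == "no" and "no" in (q1.lower(), q2.lower()):
        return "non-critical"
    return "potentially PM Priority 1"

# A patient scale: no harm expected if it simply stops working
print(classify_device("no", "no", "no"))     # non-critical
# A critical care ventilator: failure in use could be life-threatening
print(classify_device("yes", "yes", "yes"))  # potentially PM Priority 1
```

Note that a "yes" on Q6 alone is enough to make a device potentially PM Priority 1, regardless of the responses to Q1 and Q2, which mirrors the classification rule stated above.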

Table 1. PM failure severity levels for all potentially PM Priority 1 devices

See attached file/ Table 1 4-8-16.docx

Sidebar 2 The Task Force’s website is at www.HTMCommunitydb.org

The site is in the process of being updated to eliminate the need to log in. (It is a wiki-type site.) Until it is updated, you can get access as follows:

1. Click on “here” on the opening page.

2. View the “Main Page”, where you will see these instructions in the section labeled “Access”.

3. Scroll down to one of the active links, which should appear in blue or orange (say, the link “Introductory Materials”), and click on it.

4. You will then get the “Log In Required” dialog box.

5. Log in using “view” as your user name and “view” as your password.

6. You should then be able to click successfully on any of the active links on the site.

7. Now click again on whatever page you would like to view. You will also be able to download and print any of the materials.

Preliminary findings

There are many relatively simple devices, such as a patient scale (Figure 2), that have no critical non-durable parts needing periodic attention and no critical safety verification tasks needing to be performed. As best we can estimate there are, in round numbers, between 750 and 1500 different types of healthcare-related devices in use in today’s healthcare facilities. An unknown number of these are non-clinical devices such as printers or other accessories that do not even fall into the formal category of a medical device that is regulated by the FDA. The Task Force believes that these non-clinical devices are very likely to also be non-critical devices.

At the other end of the scale, initial work performed by the Task Force, which is described in the documents on their website (see Sidebar 2), has so far identified 71 device types as being potentially PM Priority 1, at several different levels of potential outcome severity. More details on this tentative categorization can be found in the website’s Table 4 and supporting Tables 2 and 3. (For convenience, the website’s Table 4 is reproduced here as Table 1.) The Task Force believes that a large percentage of the remaining balance - representing at least 700 and maybe as many as 1400 different device types - will prove to be non-critical when they are analyzed using the questionnaire.

Immediate workload relief

This first step, using the questionnaire to separate out from the inventory those devices that are non-critical, provides a rational justification for a considerable amount of immediate workload relief, because all of the device types found to be non-critical are legitimate candidates for the so-called light maintenance strategy, which simply allows the device to be used - regulatory constraints permitting - without any kind of periodic maintenance whatsoever (reference 1). In some cases an argument might be made for periodic PM interventions on the grounds that they would reduce the net cost of maintaining the device but, as of this time, we know of no studies that have documented such a finding for any type of medical device.

Risk-based prioritization of those devices found to be potentially PM Priority 1

Now that we have described a way of identifying those non-critical medical devices that cannot be made any safer by subjecting them to any kind of periodic PM, the next step in our investigation is to perform a risk assessment to determine which of the potentially PM Priority 1 devices should be given the highest priority (PM Priority 1) because they are the most likely to become hazardous if they are not given timely attention. As we shall see, different device types that are categorized as potentially PM Priority 1 devices can be expected to present different levels of risk. Those with the potential to present the greatest risk of injuring a patient should be given a correspondingly higher priority for attention and timely PM.

As we have just noted, those devices that do have significant PM-related failure modes (what we have been calling potentially PM Priority 1 devices) can be expected to present different levels of risk, and the risk assessment described below will provide us with a way of separating them into a range of different risk levels. Since The Joint Commission uses the term “high-risk device” in its standards with criteria that do not necessarily coincide with the criteria we are using, and the CMS regulations use the term “critical” in a similar way, we have chosen to use the alternative term “PM Priority 1 device” to label devices that present the highest level of potential risk. Similarly we have adopted the terms PM Priority 2, 3, 4 and 5 to label those devices that are found to have progressively lower levels of risk (see Table 2). These can be considered to represent moderate, low and very low levels of risk, respectively.

Table 2. Tentative definitions of what should be considered the minimum acceptable levels of PM-related reliability/safety

See attached file/ Table 2 4-8-16.docx

A new RCM-based risk assessment

According to modern reliability and risk management theory, risk has two components:

  • The severity of the outcome of the event (in this context a PM-preventable device failure);
  • The likelihood that the event (the PM-preventable device failure) will actually occur.

Requiring this combination of two factors means that devices for which there happens to be a manufacturer-recommended PM procedure will not necessarily become hazardous if the manufacturer’s recommendations for periodic PM are not followed exactly to the letter. If the likelihood of a PM-related failure actually occurring (even if the failure has a potentially high-severity outcome) is found to be very low – with a probability equivalent to a mean time between failures (MTBF) of, say, 50-75 years or more - then the corresponding risk of harming the patient is also relatively low. (See Sidebar 3 for more on using the MTBF as a measure of reliability and safety). This is why travelling on a commercial airliner is considered to be safe. While there is a theoretical possibility of a high-severity outcome if the plane should crash, the likelihood that this will actually happen is very low - meaning that the risk of flying on a commercial airliner is correspondingly very low.
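The two-factor idea can be made concrete with a minimal sketch in Python. The 50-year MTBF cutoff and the risk labels below are purely illustrative assumptions of ours, not values the Task Force has adopted:

```python
def pm_risk_level(worst_case_los: int, mtbf_years: float) -> str:
    """Combine outcome severity with failure likelihood to rate risk.

    worst_case_los: 3 = potentially life-threatening, 2 = non
    life-threatening injury, 1 = disruption of patient care.
    Likelihood is expressed as an MTBF; the 50-year cutoff for
    "very unlikely" is an illustrative assumption only.
    """
    very_unlikely = mtbf_years >= 50
    if very_unlikely or worst_case_los == 1:
        return "low risk"
    return "high risk" if worst_case_los == 3 else "moderate risk"

# A potentially life-threatening failure mode that is very rare
# (MTBF of 75 years) still represents a relatively low risk:
print(pm_risk_level(3, 75))  # low risk
print(pm_risk_level(3, 10))  # high risk
```

This mirrors the airliner analogy: a high-severity outcome combined with a very low likelihood yields a low overall risk.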

Sidebar 3 Failure rates expressed as mean time between failures.

It is generally more convenient to express device failure rates in the form of their equivalent mean time between failures (MTBF). The MTBF is simply the inverse of the failure rate. For example, a device that failed twice in nine years has an MTBF of 4.5 years. It can also be expressed as the number of device-years of device experience divided by the total number of device failures occurring during the observation period. The greater the number of devices in the sample and the longer the period of the observations, the closer the observed failure rate will be to the device's true failure rate (see reference 4).
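The arithmetic in this sidebar can be expressed as a one-line helper; the treatment of the zero-failure case as "undetermined" follows the discussion later in this article:

```python
def mtbf_years(device_years: float, failures: int):
    """MTBF = total device-years of experience / number of failures.

    Returns None when no failures were observed: the rate is then
    undetermined, and we only know the MTBF probably exceeds the
    number of device-years in the sample.
    """
    return device_years / failures if failures else None

# A device that failed twice in nine years:
print(mtbf_years(9, 2))      # 4.5
# Three units observed for three years each, with no failures:
print(mtbf_years(3 * 3, 0))  # None (undetermined)
```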

One of the authors (MR) remembers another illustration of this two-factor requirement, from the mid-1980s, that is a little closer to home. At that time there were many practical, hands-on people who seemed to recognize intuitively that a failure with a very low probability of actually occurring did not represent a very serious risk, even if the outcome of that failure could be a high-severity event. Following the great electrical safety scare of 1968, which was widely publicized by Ralph Nader, the Joint Commission had urged that all line-powered devices be checked at fairly frequent intervals for excessive leakage current. However, after a little while many in-house programs discontinued the practice, usually rationalizing their action by simply saying “we just never find any high levels of leakage current”.

A conventional RCM risk assessment requires the identification of all of the possible ways a device could fail – what, in RCM jargon, are called the device’s failure modes. There are failure modes associated with inherent failures (such as random failures in the device’s electronics), failure modes associated with process-related failures (such as the operator setting one of the controls incorrectly), and failure modes associated with maintenance-related failures (such as the device being out of calibration). However, for our purpose here we can ignore the process-related and inherent failure modes and take a legitimate short cut by simply conducting a maintenance-focused risk assessment (reference 1).

Step 1. Projecting the worst-case Level of Severity (LOS) of the outcomes of the failures. As noted in the questionnaire described above, there are two kinds of PM-related failures: device restoration-related failures and safety verification-related failures. Consulting the aggregated findings in response to questions 5 and 9 will allow us to project the worst-case LOSs for the potential outcomes, thus fulfilling this first step in our maintenance-focused risk assessment.

The second step of the risk assessment is addressed below. Once actual risk levels have been projected, the final step will be determining to what degree the actual PM-related failure rates of the potentially PM Priority 1 devices exceed the tentative levels that the Task Force has set for acceptable safety (see Table 2).

Step 2. Estimating how likely it is that each of the theoretical failures will actually occur. Rather than attempt to make educated guesses at typical failure rates, the Task Force has decided, for credibility reasons, to initiate a community-wide effort to gather real-world data. In an ideal world the data needed to determine each device’s PM-related failure rates would be obtainable from the equipment maintenance records that are already required at every accredited healthcare facility. In an ideal world there would also be some consistency in the format and content of those records. Unfortunately this is not the case and, if the members of the HTM community are willing to respond to this call to action and collaborate with the Task Force in addressing this very important issue, we will have to appeal for some voluntary standardization (See Sidebar 4).

Sidebar 4. Guidelines for standardizing the maintenance, testing and reporting

1. The maintenance entity must use the manufacturer’s recommended PM procedure, or one that includes, as a minimum, all of the device restoration and safety verification tasks listed in the relevant HTMC PM procedure (which is functionally equivalent to the manufacturer’s recommended procedure) for each manufacturer-model version of the various potentially PM-critical device types. (See Figures 1 and 2 ). This is to ensure that all of the device restoration tasks and safety verification tasks identified by the device manufacturer in their recommended PM procedures are addressed by each maintenance entity.

2. Although currently there are some regulatory constraints on this, for our purpose here it is not necessary for the maintenance entity to perform the PM restoration and verification tasks at the same interval as that recommended by the manufacturer. Indeed - in the absence of any regulatory mandates - some diversity would be welcome since one of the goals of the project is to compare the levels of device reliability and safety that are achieved at different maintenance intervals.

3. The maintenance entity must use some form of coding for repair calls that allows a separate count of the failures that are attributable to inadequate PM (similar to the Category 7 coding described in reference 2).

4. The maintenance entity must also use some form of coding for the PM findings, similar to that described in Sidebar 5. This allows a separate count of the number of times that a hidden failure was detected, as well as the number of times that a non-durable part was found to have deteriorated too far (see Sidebar 6).

Sidebar 5 Documenting important PM findings

One of the most helpful features of the standardized generic PM procedures (see Figures 1 and 2) is a section devoted to documenting key PM findings. At the bottom of the procedure, in a section titled “Findings”, the service person is asked to indicate, by circling one of three letters (A, B or F), whether the results of the performance and safety testing of the device called for in the procedure were:

A = Passed i.e. all of the device testing to detect hidden failures found the device to be in complete compliance with the relevant specifications; and any other functions tested were within reasonable expectations; or

B = Minor OOS condition(s) found i.e. one or more of the tests revealed a slightly out-of-spec (OOS) condition. The purpose of this B rating is to create a watch list to monitor for future adverse trends in particular performance or safety features, even though the discrepancy is not considered to be significant at this time. A performance rating of B is considered to be a passing grade.

F = Failed i.e. one or more of the tests found one or more of the device’s performance or safety features to be significantly out-of-spec (OOS). This is a failing grade and, if this is a PM Priority 1 device, it should be removed from service immediately.

The same section also asks the service person to indicate, by circling one of four numbers (1, 5, 9 or 0), whether the physical condition of the parts of the device that were rejuvenated by the device restoration tasks called for in the procedure was:

1 = Still good/better than expected with very little or no deterioration; i.e. the physical condition of the restored part(s) was found to be still very good; or

5 = About as expected i.e. the physical condition of the restored part(s) was (were) found to be about as expected. If there was some minor deterioration it was probably having no adverse effect on the device’s function; or

9 = Already worn out/serious physical deterioration i.e. the physical condition was found to be considerably worse than expected and the restored part(s) was (were) already worn out and probably having an adverse effect on device function.

0 = No physical restoration required. The device has no parts needing any kind of physical restoration.

Systematically documenting these findings each time a PM is performed, and then aggregating the data will make it possible to obtain the following:

1) An indication of how well the PM interval matches the optimum. The optimum PM interval is when the parts being restored have slightly deteriorated - but only to the point where the deterioration is just beginning to affect the functioning of the device. The indicators for how close the interval is to this optimum are as follows. A preponderance of:

  • PM Code 1 Findings (still very good) is an indicator that the interval is too short
  • PM Code 5 Findings (about as expected) is an indicator that the interval is about right
  • PM Code 9 Findings (already worn out) is an indicator that the interval is too long

2) A numerical MTBF (mean time between failures) that is an indicator of each device’s level of PM-related reliability and safety. This indicator is the lesser (the one representing the lower level of reliability) of the two MTBFs specified below.

  • The MTBF based on the total of (a) any overt failures caused by inadequate device restoration (from the repair cause coding) and (b) any PM Code 9 Findings (which are immediate precursors of the overt failures caused by inadequate restoration), and
  • The MTBF based on the total of any hidden performance and safety degradations detected by the safety verification tasks (PM Code F Findings).

For more on this, including a discussion of the benefits of the PM procedure’s other important features, see the website’s HTM ComDoc 5.
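The two indicators described in this sidebar can be sketched in Python. This is a hypothetical illustration; the function names, the simple-majority reading of "preponderance", and the treatment of zero-failure cases are our own assumptions:

```python
from collections import Counter

def interval_indicator(dr_codes: list[int]) -> str:
    """Indicator 1: judge the PM interval from device-restoration codes.

    Codes: 1 = still good, 5 = about as expected, 9 = already worn out
    (code 0, no restoration required, is ignored). The most common code
    is taken as the "preponderance"; the Task Force has not defined a
    formal threshold, so this simple majority rule is an assumption.
    """
    counts = Counter(c for c in dr_codes if c in (1, 5, 9))
    if not counts:
        return "no device-restoration findings"
    return {1: "interval too short",
            5: "interval about right",
            9: "interval too long"}[counts.most_common(1)[0][0]]

def pm_reliability_mtbf(device_years: float, dr_failures: int,
                        code_9_count: int, code_f_count: int):
    """Indicator 2: the lesser of the two PM-related MTBFs.

    One MTBF is based on overt restoration-related failures plus their
    Code 9 precursors; the other on the hidden failures caught by the
    safety verification tasks (Code F). Returns None when no failures
    of either kind were observed (the rate is then undetermined).
    """
    event_counts = (dr_failures + code_9_count, code_f_count)
    mtbfs = [device_years / n for n in event_counts if n > 0]
    return min(mtbfs) if mtbfs else None

print(interval_indicator([1, 1, 5, 1]))   # interval too short
# 300 device-years; 2 DR failures + 1 Code 9 finding; 5 Code F findings:
print(pm_reliability_mtbf(300, 2, 1, 5))  # 60.0
```

In the second example the safety-verification MTBF (300/5 = 60 years) is lower than the restoration-related MTBF (300/3 = 100 years), so 60 years becomes the device's reliability indicator.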

Sidebar 6. A preferred system for coding repair calls

There are many reasons why equipment systems fail and it is important to recognize that only a few of these failures can be pre-empted by some kind of periodic maintenance. In reference 2, the authors point out that equipment failures can be classified into three general types depending on which part of the equipment system has failed. For a more detailed description of this repair coding system see section 1.6 “What causes equipment systems to fail?” of the website’s HTM ComDoc 1.

Compiling the data into organized batches

To streamline the reporting part of the project the Task Force will be asking certain organizations to volunteer to act as data-aggregating intermediaries. Organizations that are candidates for this data aggregator role include independent service organizations (ISOs), national or regional hospital systems with their own in-house maintenance services, and computerized maintenance management system (CMMS) companies. For more on this, see section 7.5 “Guidelines for compiling the data into organized batches” of the website’s HTM ComDoc 7.

Why we need a community-wide database

Collecting sufficient data to provide a statistically valid body of evidence to support the use of particular maintenance strategies may prove to be difficult for many individual healthcare facilities, for the following reasons:

  • Because they are designed and constructed differently, different manufacturer-model versions of items found to be PM Priority 1 devices, such as defibrillators and critical care ventilators, will very likely display different levels of reliability. This means that each of the different manufacturer-model combinations will have to be analyzed separately.
  • Devices that are found to be PM Priority 1 are presumably designed to be quite reliable, so they will likely demonstrate a correspondingly low failure rate. This reduces the number of failures that an individual facility will be able to document over a reasonable time period.
  • Many individual healthcare facilities will have only a small number of the different manufacturer-model versions of the device types that have been found to be PM Priority 1.

To illustrate this dilemma, suppose that a facility has only three similar (same manufacturer, same model) heart-lung units and only three years of maintenance history for each unit. Since this amounts to an experience base of only 9 device-years, it is unlikely, if the actual mean time between failures (MTBF) of the units is greater than 9 years (and we are hoping to find that the MTBFs for typical high-reliability devices will be at least 75 years), that the facility will have experienced even a single failure during the three-year observation period. In this case the facility would have to report its finding with respect to the devices’ indicated failure rate (zero failures over 9 device-years) as undetermined.

And even if they did experience one or more failures during this relatively short period of time, the indicated MTBF (of up to 9 years) will appear to be unacceptably short for a device that is potentially a PM Priority 1 device. With an indicated MTBF this low it would obviously be prudent for the facility to look to the findings on the reliability of these specific types of device in the national database to see whether or not their particular experience is indeed typical (and that this version of this type of device is, in fact, not sufficiently reliable), or if their experience is atypical. For more on this possible situation see references 4 & 5.

The bottom line on these statistical validity considerations is that many individual facilities will probably have difficulty generating enough failure data to get a good indication of each device’s true PM-related failure rate and, therefore, the device’s true level of maintenance-related safety. To get accurate indications of the true PM-related failure rate of PM Priority 1 devices it will be necessary to create a pool of maintenance statistics containing a certain minimum number of device-years of experience for each manufacturer-model version of each device type (see Table 3).

Table 3. Tentative characterizations of different amounts of data in the experience base

Characterization     Amount of data (device-years)
Inadequate           <50
Good                 50-200
Very good            200-500
Substantial          >500
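As a sketch, the Table 3 characterizations can be expressed as a simple classifier. The thresholds are the Task Force's tentative values; how to assign a pool falling exactly on a boundary (200 or 500 device-years) is our assumption:

```python
# Sketch: classifying the size of an experience-base pool per Table 3.
# Boundary handling (<= on the upper edge of each band) is an assumption.

def characterize_pool(device_years):
    if device_years < 50:
        return "Inadequate"
    elif device_years <= 200:
        return "Good"
    elif device_years <= 500:
        return "Very good"
    return "Substantial"

print(characterize_pool(9))    # Inadequate (the heart-lung example above)
print(characterize_pool(320))  # Very good
```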

Aggregating the data

With the recent initiation of this AAMI-supported RCM project, we are appealing to every member of the healthcare technology management community to provide the Maintenance Practices Task Force with summaries of the documented findings from their ongoing maintenance on all devices that the Task Force has classified as potentially PM Priority 1 (see Table 1). To allow the findings to be properly aggregated, it is also very important that the maintenance, testing, and reporting be performed in accordance with the guidelines described in Sidebar 4.

Proof Tables in the proposed community database

The key part of the website-based community database will be a set of tables that the Task Force calls Summary Proof Tables. These tables will catalog the PM-related failure rates calculated from the aggregated maintenance data submitted for each of the different potentially PM Priority 1 device types. The format of this table is illustrated in Table 4. Note that this particular table contains only hypothetical data; it is provided to illustrate the kind of useful information that this project will make available to the entire community.

The data displayed in the table are relatively simple. The MTBF for the device restoration-related failure rate shown in column C4 is derived from the total number of reported PM-related device failures plus the number of PM Code 9 failures found during the reporting period, divided into the pooled device-years of experience. The MTBF for the safety verification-related failure rate in column C8 is derived in the same way from the number of PM Code F failures found during the same period.
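The two MTBF calculations described above can be sketched as follows. The field names and counts are hypothetical; the device-years figure is chosen so the result matches the illustrative 107-year MTBF discussed later for Table 4:

```python
# Sketch (hypothetical data): deriving the two Summary Proof Table MTBFs.
# Column C4 pools reported PM-related failures with PM Code 9 failures;
# column C8 uses PM Code F (safety verification) failures alone.

def mtbf_years(device_years, failures):
    """Pooled device-years divided by failure count; infinite if none."""
    return device_years / failures if failures else float("inf")

device_years = 428          # pooled experience for one manufacturer-model
reported_pm_failures = 2    # PM-related device failures reported in the period
code9_failures = 2          # PM Code 9 failures found during scheduled PMs
codeF_failures = 1          # PM Code F failures found during the same period

restoration_mtbf = mtbf_years(device_years, reported_pm_failures + code9_failures)  # column C4
safety_mtbf = mtbf_years(device_years, codeF_failures)                              # column C8
print(restoration_mtbf, safety_mtbf)  # 107.0 428.0
```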

Table 4. Summary Proof Table for critical care ventilators (see attached file: Table 4 Proof Table 3-31-16.docx)

The kind of useful information that the Summary Proof Tables will provide

Table 4 illustrates how this project will enable the HTM community to present solid empirical evidence for which manufacturer-model versions of the various potentially PM Priority 1 devices should be designated PM Priority 1. Generally speaking, devices will exhibit different levels of reliability and risk when they are maintained at different intervals, and a device that exhibits an unacceptably high risk of a serious outcome when it fails from a PM-preventable failure will usually exhibit a lower, more acceptable level of risk when the PM interval is shortened.

Once this information begins to become available, it will no longer be necessary to guess at what the "safe" PM interval should be. The answer will be apparent from the numbers in the Summary Proof Tables. We will have to see: it may transpire that all of the manufacturers' recommendations are correct, and it may transpire that some will need to be modified (see reference 6). There are still one or two issues needing further work and eventual resolution, such as the thresholds of acceptability for the size of the experience base (Table 3) and what should be used as the acceptable level of safety (Table 2). The Task Force is addressing both of these issues, and its current positions and conclusions can be found in the relevant documents on its website.

Some illustrative commentary on the hypothetical data in Table 4

  • According to the illustrative PM Findings shown in rows R1 and R2, the Carefashion company's Velour model of critical care ventilator behaves like a PM Priority 1 device when it is maintained at a 12-month interval. As can be seen from the data in row R2, the frequency of device restoration-related (DR) failures is higher than the acceptable limit (with an MTBF of only 13 years) at this interval, whereas the data in row R1 show that the frequency of DR failures (with an MTBF of 107 years) is comfortably below the acceptable limit (tentatively set at an MTBF of 75 years) when the same devices are maintained at the recommended 6-month interval. This provides good empirical evidence that the recommended PM interval of 6 months provides an adequate level of safety, and that the 12-month interval is too long.
  • In contrast, the PM Findings data for the Coyote company's Model 300 ventilator shown in rows R3 and R4 demonstrate an acceptable level of PM-related reliability and safety even when the device is maintained at an interval longer than the manufacturer-recommended 6 months.
  • Yet another pattern is seen with the PM Findings data for the Herman company's Uno model ventilator shown in rows R5 and R6. While it shows acceptable PM-related reliability and safety when maintained at a 6-month interval, it shows an unacceptable frequency of performance/safety problems (with an MTBF of only 43 years) when maintained at the longer 12-month interval, in addition to an unacceptable frequency of device restoration-related failures (with an MTBF of 23 years). Based on this empirical evidence, this particular model should be classified as a PM Priority 1 device when it is maintained at the 12-month interval.
  • A different and more concerning pattern is seen with the PM Findings data for the Matelot company’s Model 500 ventilator shown in rows R7 and R8. The data demonstrates unacceptable levels of PM-related reliability and safety at both maintenance intervals - apparently related to the poor reliability of the device’s non-durable parts. Based on this evidence this particular model should be classified as a PM Priority 1 device when maintained at either interval, and consideration should be given to using a shorter (maybe 3-month) PM interval.
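The four patterns above all reduce to comparing each indicated MTBF against the tentative acceptability threshold. A hedged sketch of that comparison, using the article's hypothetical Table 4 figures and the tentative 75-year threshold:

```python
# Sketch: flagging a manufacturer-model/interval combination as PM Priority 1
# when either indicated MTBF falls below the tentative 75-year threshold
# used for the hypothetical Table 4 data.

ACCEPTABLE_MTBF_YEARS = 75  # tentative threshold from the article

def is_pm_priority_1(restoration_mtbf, safety_mtbf):
    """True when either the device restoration-related MTBF or the
    safety verification-related MTBF is unacceptably short."""
    return (restoration_mtbf < ACCEPTABLE_MTBF_YEARS
            or safety_mtbf < ACCEPTABLE_MTBF_YEARS)

# Hypothetical Table 4 patterns (MTBFs in years)
print(is_pm_priority_1(107, 428))  # Carefashion Velour @ 6 months  -> False
print(is_pm_priority_1(13, 428))   # Carefashion Velour @ 12 months -> True
print(is_pm_priority_1(23, 43))    # Herman Uno @ 12 months         -> True
```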

In Table 5.3HE on the website, clicking on the (active) link in the first field of the table (row R1, column C1) will take you to a Detailed Proof Table (Critical care ventilator/ Carefashion-Velour/ 6 months), which shows all of the separate (again hypothetical) batches of data that are aggregated into the summary data shown in each row of the illustrative Summary Proof Table.

Periodic reviews of the findings

The project plan calls for the members of the Task Force to make regular reviews of the aggregated findings as they are posted on the website and to provide their collective informed judgments on:

  • The adequacy of the sample size and experience base for each manufacturer-model version at each maintenance interval, and
  • Whether or not the indicated levels of PM-related safety are acceptable and truly representative.

Recommendations for optimizing the periodic maintenance of all medical devices

A. For all PM Priority 1 devices that require periodic restoration.

These are potentially hazardous devices with failure outcomes that could cause a serious or life-threatening patient injury and that also have relatively high PM-related failure rates. For these kinds of devices it would clearly be prudent (even in the absence of any regulatory mandates) to follow, at least initially, the manufacturer's recommended PM interval. It is impossible to judge how conservative the manufacturers' recommended device restoration intervals might be for any given device, and it is unlikely that there will be any consistency in any built-in "safety factors" from device to device and from manufacturer to manufacturer. It remains to be seen how the aggregated findings from actual testing will compare with the recommendations. If the device is also found to have hidden failures with serious or life-threatening outcomes, and real-world testing shows those failures to have MTBFs less than the thresholds of acceptability (50 years for serious injuries, LOS 2; 75 years for life-threatening injuries, LOS 3), then it would also be prudent (even in the absence of any regulatory mandates) to perform the safety testing at the manufacturer-recommended interval.

B. For all PM Priority 1 devices that do not require periodic restoration.

These are also potentially hazardous devices with failure outcomes that could cause a serious or life-threatening injury and that have relatively high PM-related failure rates. For devices of this kind, where the only "maintenance" the manufacturer recommends is periodic safety testing, it would again be prudent (even in the absence of any regulatory mandates) to perform the safety testing at intervals no longer than the manufacturer's recommendation. When testing for possible hidden failures with high-severity outcomes there is no optimum interval; shorter is always better. However, it has been shown elsewhere (reference 1) that for safety verification-related failures with MTBFs greater than about 50 years, the increase in the time that the patient would be exposed to a potentially hazardous hidden failure if the testing interval were extended from 6 months to as long as 5 years is very small.
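One way to see the scale of this exposure is the standard reliability approximation (our addition, not taken from reference 1) that a hidden failure checked every T years on a device with a given MTBF sits undetected, on average, for about T/(2 x MTBF) of the time, valid when T is much shorter than the MTBF:

```python
# Sketch: mean fraction of time a hidden failure goes undetected, using the
# standard approximation (test interval) / (2 * MTBF). This approximation
# holds when the test interval is much shorter than the MTBF.

def mean_exposure_fraction(test_interval_years, mtbf_years):
    return test_interval_years / (2.0 * mtbf_years)

mtbf = 50.0  # years; the approximate threshold discussed above
print(mean_exposure_fraction(0.5, mtbf))  # 6-month testing: 0.005 (0.5%)
print(mean_exposure_fraction(5.0, mtbf))  # 5-year testing:  0.05  (5%)
```

Even at the 5-year interval, the device carries an undetected hidden failure only a few percent of the time when the MTBF is 50 years or more.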

C. For all PM Priority 2-5 devices that require periodic restoration.

The logical rule here (in the absence of any regulatory mandates) is to explore extending the interval until there is evidence that it has become too long, that is, until the device begins breaking down for lack of timely restoration. At that point, if there is a desire to eliminate the failures caused by the lack of device restoration, the interval should be moved back to the longest interval at which the device did not break down for lack of attention. For all practical purposes, there is no disadvantage in testing for hidden failures at the same interval used for the device restoration tasks.

D. For all PM Priority 2-5 devices that do not require periodic restoration.

For devices of this kind, where the only "maintenance" the manufacturer may recommend is periodic safety testing and the PM-related failures have been found to occur relatively infrequently (with MTBFs greater than the respective thresholds of acceptability: 75 years for LOS 3 devices, 50 years for LOS 2 devices, and 25 years for LOS 1 devices), then, in the absence of any regulatory mandates, there is no logical justification for performing anything more than occasional safety testing to confirm the previously established level of PM-related reliability.

E. For all non-critical devices.

By definition there is no safety downside when these devices fail, so, according to the RCM methodology (and in the absence of any regulatory mandates), unless there is a convincing case that periodic PM can be cost-justified, all non-critical device types are excellent candidates for the very cost-efficient light maintenance (run-to-failure) strategy. It was by adopting this run-to-failure strategy that the civil aviation industry was able to reduce its maintenance costs by 50% while also, remarkably, improving the reliability and safety statistics for civilian aircraft by a factor of 200 (see reference 1).

Regulatory compliance

We have made no attempt in this article to address the issue of dovetailing these concepts with the requirements in the current standards and regulations. This would be an excellent topic for a companion article by others with current expertise in regulatory compliance.

A final cautionary note

Patient and staff safety has long been the primary justification in the medical device field for performing routine PMs on the hospital's front-line patient care equipment. Regular PM has also become a deeply rooted symbol of institutional caution and caring: if the equipment doesn't look well cared for, what does that imply about how well we take care of our patients? The intent of this article is to address in some detail some apparent misunderstandings about how much regular PM contributes to keeping modern medical equipment safe. If this analysis is accepted as supporting a reduction in planned maintenance, we urge that careful thought be given to replacing those services with more efficient and less technically intensive alternative routines (such as department rounds) to ensure that the clinical staff still has confidence in the equipment and that it still looks well cared for and ready to do its job.

More detailed discussions of this and all of the other topics mentioned in this article can be found in various documents posted on the Task Force's website (Sidebar 2).


  1. Ridgway M. "Introduction to Reliability-Centered Maintenance: The Modern Approach to Planned Maintenance." Chapter 10 in A Practicum for Healthcare Technology Management. Arlington, VA: Association for the Advancement of Medical Instrumentation; 2015. (HTM ComRef 26)
  2. Ridgway M, Atles L, Subhan A. "Reducing Equipment Downtime: A New Line of Attack." Journal of Clinical Engineering. 2009;34:200-204. (HTM ComRef 8)
  3. Wang B, Rui T, Balar S. "An Estimate of Patient Incidents Caused by Medical Equipment Maintenance Omissions." BI&T. 2013;47:84-91.
  4. Ridgway M, Fennigkoh L. "Metrics for Equipment-Related Patient Safety." BI&T. 2014;48:199-202. (HTM ComRef 16)
  5. Ridgway M, Lipschultz A. "Final Word: Doing It by the Numbers." BI&T. 2014;48:72. (HTM ComRef 15)
  6. Ridgway M. "Manufacturer-Recommended PM Intervals: Is It Time for a Change?" BI&T. 2009;43:498-500. (HTM ComRef 18)