Reliability and availability
Part 5: Factors impacting availability

High availability is a key measure for many companies that want to stay competitive and survive in today’s world. If a supplier cannot deliver by the contractually agreed date (delay) or cannot deliver the agreed quantity, it might face severe financial consequences (penalties, liquidated damages). A good example is the automotive industry: if a small sub-supplier gets into trouble with deliveries, the consequences can easily be devastating for that company. Other examples may be less harsh, but a loss of production output always affects the economic performance of a company.

How is availability defined? The formula is straightforward:

A = MTBF / (MTBF + MTTR)

A … availability,

MTBF … mean time between failures

MTTR … mean time to repair
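To get a feeling for the formula, here is a minimal Python sketch. The function names and the numbers (an MTBF of 50,000 hours, an MTTR of 8 or 72 hours) are purely illustrative assumptions and not data from any real drive. The downtime estimate additionally assumes continuous operation, i.e. 8,760 hours per year.

```python
HOURS_PER_YEAR = 8760  # assumption: the equipment runs around the clock


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return A = MTBF / (MTBF + MTTR) as a fraction between 0 and 1."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(a: float) -> float:
    """Average yearly downtime implied by availability a (continuous operation)."""
    return (1.0 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical values: same MTBF, quick repair versus slow repair.
    for mttr in (8.0, 72.0):
        a = availability(50_000.0, mttr)
        print(f"MTTR {mttr:5.1f} h -> A = {a:.5%}, "
              f"downtime = {downtime_hours_per_year(a):.1f} h/year")
```

With the same MTBF, the longer repair time directly lowers the availability. This is why both parts of the equation deserve attention.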

Variable frequency drives (VFDs) are used in various industrial segments, and in most cases availability matters to the end user. So how can high availability be achieved? In our opinion there are two basic principles, both equally important:

A) Mitigation of failure –> reliable design (MTBF)

B) Rectification of failure –> quick and sustainable (MTTR)

To maximize availability, both principles should be applied and continuously improved.

1. Reliability of the equipment

High reliability is of course essential to reach high availability. If the equipment never fails, the availability is 100%. Reliability aspects have already been discussed in previous posts of this series. Robust design, internal margins, extensive field experience, application know-how, strict quality standards for sub-suppliers, manufacturing and testing, and a well-thought-out protection concept are some of the key factors. Okay, reliability is clear. What are the other factors?

We must admit that the equipment might fail. The questions are:

– How much time is needed to identify the failure? How easy or difficult is the troubleshooting?

– How severe is the failure? Can the repair be done on site? Or does the equipment need to be transported into a service workshop or even back to the manufacturer? Does the manufacturer have any service workshop in the region (own workshop or partner’s workshop)?

– Are skilled personnel around to check the equipment? Can the operator do the troubleshooting and replacement himself, or must a service engineer be dispatched by the manufacturer? How easy is it to reach the manufacturer’s service department? Is there a support line available 24 hours a day? How much time does it take the service expert to reach the site? Can the service expert connect to the device remotely to accelerate the troubleshooting?

– Are the spare parts available? If not, what is the lead time? How quickly can the spares be expedited and shipped?

– How service friendly is the equipment? How accessible are the components to be replaced?

– How sustainable is the repair? Is the failure mechanism well understood? What is the risk that the same failure happens again? How can the same failure mode be prevented in the future? What is the lesson learned?
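The questions above can be read as contributions to the repair time. The following Python sketch is one possible way to add them up. It is a simplification under stated assumptions, not a standard model: the class `RepairScenario`, its field names and all numbers are hypothetical, and it assumes that waiting for the service person and waiting for the spare part run in parallel, while all other phases happen one after another.

```python
from dataclasses import dataclass


@dataclass
class RepairScenario:
    """Hypothetical breakdown of one repair; all durations in hours."""
    identify_h: float           # troubleshooting until the faulty part is known
    personnel_arrival_h: float  # until a skilled person is on site
    spare_delivery_h: float     # until the spare part is on site
    replace_h: float            # actual replacement work
    restart_h: float            # checks and restart after the repair

    def time_to_repair_h(self) -> float:
        # Assumption: person and spare part travel in parallel,
        # so only the slower of the two delays the repair.
        waiting = max(self.personnel_arrival_h, self.spare_delivery_h)
        return self.identify_h + waiting + self.replace_h + self.restart_h


if __name__ == "__main__":
    # Illustrative numbers only.
    local = RepairScenario(1, 0, 0, 2, 1)      # trained operator, spare on site
    remote = RepairScenario(4, 48, 120, 6, 2)  # expert from abroad, spare on order
    print("local :", local.time_to_repair_h(), "h")
    print("remote:", remote.time_to_repair_h(), "h")
```

In such a breakdown the waiting times can easily dominate the actual replacement work, which is the reason the following sections look at each contribution separately.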

2. Identification of failure

When the equipment fails, the time to identify (isolate) the failure matters, because availability is about time and the clock starts ticking as soon as the equipment stops. Good software architecture, and especially a history of the last events before the trip (‘black box’ data), can be very helpful. The more intelligent the software, the faster you can identify the failure. Many trips can be triggered by an external event (e.g. a voltage dip in the grid, a temporary overvoltage, etc.). If no component was damaged and the trip conditions have been cleared, the VFD can be restarted. If there is hardware damage, it is important to know which components are affected. VFDs might have health check routines that quickly identify the failure. Advanced features might even visualize where the failed component is located (in specialized diagnostic software tools or directly in the HMI).

Remote diagnostics have the potential to significantly accelerate the identification of a failure and reduce the troubleshooting time. This feature adds particular value at sites that are difficult to access.
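The idea of ‘black box’ data can be illustrated with a small ring buffer that keeps only the most recent events and stops recording at the moment of the trip. This is a minimal Python sketch of the concept, not the implementation of any particular VFD: the names `Event` and `EventHistory`, the buffer size and the example messages are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp_s: float  # time of the event in seconds
    source: str         # hypothetical origin, e.g. "grid" or "cooling"
    message: str


class EventHistory:
    """Keeps the last `capacity` events and freezes them when a trip occurs."""

    def __init__(self, capacity: int = 100) -> None:
        self._events = deque(maxlen=capacity)  # oldest entries drop out first
        self._frozen = False

    def record(self, event: Event) -> None:
        # After a trip the history is preserved for analysis.
        if not self._frozen:
            self._events.append(event)

    def freeze_on_trip(self) -> list:
        """Stop recording and return the events leading up to the trip."""
        self._frozen = True
        return list(self._events)


if __name__ == "__main__":
    history = EventHistory(capacity=3)
    history.record(Event(0.0, "cooling", "fan speed increased"))
    history.record(Event(1.5, "grid", "voltage dip detected"))
    history.record(Event(1.6, "dc link", "undervoltage warning"))
    history.record(Event(1.7, "protection", "trip"))
    for event in history.freeze_on_trip():
        print(event.timestamp_s, event.source, event.message)
```

A fixed-size buffer keeps memory use bounded, and freezing it at the trip ensures that later events do not overwrite the ones that matter for troubleshooting.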

3. Service personnel and service network

To rectify the issue as quickly as possible, well-trained service personnel are essential. If this person is the end user’s own operator, some precious time can be saved. A precondition is proper operation & maintenance training that includes practical exercises. Such training is a very good investment, especially for critical applications (e.g. without redundancy, directly impacting the production output). If a service person from the manufacturer or a partner service organization is required, it takes more time. Is there a service unit in the region? Then the reaction time is usually short. Is the service person coming from abroad? Well, this will certainly take longer. If the site is in a country where most nationalities need a visa to enter, a local service is definitely an advantage. You surely don’t want to wait a few days until the service specialist is allowed to travel.

4. Severity of the failure

The next point is the severity of the failure, i.e. the degree of damage. Did an auxiliary board fail? It can most likely be replaced easily by the operator, assuming a spare is in stock or can be shipped quickly. Did a power semiconductor fail? This replacement can also be done quickly and efficiently if the operator is well trained and the VFD is designed for easy replacement. Was the converter damaged by an internal short circuit or electric arc? Well, that is likely a severe failure in which multiple components were damaged. It requires a deeper inspection. The outcome might be that the complete VFD needs to be replaced. If it is a tailor-made unit that is not kept in stock anywhere, the end user faces a big issue. As a general rule, the VFD should protect itself against all possible hazardous events and reduce the degree of damage to an absolute minimum. An example is the arc-resistant design. While personal safety is clearly the no. 1 priority, it also makes a difference whether the hardware gets fatally damaged (but remains safe, i.e. damaged in a controlled way without putting personnel at risk) or whether it reacts in a way that minimizes the hardware damage as well.

5. Availability of spare parts

In order to repair a failure quickly, it is always an advantage to have some spare parts in stock locally. We know that holding stock is not popular these days. However, even if the VFD is still under warranty and any failed component is replaced free of charge by the manufacturer, it might still be a good choice to keep at least a minimum set of spares on site. If time matters, such a stock of the most common spares is highly recommended. The manufacturer should advise you which spares make sense to store on site. Most manufacturers should have pre-defined spare part packages for their products. Consider also whether the manufacturer has a central or local warehouse with spare parts and how quickly these can be shipped to site. Naturally, if you are well connected with the surrounding world, you can rely a bit more on the manufacturer and its service network. If you are in a remote area that is difficult to reach, it is better to have such things more under your own control.

6. Service friendly design

Once the failed component is identified, the spare part is available and the person to do the job is present, the replacement can start. A well-designed VFD should allow reasonable access for the service person. Is front access sufficient? Can you easily remove and replace the part? If you need to move the whole VFD cabinet to access certain components, or if you need to remove several healthy components just to reach the faulty one, the replacement will be time-intensive and will negatively impact availability (mean time to repair, MTTR).

Note that a compact VFD design and a minimized footprint are frequent customer requirements. Many VFDs are benchmarked based on their dimensions. The motivation is clear. However, please don’t forget to consider service access. A VFD with a very small footprint looks very attractive, but if the price for it is restricted service access, you are taking a risk. What if one day something happens? According to Murphy’s law, the least accessible component would fail…

Like almost everything, VFD design is not black and white, and you need to find a balance between requirements that might be in conflict. We know from our own experience a VFD model that was originally extremely compact but not very service friendly, and that was later re-designed for better access, with a consequently larger footprint. As you can imagine, there are critical voices for both variants. It is probably not possible to satisfy everyone, but pragmatic people find a compromise that works for them.

Service friendly design is more than just access to components inside the VFD cubicle. The replacement should be easy to carry out, without risk of further damage due to improper handling. The connections should allow quick isolation of the faulty part.

7. Sustainable repair

For long-term high availability it is not enough to just solve the failure with a hot fix. The solution should be long term and sustainable. You don’t want to fix the issue and face the same issue a few days later. A root cause analysis might be needed to identify the source of the issue and prevent a recurrence.

8. Predictive maintenance

You might say that it is part of the digitalization hype. And there is some truth to that. On the other hand, predictive maintenance surely has the potential to detect abnormalities at an early stage, before they develop into a failure. Predictive maintenance is only as good as its algorithms are smart. Some indications are straightforward, e.g. an increased temperature. Other indicators are less obvious and more brain power is needed to detect them.
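As an illustration of a straightforward indicator, the following Python sketch raises an alert when the average of the most recent temperature readings exceeds a known baseline by a margin. It is a deliberately simple example under stated assumptions: the function name, the 5 °C margin, the window of five readings and the sample values are hypothetical and would need to be chosen for the actual equipment.

```python
from statistics import mean


def temperature_drift_alert(readings_c, baseline_c, margin_c=5.0, window=5):
    """Return True if the mean of the last `window` readings exceeds
    baseline_c + margin_c. All temperatures are in degrees Celsius."""
    if len(readings_c) < window:
        return False  # not enough data to judge yet
    recent = readings_c[-window:]
    return mean(recent) > baseline_c + margin_c


if __name__ == "__main__":
    baseline = 60.0  # hypothetical normal operating temperature
    healthy = [60, 61, 60, 62, 61]
    drifting = [60, 64, 66, 67, 68, 69]
    print("healthy :", temperature_drift_alert(healthy, baseline))
    print("drifting:", temperature_drift_alert(drifting, baseline))
```

Averaging over a window instead of reacting to a single reading makes the check less sensitive to short spikes. Less obvious indicators need correspondingly smarter logic than this.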

This last part about predictive maintenance is a topic on its own and will be discussed in our next blog post in this series.

What to take away? High reliability and availability is a philosophy that should be implemented consistently and rigorously. Reliable design is one large part of it (–> MTBF part of the equation). The other large part is the rectification of failures (–> MTTR part of the equation).