Chapter 1
Troubleshooting for Fun and Profit
Process machines are critical to the profitability of processes. Safe, efficient, and reliable machines are required to maintain dependable manufacturing processes that can create saleable, on-spec product on time and at the desired production rate. As the stewards of process machinery, we wish to keep our equipment in serviceable condition.
One of the most challenging aspects of a machinery professional's or operator's job is deciding whether an operating machine should be shut down due to a perceived problem or be allowed to keep operating, and at what level of operation. If he or she wrongly recommends a repair, the remaining useful machine life is wasted; but if the recommendation is right, it can save the organization from severe consequences, such as product releases, fires, and costly secondary machine damage. This economic balancing act is at the heart of all machinery assessments.
The primary purpose of this guide is to help operators and machinery professionals troubleshoot machines that are in process service and operating at design process conditions. The reader may ask: What is the difference between field troubleshooting and other analysis methods such as root cause analysis, failure analysis, and root cause failure analysis?
Consider the following definitions:
Field troubleshooting is a process of determining the cause of an apparent machine problem, i.e., a symptom, while the machine is still operating at actual process conditions. Troubleshooting efforts tend to focus on a specific machine or subsystem, using a proven body of historical knowledge. The body of knowledge may be in the form of troubleshooting tables and matrices or manufacturer's information. Keep in mind that process machinery can only truly be tested and evaluated in service and under full load, i.e., in-situ. Very few testing facilities are available that can test a pump or compressor at full process loads and with actual process fluids. Field troubleshooting evaluates the mechanical integrity of a machine in process service in order to determine if symptoms are the result of an actual machine fault or a process-related problem.
Here are examples of troubleshooting opportunities:
Example #1: Pump flow has fallen well below its rated level.
Example #2: Compressor thrust bearing is running 20 °F hotter than it was last month.
Root cause analysis (RCA) is a broad analysis of a system made up of multiple components or subsystems, or of an organization made up of multiple processes. These complex systems may not be well understood and may have no historical failure information to reference. Their complexity may require that the overall system be broken down into parts that are analyzed separately. Here are two examples of RCA opportunities:
Example #1: The finished product from a process unit went out of spec.
Example #2: Plant XYZ safety incidents for the month of May have doubled when compared to last year's total.
One distinction between RCA approaches and troubleshooting is that RCAs tend to address larger problems that often require a team approach, while troubleshooting can normally be conducted by a single individual. As a general rule, maintenance and operations personnel normally participate more in troubleshooting activities than in root cause analysis activities due to the very nature of their jobs.
Failure analysis is the process of collecting and analyzing physical data to determine the cause of a failure. Physical causes of failure include corrosion, bearing fatigue, shaft fatigue, etc. Failure analyses can only be conducted after a component failure. A failure is defined as a condition when a component's operating state falls outside its intended design range and is no longer able to safely, or efficiently, perform its intended duty.
Root cause failure analysis (RCFA) methodology seeks to solve complex problems by identifying and correcting their root causes, as opposed to simply addressing their symptoms. The RCFA methodology allows an organization to dig deeper into a failure or series of failures in order to uncover latent issues.
To further clarify the differences between these analysis approaches, we recommend the following line of questioning:
1. The field troubleshooter must first ask: Do I fully understand the machine or subsystem that needs to be analyzed? If the complexity is beyond the troubleshooter's abilities, he or she should get help. At this point, management may decide to conduct an RCA.
2. If the field troubleshooter decides to tackle the problem at hand, he or she should then ask: "Are the observed symptoms caused by a failing machine, a correctable fault, or by undesirable process conditions?" If it is a process-related problem, changes can be made before permanent machine damage occurs. If a fault is deemed to be correctable, then adjustments or minor repairs can be made in order to quickly restore the machine to serviceable condition.
If the machine fails, either a failure analysis or root cause failure analysis must be performed, depending on the extent and cost of the failure. The failure analyst asks different types of questions depending on the level of detail desired:
1. The failure analyst asks the question: "What is the physical mechanism, or sequence of events, that caused a given component to fail?" If the failure mechanism is clearly understood, perhaps design or procedural changes may be implemented to avert future failures.
2. The root cause failure analyst asks the question: "Are there hidden factors, such as unknown design, repair, operational, and other organizational issues, contributing to the observed machine problems?" If latent factors are suspected but unidentified, perhaps an interdisciplinary team can identify the key factor or factors and address them to avert future failures.
All these approaches do have some common elements in their respective processes, and the information identified in one can be utilized in the other approaches. These are not necessarily competing activities, but are mutually supportive activities.
Figure 1.1 shows a simple decision tree that can be used to address machinery field problems. (Note: The RCA option is not considered in this chart because we have assumed the problem is confined to a specific machine and is within the troubleshooter's level of ability.) The troubleshooter begins at the top of the tree when a symptom is first detected. At this point, the troubleshooter assesses the situation and then picks one of the possible paths forward:
1. Do nothing
2. Modify process conditions
3. Adjust machine, i.e., balance, align, or lubricate machine as required
4. Plan to repair
If a repair is deemed necessary, the maintenance organization should then estimate the repair and outage costs. If the total cost (parts and labor) of the failure is less than $10,000, then a repair should be performed without any additional analysis. If the total cost is estimated to be greater than $10,000 but less than $50,000, a failure analysis should be conducted on the failed parts in order to understand the nature of the failure. Finally, if the total cost is greater than $50,000, then a root cause failure analysis is justified and should be executed.
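The cost breakpoints in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration and is not taken from Figure 1.1: the function name analysis_level, the Analysis enum, and the parameters low and high are hypothetical. Treating a cost that lands exactly on a breakpoint as belonging to the higher tier is also an assumption, since the text does not address that case.

```python
from enum import Enum


class Analysis(Enum):
    """Follow-up effort once a repair has been judged necessary."""
    REPAIR_ONLY = "repair without additional analysis"
    FAILURE_ANALYSIS = "failure analysis on the failed parts"
    RCFA = "root cause failure analysis"


def analysis_level(total_cost, low=10_000, high=50_000):
    """Choose the follow-up analysis from the estimated total cost.

    low and high default to the example breakpoints in the text and can
    be overridden to match an organization's own policy.
    Assumption: a cost exactly equal to a breakpoint is placed in the
    higher tier (the text leaves this case open).
    """
    if total_cost < 0:
        raise ValueError("total_cost must be non-negative")
    if low >= high:
        raise ValueError("low breakpoint must be below high breakpoint")
    if total_cost < low:
        return Analysis.REPAIR_ONLY
    if total_cost < high:
        return Analysis.FAILURE_ANALYSIS
    return Analysis.RCFA


# Hypothetical cost estimates, for illustration only.
for estimate in (8_000, 25_000, 120_000):
    print(f"${estimate:,}: {analysis_level(estimate).value}")
```

Passing different values for low and high, for example analysis_level(8_000, low=5_000, high=20_000), shows how the same estimate can call for a different level of analysis under another site's breakpoints.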
The reader should note that the decision tree presented here is only one of many possible tools that can be used to address machinery field problems. Each organization can and should develop its own customized decision tree to satisfy its needs. For example, the cost breakpoints used in this example can be customized to satisfy your organization's process and management goals.
The decision tree in Figure 1.1 illustrates that most machine decisions begin with some sort of field troubleshooting or assessment effort. Field troubleshooting can therefore be considered a type of "gatekeeping" step for deciding which machines need to be repaired. If performed diligently and correctly, field troubleshooting can eliminate unnecessary machinery repairs and improve the overall site profitability and operating efficiency.
In the remainder of this book, we will concentrate on explaining a novel field troubleshooting method to those on the front lines who are in a position to gather key performance and operating data. By acting quickly, they may be able to identify and correct the underlying problem and return the machine to normal operation in a timely manner. The reader should always keep in mind that field troubleshooting may be the first step in a series of analysis steps if machine conditions continue to deteriorate. This could also be the introduction to the two other analysis methods previously mentioned.
1.1 Why Troubleshoot?
Why should organizations care about field troubleshooting? You might ask: "Isn't that why we have a maintenance department, so they can repair machines that are acting up?" The problem is that not all machines that act up have failed; they may simply be reacting to some external change. Distinguishing between a machine that is just acting up and one that has failed or is failing is the goal of a troubleshooter.
Let's consider this simple example: A pump bypass line was inadvertently left open after a start-up. This leads to low forward flow. If the pump is overhauled, the same result will be seen, resulting in wasted maintenance dollars and frustration. If a diligent operator had found the open bypass valve while troubleshooting, it would have been a very rewarding discovery. The subsequent accolades from management would have boosted the operator's ego and spurred others to seek future troubleshooting opportunities.
While troubleshooting can be very rewarding and even fun at times, the main reason to consistently utilize a troubleshooting methodology is to add value to the organization. It has been demonstrated that a successful troubleshooting program can reduce machinery repair costs by up to 20%. The savings come from:
- Keeping serviceable equipment in service and eliminating needless repairs
- Recommending required adjustments, such as balancing, before permanent damage occurs
- Uncovering latent plant issues, such as fouling, flow blockage, etc.
- Judiciously delaying repairs in order to properly plan work and get critical spare parts in stock before serious internal damage occurs
In a nutshell, troubleshooting allows maintenance and operating departments to better manage plant resources by maximizing the run lengths of machines, while avoiding major risks and consequences.
To realize the full benefits of field troubleshooting, all participants must possess adequate machinery experience and knowledge, be properly trained, and approach field problems wholeheartedly and with an open mind. What does being open-minded mean? An open mind means all participants have:
- No preconceived ideas about what the problems or solutions are
- No hidden agendas
- A willingness to listen to everyone's input
Participants with preconceived ideas are often doomed to failure because they are blinded to vital clues as to what's really going on. Their nearsightedness will result in a big waste of time and resources. Furthermore, troubleshooting participants with hidden agendas are not being fair or honest to their organizations. Those who believe they are unable to investigate a problem faithfully, fairly, and with an open mind should let someone else in the organization investigate the problem.
1.2 Traits of a Successful Troubleshooter
We probably all know someone who is especially skilled at getting to the root of a problem. Instead of simply changing parts out or "shooting from the hip," the sk...