WHILE MANY debate academic concerns regarding software complexity and reliability, it is clear that the motor control industry is capitalizing on the flexibility and low apparent cost of using embedded software. When this embedded software impacts the functional safety of the embedded system, risk must be managed with due diligence and in accordance with current industry practice or state-of-the-art.
Since consensus standards represent current accepted practice and a baseline for the state of the art, they provide a good framework to direct risk management activities. While some fundamental philosophical differences might exist among these emerging standards, there are also many similarities that demonstrate the universality of the present approach to software verification and validation (V&V) activities for the functional safety of motor control systems.
While the word risk has many connotations depending on its technical, legal, social, or philosophical context, it is a concept that we all address every day. Risk is, for many, the motivating force behind conscious or subconscious decision-making. Understanding risk and the tools available to manage it can be invaluable to engineers striving for reliable, robust, and cost-effective designs.
Risk is the combination of the severity and probability of an event (hazard) that has the potential to negatively impact people, assets, potential assets, or the environment. While software risks theoretically can be segregated as technical risks and business risks, these two domains frequently overlap when considering issues such as delayed market entry due to emergence of technical hazards (for example, undetected software anomalies) during the development process.
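As a rough illustration of risk as a combination of severity and probability, the sketch below scores a hazard on a simple matrix. The four-step scales, the multiplication rule, and the class thresholds are assumptions chosen for this example; they are not taken from UL/IEC 60730 or IEC 61508.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical four-step severity scale."""
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Likelihood(IntEnum):
    """Hypothetical four-step probability scale."""
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    FREQUENT = 4


def risk_score(severity: Severity, likelihood: Likelihood) -> int:
    """Combine severity and probability into a single score (1..16)."""
    return int(severity) * int(likelihood)


def risk_class(score: int) -> str:
    """Map a score to an action class; the thresholds are illustrative only."""
    if score >= 12:
        return "unacceptable"
    if score >= 6:
        return "reduce"
    return "acceptable"


# Example: a fire hazard judged catastrophic but remote.
fire = risk_score(Severity.CATASTROPHIC, Likelihood.REMOTE)
print(fire, risk_class(fire))
```

In practice the scales and the acceptance thresholds would come from the manufacturer's own risk policy and the applicable end-product standard.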
The scope of this article focuses on operational risk, with respect to the functional safety of embedded system software.
In this context, functional safety is defined in IEC 61508, the standard for Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, as "part of the overall safety relating to the EUC [equipment under control] and the EUC control system which depends on the correct functioning of the E/E/PE [electrical/electronic/programmable electronic] safety-related systems, other technology safety-related systems and external risk reduction facilities."
UL/IEC 60730, the standard for Automatic Electrical Controls for Household and Similar Use, refers to the same concept through the definition of a protective control as a control, the operation of which is intended to prevent a hazardous situation during abnormal operation of the equipment. This standard also requires that control functions be classified as either Type 1 or Type 2 action.
Type 1 action controls perform functions in which deviation and drift of control parameters will not introduce a hazard to either the component (control) or the system (end-product where the control is installed). By contrast, Type 2 action controls have control parameters with critical deviation and drift characteristics, so exceeding design constraints such as limits or tolerances could result in a risk.
So, when evaluating functional safety, it's crucial to properly identify the critical control parameters and understand the related component and system-level failure modes. Existing end-product industry-consensus standards might help identify these safety-related parameters.
Availability vs. Reliability
One of the most significant differences between UL/IEC 60730 and IEC 61508 is whether the system can be designed to fail safe or fail operational.
UL/IEC 60730 allows consideration of fail-safe design, with the possibility to immediately shut down the operation. IEC 61508 provides mechanisms for ensuring increasing levels of operational integrity of the safety function, depending on the established Safety Integrity Level (SIL). The SIL is a discrete level--one out of a possible four--for specifying the safety integrity requirements of the safety functions to be allocated to the E/E/PE safety-related systems. SIL 4 has the highest level of safety integrity; SIL 1 has the lowest.
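To make the idea of a discrete integrity level concrete, the sketch below maps an average probability of failure on demand (PFDavg) to a SIL. The band limits are the low-demand-mode target failure measures as commonly quoted from IEC 61508-1; they are reproduced here from memory as an assumption and should be checked against the standard itself. The function name and data layout are hypothetical.

```python
# (SIL, lower bound inclusive, upper bound exclusive) for low-demand mode.
# Assumed values; verify against IEC 61508-1 before relying on them.
LOW_DEMAND_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd(pfd_avg: float):
    """Return the SIL whose band contains pfd_avg, or None if no band matches."""
    for sil, lower, upper in LOW_DEMAND_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return None


print(sil_for_pfd(5e-4))  # falls in the SIL 3 band
print(sil_for_pfd(0.5))   # too unreliable for any SIL: None
```

A value better than the SIL 4 band also returns None here, since the sketch only reports a level when the figure sits inside a listed band.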
IEC 61508 uses this SIL concept to prescribe software and electronic design considerations that can satisfy the SIL requirements. UL/IEC 60730 is not a standalone document. It is made up of a Part 1 with general requirements, which for many application domains are referenced by Part 2s, i.e., particular requirements for those specific application domains. With this structure, UL/IEC 60730 relies on the system requirements prescribed in the relevant end-product standards to establish operational characteristics such as response times, functional availability, and tolerances that often are under the control of the embedded software.
Reliability of this software typically is one of the most important facets of assessing control system functional safety. Reliability can be considered the probability that a system or product will accomplish its designated mission in a satisfactory manner. Thus, while software can be validated at particular instants in time, the verification activities can positively influence the stochastic aspects of system reliability over time.
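One common way to put a number on "the probability that a system will accomplish its mission" is the constant-failure-rate (exponential) model, R(t) = exp(-λt). The sketch below assumes that model purely for illustration. It is a simplification borrowed from hardware reliability practice, and software faults do not necessarily follow it. The function names and the example failure rate are hypothetical.

```python
import math


def reliability(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Probability of surviving the mission, assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * mission_hours)


def mean_time_to_failure(failure_rate_per_hour: float) -> float:
    """Under the same assumption, MTTF is the reciprocal of the failure rate."""
    return 1.0 / failure_rate_per_hour


# Example: an assumed rate of one failure per million hours over 10,000 hours.
print(round(reliability(1e-6, 10_000), 4))
print(mean_time_to_failure(1e-6))
```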
Possibly the most important software verification activity for functional safety design is risk analysis. Due to the highly subjective nature of what constitutes acceptable levels of risk, neither UL/IEC 60730 nor IEC 61508 mandates specific methods for risk analysis.
Here are five examples of possible risk analysis methodologies:
- Fault Tree Analysis (FTA) identifies potential causes of hazards. It was developed at Bell Telephone Laboratories in 1964, initially for the aerospace, electronics, and nuclear industries. It is a top-down approach consisting of system definition, fault-tree construction using logic gates and events, qualitative analysis, and quantitative analysis. Its drawbacks are that it requires foreknowledge of the system, and requires caution to avoid oversight of critical paths due to simplification of system representation.
- Event Trees are used extensively in business and economics. The method has advantages over the Fault Tree approach because it breaks the overall complex system into smaller, more manageable parts. An Event Tree is drawn from left to right, with the branches corresponding to the alternatives of successful performance of the safety function or failure of the safety function. Similarly to the Fault Tree, a probability can be assigned to each alternative and translated throughout the critical thread. Potential problems with timing issues and effects of common-cause failures on probability dependencies should be considered during the evaluation of this method.
- Failure Modes, Effects, and Criticality Analysis (FMECA) helps analyze discrete failures. It has been used extensively in the reliability engineering community since it can establish the overall probability that a product will operate without failure for a specific length of time or for a specific length of time between failures. This variation supplements the traditional FMEA by introducing the concepts of risk and residual risk by considering probability in the estimation of criticality. This technique is comprehensive, but can be burdensome because of the need to exercise each failure mode of the device under evaluation.
- Cause and Consequence Analysis (CCA) starts with a critical event and uses a top-down approach or backward search to determine the cause and potential consequences. Interrelationships are graphically represented using gates to describe relationships between cause events and vertices to describe relationships between consequences. CCA diagramming can be unwieldy because it requires an individual diagram for each initiating event.
- Hazard and Operability Analysis (HAZOP) is a qualitative technique to identify deviation from expected operation and the hazards associated with such deviation. Under ideal circumstances, this technique can identify and/or eliminate a great number of hazards. However, the ideal circumstances rely heavily on the experience and judgment of the engineers performing the analysis.
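The quantitative step of a Fault Tree Analysis can be sketched in a few lines. The example below assumes statistically independent basic events, which is exactly the assumption that common-cause failures violate, so it should be read as a first approximation only. The tree structure and every probability in it are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the top event "motor overheats without a trip" requires
# a locked rotor AND a failure of the protective function, where the
# protective function fails if any one of three basic events occurs.
p_locked_rotor = 1e-2
p_protection_fails = or_gate([
    1e-4,  # temperature sensor open circuit (assumed)
    5e-5,  # ADC result stuck (assumed)
    2e-4,  # software fault in the trip routine (assumed)
])
p_top = and_gate([p_locked_rotor, p_protection_fails])
print(p_protection_fails, p_top)
```

The same two gate functions can be nested to any depth, which mirrors the top-down construction described for the fault tree.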
As mentioned, these methods are not specified in UL/IEC 60730 or IEC 61508. They serve, instead, as examples of risk analysis techniques that can help identify the safety functions to be addressed. The manufacturer has the responsibility to conduct such analyses prior to a third-party assessment for conformity to either standard. While these standards leave the risk analysis approach to the discretion of the manufacturer, they provide guidance about the generalized sources for failure modes of the electronic hardware and software to be addressed during the risk analysis.
Many industry sectors accept, as evidenced by their consensus standards, that software is susceptible to common-cause failures. Faults that lead to failures can arise from specification and/or implementation mistakes, external disturbances, and component defects. These faults can impact the system software, as well as the analog and digital hardware implemented in the firmware. They can be permanent, transient, or intermittent, and they can lead to deterministic or non-deterministic states, depending on the design. Most importantly, they can have local effects isolated to subsystems, or they can impact system functionality globally. While the global effects can have an immediate impact, local effects can be more insidious, particularly if hidden in a safety function called on only when abnormal conditions occur.
MOTOR CONTROL SAFETY
Figure 1 below is an example of motor control safety functionality implemented in hardware, the functionality of which is defined by the embedded software. The basic safety concerns are electric shock and fire. The design specification, standards profile, risk analysis, and system assumptions are simplified, incomplete, and used only for purposes of illustration.
FIGURE 1: MOTOR CONTROL ASSESSMENT
The functionality of motor control safety is defined by the embedded software. The basic safety concerns are electric shock and fire. This control is intended for a fail-safe application, and the only required safety function is locked-rotor protection to prevent over-temperature conditions that could result in fire or insulation breakdown-related electric shock. These assumptions point to UL/IEC 60730 for the control assessment.
For this example, let's assume this control is intended for use in a residential appliance where a fail-safe state is defined. The second major assumption is that the only required safety function (per an end-product standard such as UL/IEC 60335) is locked-rotor protection to prevent over-temperature conditions that could result in fire or insulation breakdown-related electric shock.
From a standards perspective, these assumptions point to UL/IEC 60730 for the control assessment. UL/IEC 60730 is a harmonized safety standard that incorporates functional safety requirements and is one of the standards approved under the Certification Body (CB) Scheme. The test results that are generated by a Certification Body Test Lab (CBTL), with a CB Certificate issued by a National Certifying Body (NCB) such as UL, would be accepted by other such testing organizations throughout the world.
These assumptions also point to the use of UL 2111, Overheating Protection for Motors, since the block diagram indicates that the motor is provided in combination with the control, and a locked-rotor condition would lead to an over-temperature-related trip. Presently, there is no equivalent IEC standard for the evaluation of residential-use motor/control combinations; therefore such a combination would be tested in the end-appliance. These end-appliance standards vary with respect to requirements regarding loss-of-phase, locked-rotor, running-overload, or all three of these stress conditions, based on whether the appliance is remotely or automatically controlled.
The risk analysis for this control could be based on any of the aforementioned methodologies. As an example, to analyze failures associated with the microcontroller in Figure 1 above, we should focus on a few of the common-cause microelectronic faults (as described in UL/IEC 60730 Annex H Table H.11.12.7) that could be included in an FMEA (See Table 1 below).
TABLE 1: USE FMEA TO ANALYZE THE RISK
In this example, the emphasis on loss of temperature sensing is due to the declared safety function relative to thermal protection of the motor.
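An FMEA worksheet of this kind is easy to hold as structured data and rank. The sketch below uses the risk priority number (severity × occurrence × detection), a widespread FMEA convention that is not prescribed by UL/IEC 60730. The rows and every rating are invented for illustration and do not reproduce Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    mode: str
    effect: str
    severity: int    # 1 (minor) .. 10 (hazardous), illustrative rating
    occurrence: int  # 1 (rare) .. 10 (frequent), illustrative rating
    detection: int   # 1 (always detected) .. 10 (undetectable), illustrative

    @property
    def rpn(self) -> int:
        """Risk priority number used to rank the worksheet rows."""
        return self.severity * self.occurrence * self.detection


# Hypothetical worksheet rows for the microcontroller of a motor control.
WORKSHEET = [
    FailureMode("Temperature input", "open circuit",
                "loss of temperature sensing", 9, 3, 2),
    FailureMode("ADC", "result stuck at last value",
                "over-temperature not detected", 9, 2, 7),
    FailureMode("Clock", "wrong frequency",
                "trip response time drifts", 7, 2, 5),
    FailureMode("Program counter", "stuck-at",
                "protection routine never runs", 9, 1, 4),
]


def ranked(worksheet):
    """Return the rows ordered from highest to lowest RPN."""
    return sorted(worksheet, key=lambda fm: fm.rpn, reverse=True)


for fm in ranked(WORKSHEET):
    print(f"{fm.rpn:4d}  {fm.component}: {fm.mode} -> {fm.effect}")
```

Ranking by RPN is only a prioritization aid; a high-severity row with a low RPN still needs a documented mitigation.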
Per UL/IEC 60730, this control would be a protective, Type 2 action control, since deviation and drift associated with the temperature sensing and feedback could compromise protecting the motor from thermally induced insulation degradation, which could result in fire or electric shock. While there are many obvious root causes of loss of protective functionality, a thorough analysis would be required to uncover even more insidious failures such as improper wave-shaping via high-speed IGBT pulsing. This also could lead to a locked rotor-like condition in which the windings are energized but the IGBT synchronization might not be correct (i.e., loss of phase), resulting in the motor becoming essentially an inductive heater.
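A minimal sketch of how such a protective function might treat its temperature input is shown below, assuming a latching fail-safe trip. Any over-temperature, out-of-range, or implausibly fast-changing reading disables the drive until a deliberate reset. All limit values, class names, and the step-plausibility check are assumptions for illustration, not requirements taken from UL/IEC 60730 or UL 2111.

```python
from dataclasses import dataclass


@dataclass
class ThermalLimits:
    trip_c: float = 140.0        # hypothetical winding trip temperature
    sensor_min_c: float = -40.0  # readings below this suggest a sensor fault
    sensor_max_c: float = 200.0  # readings above this suggest a sensor fault
    max_step_c: float = 15.0     # largest plausible change between samples


class LockedRotorProtection:
    """Latching over-temperature trip with basic sensor plausibility checks."""

    def __init__(self, limits: ThermalLimits):
        self.limits = limits
        self.tripped = False
        self.last_c = None

    def update(self, reading_c: float) -> bool:
        """Process one sample; return True if the drive may stay enabled."""
        if self.tripped:
            return False  # fail-safe state is latched
        lim = self.limits
        # NaN fails both comparisons, so it is treated as out of range.
        out_of_range = not (lim.sensor_min_c <= reading_c <= lim.sensor_max_c)
        implausible_step = (
            self.last_c is not None
            and abs(reading_c - self.last_c) > lim.max_step_c
        )
        over_temp = reading_c >= lim.trip_c
        if out_of_range or implausible_step or over_temp:
            self.tripped = True
            return False
        self.last_c = reading_c
        return True


protection = LockedRotorProtection(ThermalLimits())
for sample in (60.0, 72.0, 85.0, 141.0, 60.0):
    print(sample, protection.update(sample))
```

Treating a lost or implausible sensor signal the same way as an over-temperature reading reflects the fail-safe assumption of this example: when the control cannot trust its measurement, it removes power rather than continuing to run.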
Another possibility might be a running overload-like condition in which the drive signal, due to a software fault, loses its critical characteristics relative to the impedance model of the motor. This would lead to thermal stress of the insulation system over time. The deviation and drift declarations could even be expanded to include deviation or drift from the prescribed drive-signal waveform. Thus, when making declarations, give consideration to all critical parameters associated with the safety function(s), even if the component under consideration is an algorithm.
Such a control system would be subjected to the same general assessment approaches in UL/IEC 60730 as it would with any other functional safety standard such as IEC 61508 or IEC 61511: the safety functions would be defined, the safety lifecycle, design, and layers of protection analyzed, and both the hardware and software would be tested for robustness against physical, environmental, electrical, and electromagnetic stressors.
Motor controls fall into many different industry sector domains, each with its own regulatory considerations. A single motor-control design could be considered for entry into many markets and sectors, including industrial, process, residential, medical, pharmaceutical, and others. The requirements for each sector could be further complicated by geography and politics.
In addition to traditional conformity assessment services, UL offers many services to help manufacturers gain global market access. UL services can help manufacturers better understand the regulatory issues and related design constraints they might face when entering multiple markets. Awareness of constraints early in the design process can enhance product safety. As designers struggle with the daily challenges of designing a product that operates to customer specifications, such awareness can help minimize redesign and retesting related to regulatory issues. This type of cooperative relationship can ultimately reduce cost and improve time to market for motor control manufacturers.
Some Historical Perspective
ALTHOUGH PROGRAMMABLE controllers did not appear in industrial application until the 1960s, the earliest research on software faults appears to date back to the Electronic Numerical Integrator and Computer (ENIAC) developed in 1946 by Dr. John Mauchly and J. Presper Eckert at the University of Pennsylvania. They noted that excessive heat led to faults in the more than 18,000 valves (vacuum tubes) used for the execution of the manually wired programs.
Since that time, extensive research has been conducted to understand the potential failure modes of software. The general conclusions of this research have been well encapsulated in this excerpt from Safeware:
In control systems, the computer is usually simulating the behavior of an analog controller. Although the software may be implementing the same functions previously performed by the analog device, the translation of the function from analog to digital form may introduce inaccuracies and complications. Continuous functions can be difficult to translate to discrete functions, and the discrete functions may be much more complex to specify. In addition, the mathematics of continuous functions is well understood; mathematical analysis often can be used to predict the behavior of physical systems. The same type of analysis does not apply to discrete (software) systems. Software engineering has tried to use mathematical logic to replace continuous functions, but the large number of states and lack of regularity of most software result in extremely complex logical expressions. Moreover, factors such as time, finite-precision arithmetic, and concurrency are difficult to handle.
Some of the earliest published standards about functional safety of software in programmable systems have emerged from the U.S. defense and aerospace industries, as well as the process control industry. These standards rely on the concepts of system-safety engineering derived from systems theory, a discipline dating back to the early part of the 1900s.
FIGURE 2: HEINRICH'S PYRAMID
Heinrich's Pyramid, 1929
FIGURE 3: P&G STUDY
P&G Study, 1986
Around the same time period, H.W. Heinrich conducted a study of 50,000 industrial accidents and published his findings in 1929. This led to a statistical basis for eliminating hazards using a model of his hypotheses known as Heinrich's pyramid (see Figure 2). This statistical relationship was confirmed in a study conducted by Procter & Gamble decades later, in 1986 (see Figure 3).
This understanding of the potential impact of failures in protective functionality carried over to modern risk-based control system standards such as IEC 61508 and UL/IEC 60730. Remember, however, that the actual safety functions and the associated reliability metrics are driven by UL, IEC, and other harmonized industry end-product standards that capture the safety concerns relevant to the given application domain of the control system. This allows issues such as physical environment, electrical environment, user competency, and other application-specific concerns or mitigation mechanisms to be considered when assessing control system functional safety.
Gruhn, P., Safety Shutdown Systems: Design, Analysis, and Justification, Instrument Society of America, Research Triangle Park, N.C., 1998.
Karolak, D.W., Software Engineering Risk Management, IEEE Computer Society Press, Los Alamitos, Calif., 1996.
The Illustrated Science and Invention Encyclopedia, Volume 5, H.S. Stuttman Inc., publishers, Westport, Conn., 1983.
Leveson, Nancy, Safeware: System Safety and Computers, Addison-Wesley Publishing, Inc., 1995.
Bezane, Norm, This Inventive Century, Underwriters Laboratories, Inc., Northbrook, Ill., 1994.
Downloaded from www.cbs.state.or.us/external/osha/ppt/100oh.ppt
Blanchard, B.S., Systems Engineering and Analysis, Prentice Hall, Upper Saddle River, N.J., 1998.
Downloaded from http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1483
International Electrotechnical Commission (IEC), IEC 61508 Parts 1-7, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, First Edition, IEC, 3, Rue de Varembe, Geneva, Switzerland, 1998.
About the Author