While many debate academic concerns regarding software complexity and reliability, the motor control industry is clearly capitalizing on the flexibility and low apparent cost of embedded software. When this software affects the functional safety of the embedded system, risk must be managed with due diligence and in accordance with current industry practice or the state of the art.
Since consensus standards represent current accepted practice and a baseline for the state of the art, they provide a good framework to direct risk management activities. While some fundamental philosophical differences might exist among these emerging standards, there are also many similarities that demonstrate the universality of the present approach to software verification and validation (V&V) activities for the functional safety of motor control systems.
While the word risk has many connotations depending on its technical, legal, social, or philosophical context, it is a concept that we all address every day. Risk is, for many, the motivating force behind conscious or subconscious decision-making. Understanding risk and the tools available to manage it can be invaluable to engineers striving for reliable, robust, and cost-effective designs.
Risk is the combination of the severity and probability of an event (hazard) that has the potential to negatively impact people, assets, potential assets, or the environment. While software risks theoretically can be segregated into technical risks and business risks, these two domains frequently overlap when considering issues such as delayed market entry due to the emergence of technical hazards (for example, undetected software anomalies) during the development process.
This article focuses on operational risk with respect to the functional safety of embedded system software.
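The idea of risk as a combination of severity and probability can be sketched as a simple qualitative risk matrix. This is only an illustration: the four-step scales, the multiplication rule, and the class thresholds below are assumptions chosen for the example and are not taken from UL/IEC 60730, IEC 61508, or any other standard.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical four-step severity scale (assumption, not from a standard).
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Likelihood(IntEnum):
    # Hypothetical four-step likelihood scale (assumption, not from a standard).
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    FREQUENT = 4


def risk_score(severity: Severity, likelihood: Likelihood) -> int:
    """Combine severity and likelihood into a single ordinal score."""
    return int(severity) * int(likelihood)


def risk_class(score: int) -> str:
    """Map a score to a coarse class; the thresholds are illustrative only."""
    if score >= 12:
        return "unacceptable"
    if score >= 6:
        return "undesirable"
    return "tolerable"


# Example: a critical hazard judged to be remote.
score = risk_score(Severity.CRITICAL, Likelihood.REMOTE)  # 3 * 2 = 6
print(score, risk_class(score))
```

In practice the scales and the acceptance thresholds would come from the manufacturer's own risk policy and the applicable end-product standard, not from fixed numbers like these.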
In this context, functional safety is defined in IEC 61508, the standard for Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, as "part of the overall safety relating to the EUC [equipment under control] and the EUC control system which depends on the correct functioning of the E/E/PE [electrical/electronic/programmable electronic] safety-related systems, other technology safety-related systems and external risk reduction facilities."
UL/IEC 60730, the standard for Automatic Electrical Controls for Household and Similar Use, refers to the same concept through the definition of a protective control as "a control, the operation of which is intended to prevent a hazardous situation during abnormal operation of the equipment." This standard also requires that control functions be classified as either Type 1 or Type 2 action.
Type 1 action controls perform functions in which deviation and drift of control parameters will not introduce a hazard to either the component (control) or the system (end-product where the control is installed). By contrast, Type 2 action controls have control parameters with critical deviation and drift characteristics, so exceeding design constraints such as limits or tolerances could result in a risk.
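For a Type 2 action, the design constraints on a critical parameter can be made explicit in software and checked against measured values. The sketch below is a minimal illustration of that idea; the class name, the parameter names, and the limits are hypothetical and do not come from UL/IEC 60730 or any end-product standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalParameter:
    """A control parameter whose deviation or drift could create a hazard."""
    name: str
    lower_limit: float
    upper_limit: float

    def is_within_limits(self, measured: float) -> bool:
        return self.lower_limit <= measured <= self.upper_limit


def find_violations(parameters, readings):
    """Return the names of parameters that are missing or out of limits.

    A missing reading is treated as a violation, so loss of a sensor value
    is never silently accepted.
    """
    violations = []
    for parameter in parameters:
        value = readings.get(parameter.name)
        if value is None or not parameter.is_within_limits(value):
            violations.append(parameter.name)
    return violations


# Hypothetical limits for a small motor drive (assumed values).
parameters = [
    CriticalParameter("winding_temp_c", 0.0, 130.0),
    CriticalParameter("speed_rpm", 0.0, 3600.0),
]
readings = {"winding_temp_c": 142.0, "speed_rpm": 1800.0}
print(find_violations(parameters, readings))  # ['winding_temp_c']
```

Treating a missing reading as a violation is a design choice made here for illustration; how a real control should respond to a violation depends on the failure modes identified for the end product.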
So, when evaluating functional safety, it's crucial to properly identify the critical control parameters and understand the related component and system-level failure modes. Existing end-product industry-consensus standards might help identify these safety-related parameters.
Availability vs. Reliability
One of the most significant differences between UL/IEC 60730 and IEC 61508 is whether the system can be designed to fail safe or fail operational.
UL/IEC 60730 allows consideration of fail-safe design, with the possibility of immediately shutting down operation. IEC 61508 provides mechanisms for ensuring increasing levels of operational integrity of the safety function, depending on the established Safety Integrity Level (SIL). The SIL is a discrete level, one of four possible, for specifying the safety integrity requirements of the safety functions to be allocated to the E/E/PE safety-related systems. SIL 4 has the highest level of safety integrity; SIL 1 has the lowest.
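To give the four levels a quantitative feel, the sketch below maps an average probability of dangerous failure on demand to the SIL band that IEC 61508 associates with it for low-demand mode of operation. The band limits are reproduced here from the commonly cited table and should be checked against the current edition of the standard; the function name is an assumption for this example. A numerical target is only one part of a SIL claim, which also depends on the design and development measures the standard prescribes.

```python
# Low-demand mode target bands: (SIL, lower bound inclusive, upper bound exclusive).
# Values as commonly cited from IEC 61508-1; verify against the standard itself.
LOW_DEMAND_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd_avg(pfd_avg: float):
    """Return the SIL whose low-demand band contains pfd_avg, else None.

    None means the value lies outside all four tabulated bands.
    """
    for sil, lower, upper in LOW_DEMAND_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return None


print(sil_for_pfd_avg(5e-4))  # 3
print(sil_for_pfd_avg(5e-2))  # 1
print(sil_for_pfd_avg(0.5))   # None
```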
IEC 61508 uses this SIL concept to prescribe software and electronic design considerations that can satisfy the SIL requirements.
UL/IEC 60730, by contrast, is not a standalone document. It consists of a Part 1 containing general requirements and a series of Part 2s that reference Part 1 and add particular requirements for specific application domains. With this structure, UL/IEC 60730 relies on the system requirements prescribed in the relevant end-product standards to establish operational characteristics such as response times, functional availability, and tolerances that often are under the control of the embedded software.
Reliability of this software typically is one of the most important facets of assessing control system functional safety. Reliability can be considered the probability that a system or product will accomplish its designated mission in a satisfactory manner. Thus, while software can be validated only at particular instants in time, verification activities can positively influence the stochastic aspects of system reliability over time.
Possibly the most important software verification activity for functional safety design is risk analysis. Due to the highly subjective nature of what constitutes acceptable levels of risk, neither UL/IEC 60730 nor IEC 61508 mandates specific methods for risk analysis.
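Reliability as a probability over a mission time can be illustrated with the classic exponential model, R(t) = exp(-λt). This is a sketch under a strong assumption: a constant failure rate λ, which is a common simplification for electronic hardware and is not claimed here to describe software behavior. The function name and the numbers are hypothetical.

```python
import math


def reliability(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Probability of completing the mission without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-lambda * t).
    """
    if failure_rate_per_hour < 0 or mission_hours < 0:
        raise ValueError("failure rate and mission time must be non-negative")
    return math.exp(-failure_rate_per_hour * mission_hours)


# Hypothetical example: 1e-6 failures per hour over a 10,000-hour mission.
print(round(reliability(1e-6, 10_000), 4))  # about 0.99
```

Under this model, doubling the mission time squares the survival probability, which is one way to see why reliability is a property over time rather than a pass or fail result at a single instant.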
Here are five examples of possible risk analysis methodologies:
- Fault Tree Analysis (FTA) identifies potential causes of hazards. It was developed at Bell Telephone Laboratories in 1962, initially for the aerospace, electronics, and nuclear industries. It is a top-down approach consisting of system definition, fault-tree construction using logic gates and events, qualitative analysis, and quantitative analysis. Its drawbacks are that it requires foreknowledge of the system and that simplifying the system representation can cause critical paths to be overlooked.
- Event Trees are used extensively in business and economics. The method has advantages over the Fault Tree approach because it breaks the overall complex system into smaller, more manageable parts. An Event Tree is drawn from left to right, with the branches corresponding to the alternatives of successful performance or failure of the safety function. As with the Fault Tree, a probability can be assigned to each alternative and propagated along the critical thread. Potential problems with timing issues and the effects of common-cause failures on probability dependencies should be considered when evaluating this method.
- Failure Modes, Effects, and Criticality Analysis (FMECA) helps analyze discrete failures. It has been used extensively in the reliability engineering community because it can establish the overall probability that a product will operate without failure for a specific length of time or for a specific length of time between failures. This variation supplements the traditional FMEA by introducing the concepts of risk and residual risk, considering probability in the estimation of criticality. The technique is comprehensive, but can be burdensome because of the need to exercise each failure mode of the device under evaluation.
- Cause and Consequence Analysis (CCA) starts with a critical event and uses a top-down approach or backward search to determine the cause and potential consequences. Interrelationships are graphically represented using gates to describe relationships between cause events and vertices to describe relationships between consequences. CCA diagramming can be unwieldy because it requires an individual diagram for each initiating event.
- Hazard and Operability Analysis (HAZOP) is a qualitative technique to identify deviation from expected operation and the hazards associated with such deviation. Under ideal circumstances, this technique can identify and/or eliminate a great number of hazards. However, the ideal circumstances rely heavily on the experience and judgment of the engineers performing the analysis.
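The quantitative step of Fault Tree Analysis can be sketched with two gate functions. The sketch assumes that all basic events are statistically independent, which real systems with common-cause failures may violate. The tree structure, event names, and probabilities are hypothetical and chosen only to show the arithmetic.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur (independent inputs)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities for a motor over-temperature hazard.
p_sensor_fault = 1e-3
p_software_anomaly = 1e-4
p_relay_welded = 1e-3
p_thermal_fuse_fails = 1e-2

# Electronic protection fails if any one of its elements fails.
p_electronic_protection_fails = or_gate(
    [p_sensor_fault, p_software_anomaly, p_relay_welded]
)

# Top event: overheating is unmitigated only if both protection layers fail.
p_top_event = and_gate([p_electronic_protection_fails, p_thermal_fuse_fails])
print(f"{p_top_event:.3e}")
```

Working from the top event downward in this way also makes the qualitative result visible: the software anomaly appears under an OR gate, so in this hypothetical tree it alone is enough to defeat the electronic protection layer.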
As mentioned, these methods are not specified in UL/IEC 60730 or IEC 61508. They serve, instead, as examples of risk analysis techniques that can help identify the safety functions to be addressed. The manufacturer has the responsibility to conduct such analyses prior to a third-party assessment for conformity to either standard. While these standards leave the risk analysis approach to the discretion of the manufacturer, they provide guidance about the generalized sources for failure modes of the electronic hardware and software to be addressed during the risk analysis.
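As one more illustration of how probabilities are carried along the branches of an Event Tree, the sketch below enumerates every success and failure path for a sequence of safety functions following an initiating event. It assumes the safety functions are independent and expands the full binary tree, whereas a practical event tree would usually prune branches that cannot occur. The function name, safety-function names, and numbers are hypothetical.

```python
from itertools import product


def event_tree(initiating_frequency, safety_functions):
    """Return {path: frequency} for every combination of success and failure.

    safety_functions is an ordered list of (name, probability_of_success).
    A path is a tuple of booleans, True meaning that function succeeded.
    """
    outcomes = {}
    for path in product((True, False), repeat=len(safety_functions)):
        frequency = initiating_frequency
        for (_name, p_success), succeeded in zip(safety_functions, path):
            frequency *= p_success if succeeded else (1.0 - p_success)
        outcomes[path] = frequency
    return outcomes


# Hypothetical: a locked-rotor event 0.1 times per year, two safety functions.
functions = [("overcurrent_trip", 0.99), ("thermal_cutout", 0.95)]
outcomes = event_tree(0.1, functions)

# The path on which both safety functions fail is the hazardous one.
print(f"{outcomes[(False, False)]:.2e} per year")
```

Because every path is enumerated, the path frequencies sum to the initiating frequency, which is a convenient sanity check on a hand-drawn tree.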