By George A. Peters and Barbara J. Peters
The world around us constantly changes, sometimes unexpectedly, dramatically, and personally. Such a professional disruption is now occurring to engineers in terms of design safety, including newly enacted regulation of substances deemed hazardous. There are also liabilities that rest on the manufacturer, and by extension on the engineer, in safeguarding property and people when products are used and even when they are abused.
The common practice for design engineers to achieve an acceptable level of system or product safety has been the recognition and utilization of past historical knowledge of customer needs and experiences. A little extra comfort and reserve was found in the use of safety factors to compensate for foreseeable stresses, variable loads, repeated shocks, wear, corrosion, abuse, accidental damage, and other uncertainties.
There is now a recognition of some changing marketplace expectations, specific modifications in contractual requirements, and advantages from ongoing technological advances. There has been general conformance to codes, rules, regulations, and an ever-increasing number of trade and harmonized international standards.
A reappraisal of design safety is under way because of competitive international trade, the rapidity of technological advances, and more open expressions of social and legal demands. It has been sharpened by applications called safety critical, projects referred to as having unconditional safety objectives, and efforts containing requisite error-proofing.
Individual professional responsibilities, as well as challenging opportunities, have broadened and deepened. There is greater recognition that higher levels of safety are now achievable, relatively economical, and more desirable under most circumstances.
At an earlier time, a focus on the appropriate functioning and improvement of the design itself consumed virtually all of the design engineer’s attention, time, and energy. Gradually, consideration was given to external safety, those factors that became important beyond the basic machine function. The need for human operator safety resulted in “safety features,” “safety accessories,” and “safeguards.”
These add-ons varied as to the type and magnitude of the emerging injuries. They included shields, barriers, perimeter guards, interlocks, lockouts, limit switches, proximity switches, light curtains, safety mats, access limiters, safety relays, and two-hand controls. They were usually thought about as the user and owner’s responsibility, not the design engineer’s. There was some improved safety robustness, but there were significant vulnerabilities as technological progress and machine complexity continued onward.
Consideration of human factors resulted in recommendations to keep work tasks simple, make the process inherently easy, reduce the workload, avoid distractions, enhance situation awareness, assure human diligence, reduce attention requirements, improve communications, and enhance man-machine interactions.
There were also calls for better operator training, instruction, supervision, motivation, and the creation of a positive safety culture. All of these were broadly conceived countermeasures to errors, mistakes, or poor production that could be considered primarily employer responsibilities.
The demand for higher levels of safety has resulted in the formulation of objective analytic design techniques to identify hidden hazards and potential problems, determine design countermeasures and remedies, and assess and categorize residual risk. The specific techniques vary, but include fault tree analysis; failure mode, effects, and criticality analysis; risk assessment; and many more techniques. They are incorporated in national and international standards, trade standards for various industries, contractual requirements and specifications, and mandatory government regulatory requirements. They provide more management and customer assurance that there has been critical insight, potential debugging, and safety robustness considerations.
Many fairly recent design safety concepts have become familiar in terms of design objectives and in component or assembly choice. For example, redundancy may be conceived as parallel pathways or duplicate means to achieve an overall product performance goal. However, it may pertain to only a small segment of a circuit, hardware element, or software program to compensate for uncertainty or to achieve a desired level of system reliability. It is often a pathway around unexpected failure: for example, when using commercial off-the-shelf parts instead of custom application-specific parts (such as ASICs).
Thus, overall robustness may or may not be enhanced by redundancy. Similar advantages and limitations, cost and benefits, and robustness factors should be critically considered for full or partial tandem, backup, backstop, high reliability, or safety critical equipment.
A single machine fault, malfunction, or failure may not result in any unwanted harm to person, property, process, business, or the environment. But this is too often a quick subjective assumption that leads to unexpected problems. The acceptance of such no-harm single-point failures should rest on historical data plus a hazard analysis of component interactions, including multiple concurrent, serial, or cascading failures, interactive effects on independent functions, and the possible alteration of protective system functions.
The default position should be that each identified hazard, when triggered, may cause harm. For example, one fault or behavioral act may cause harm if the interposed remedies or relevant countermeasures do not function when and where needed. The residual risk may or may not be classified as low as reasonably practicable or low enough to avoid intolerable risks.
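The default position above can be sketched as a simple risk classification that maps each identified hazard's severity and probability to a risk class. The function name, the 1-to-4 scales, and the thresholds below are illustrative assumptions, not values taken from any particular standard:

```python
# Hypothetical risk-classification sketch. The scales and score thresholds
# are illustrative assumptions, not drawn from any specific standard.

def classify_risk(severity, probability):
    """Map a hazard's severity (1-4) and probability (1-4) to a risk class."""
    score = severity * probability
    if score >= 12:
        return "intolerable"          # must be designed out or mitigated
    if score >= 4:
        return "ALARP"                # reduce as low as reasonably practicable
    return "broadly acceptable"       # no further action required

print(classify_risk(severity=4, probability=3))  # intolerable
print(classify_risk(severity=1, probability=2))  # broadly acceptable
```

In practice, the categories and limits would come from the applicable trade standard or regulatory requirement rather than a fixed multiplication table; the point is that every identified hazard receives an explicit classification instead of a quick subjective assumption of no harm.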
One, Two, Three
One safety assurance approach is to use a fail-safe concept, generally by designing equipment to tolerate a single fault condition. That is, single failure tolerance means that the equipment will cause no safety event when operated in a normal or a single fault condition. A double fault must occur before the fault-tolerant product manifests a safety event.
Caution should be exercised because fail-safe may refer to the characteristics of a single component, some part of a circuit or assembly, or a core product performance. A fail-safe (fault tolerance) condition is different from the automatic shut-down or cutoff that can occur during a single fault. For example, in a shutdown, a machine may be rendered inoperative by a single behavioral act, such as removal of fingers, hands, or feet from dead man’s switches or vigilance buttons. In a fail-safe condition, normal operation would continue despite the single act or fault.
High fault tolerance may be necessary in safety-critical, life-critical, or mission-essential conditions. The equipment may be tolerant of two or more failures while retaining normal operation. This rule-of-two (two failure tolerance) may require two equipment malfunctions, two independent human errors, or one error and one fault to create a “dangerous operating condition.” A rule-of-three may be prudent when human operators must make difficult decisions, perform complex tasks, engage in teamwork, deal with unfamiliar logic, interpret noisy communications, take novel action pathways under unexpected external perturbations, or actuate certain potentially harmful medical equipment in life-threatening conditions.
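The single- and double-fault tolerance logic described above can be sketched in a few lines. The class and its names are hypothetical illustrations of the counting principle, not an implementation of any specific standard:

```python
# Hypothetical sketch of fault-tolerance counting. Names are illustrative.

class FaultTolerantChannel:
    """Tracks independent faults; trips only when tolerance is exceeded."""

    def __init__(self, fault_tolerance=1):
        # fault_tolerance=1 models single failure tolerance (the rule-of-two:
        # a second, independent fault must occur before a safety event).
        # fault_tolerance=2 models the rule-of-three.
        self.fault_tolerance = fault_tolerance
        self.active_faults = set()

    def report_fault(self, fault_id):
        self.active_faults.add(fault_id)

    def clear_fault(self, fault_id):
        self.active_faults.discard(fault_id)

    def is_safe_to_operate(self):
        # Normal operation continues while faults remain within tolerance.
        return len(self.active_faults) <= self.fault_tolerance


channel = FaultTolerantChannel(fault_tolerance=1)
channel.report_fault("sensor_stuck")      # first fault: operation continues
print(channel.is_safe_to_operate())       # True
channel.report_fault("operator_error")    # second, independent fault
print(channel.is_safe_to_operate())       # False: trip to a safe state
```

Note that the counting is only valid if the reported faults really are independent; a common-cause failure can defeat the tolerance count, which is one reason the hazard analysis of component interactions mentioned earlier remains necessary.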
All of the consequences of the actuation of a protective mechanism or safety device should be explored. Since other components may be affected and may produce subsequent secondary faults or errors, normal operation may be too easily resumed (reset) when not desirable, or over time the mechanism may become inactive or unreliable without any indication of that status. Issues might include over-voltage or over-current releases that open circuits or rely on impedance control, blocking capacitors, current fluctuation, short circuit detection, motion overload slip, or quick acting resettable fuses.
There may be other safety protective features for the machine or for the operator or for both. Large or small, simple or complex, there is a growing variety of devices, features, or mechanisms to enhance safety. Aircraft safety signals include a computer generated (voice) warning “not working correctly” for the radio-altimeter and a stick-shaker stall warning. Automobiles may have designed-in audible driver assistance alerts for lane departure. Highways may have raised lane edge projections (RPMs, Botts’ dots, or rumble strips) for a similar purpose. Construction equipment may have reverse signal alarms, work platforms may have tilt alarms, and cranes may have load limit audiovisual alarms.
One basic example of a governing premise for design safety is as follows: A given behavioral act, machine process event, electrical response, or pattern of activities may be detected, sampled, and identified with appropriate sensors, circuits, software memory, and hardware. This output may be compared to a known, stable, and meaningful reference element, signal, or code.
The actual event or behavior (output) compared to the desired reference (expectation) yields a control error signal. This signal can be utilized in a control function to stop operation if the error magnitude exceeds an acceptable limit. It may modulate the control if slightly beyond the acceptable limit. It may approve a within-limit output and do nothing. It may signal a reset, so the human operator is advised of a possible error and requested to try again. Such premises provide valuable design guidance and uniformity in company design manuals for cumulative experience retention purposes.
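As one illustration of this compare-and-act premise, the sketch below computes a control error from a measured output and a known reference, then selects a stop, modulate, or accept action. The function name and the limit values are assumptions for illustration only:

```python
# Illustrative sketch of the compare-and-act design premise described above.

def control_action(measured, reference, stop_limit, modulate_limit):
    """Compare the output to the reference; act on the error magnitude."""
    error = abs(measured - reference)   # control error signal
    if error > stop_limit:
        return "stop"        # error exceeds the acceptable limit: halt
    if error > modulate_limit:
        return "modulate"    # slightly beyond the limit: adjust the control
    return "accept"          # within limits: do nothing

print(control_action(measured=10.8, reference=10.0,
                     stop_limit=2.0, modulate_limit=0.5))  # modulate
```

A fourth branch could return a "reset" request so the human operator is advised of a possible error and asked to try again, as the text notes; it is omitted here only to keep the sketch minimal.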
Some safety systems may introduce complexity. For example, what may be called a safe-stop condition might be a partial machine shutdown to permit rapid access, setup, troubleshooting, rectification of faults, and a quick return to original function. There are a number of versions of these trade standard approved safe-stop mechanisms, such as safely limited speed, zero speed, safe brake control, safe disable, or motor power cut-off.
Such systems may need self-monitoring, redundancy, faults that fail safe, and warnings if a person enters a monitored zone. In addition, such systems are often applied in situations of historically high risk and abuse, thus requiring special efforts at design safety. An old design adage is to keep it simple! However, this does not justify inadequate responses to realistic challenges.
Equipment protective features or devices, such as the use of fault tolerant or redundant components, may serve to cover or compensate for areas of assumed unreliability or uncertainty in an assembly or system. If a failure occurs in the area covered, normal machine operation continues. But, the human operator may remain unaware of the problem until the second failure occurs and stops or alters the machine function.
It is important to determine whether any harm would occur from a stoppage or malfunction, be the harm to process, hardware, software, environment, mission, function, or person. If there could be unacceptable harm, the operator should receive a unique error signal, instruction, caution, or warning after the first failure, so that corrective action could be taken in time to avoid a harmful consequence. The general approach is preventive, not corrective.
If there is complete redundancy or the use of tandem units, each functional pathway should be independent and have an indicator of single pathway failure. Otherwise, a failure might remain hidden or undetected because the core function or capability could continue and the failure remains unabated and unnoticed. In essence, all system design vulnerabilities that could cause unacceptable harm should not have a failure mode that occurs without appropriately alerting someone to take timely appropriate action.
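The indicator-per-pathway principle can be sketched as follows. The helper function and its names are hypothetical; the sketch shows only that a surviving redundant pathway must not be allowed to mask the first failure:

```python
# Hypothetical sketch: redundant pathways with per-pathway failure indication,
# so a hidden first failure is surfaced rather than masked by redundancy.

def run_redundant(pathways, alert):
    """Run independent pathways in order; alert on any single-pathway failure.

    pathways: list of (name, callable) pairs, each an independent pathway.
    alert: callable invoked with a message whenever one pathway fails.
    """
    results = []
    for name, fn in pathways:
        try:
            results.append(fn())
        except Exception as exc:
            # The core function may continue on a surviving pathway,
            # but the failure must not go unnoticed.
            alert(f"pathway {name} failed: {exc}")
    if not results:
        raise RuntimeError("all redundant pathways failed")
    return results[0]


def primary():
    raise RuntimeError("power supply fault")

def backup():
    return "output"

# Prints a failure alert for the primary pathway, then returns the
# backup's output: the function continues, but someone has been told.
value = run_redundant([("primary", primary), ("backup", backup)], alert=print)
```

In real equipment the `alert` callable would drive an indicator lamp, annunciator, or logged maintenance message rather than a console print, but the design obligation is the same: no unacceptable-harm failure mode should occur without alerting someone in time to act.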
Software has become ever more important in mechanical systems, as it has elsewhere in the world around us. Design engineers should be fully aware that far too many software programs exhibit unexpected bugs, lockups, memory errors, out-of-bound errors, unhandled exceptions, defects, race conditions, inadvertent function inhibits, and difficulties in fault identification, and can produce excessive test errors or failures.
Software assurance efforts may be lengthy, so they should begin early in the development process. Static analysis and early test execution of the code should occur well before code hardening. The number of possible software errors is generally underestimated and there is excessive reliance on late manual code reviews, so difficult-to-find-and-solve design safety problems may emerge much later. The logical or functional flaws in the design architecture may not become apparent until years after marketing, during customer operation. So extended field testing, not just bench testing, may be part of a realistic verification process to discover design safety problems.
As the Rules Change
Since regulations and standards may now change rapidly and significantly, design engineers should become more attentive to periodicals and the Internet that describe near-future proposed changes. Determining their possible design and manufacturing effects may permit early resolution of possible consequential problems.
For example, lead-based alloys used for solder and plating were recently restricted by the European Union’s RoHS (Restriction of Hazardous Substances) directive. This affected manufacturers in the U.S. and other non-E.U. countries because the products they sell in Europe have to comply with the directive. They had to engineer around new issues of connector reliability, performance, and manufacturing processes. Lead-free tin, silver, and copper mixes may be less flexible and more susceptible to humidity, require plastics and parts that are more heat resistant, have different elastomeric component performance, and manifest new failure modes.
The directive also restricts the use of mercury, cadmium, hexavalent chromium, polybrominated biphenyl, and polybrominated diphenyl ether. Prospective design safety issues have not been as quickly resolved as desired in such situations.
Similarly, the European Union’s REACH (Registration, Evaluation, Authorization, and Restriction of Chemicals) program is resulting in a new global system for the classification, labeling, and packaging of chemicals and products containing chemicals. The REACH program, located at the European Chemicals Agency in Helsinki, Finland, has the aim of improving the protection of human health and the environment.
Related programs are under way by the United Nations and by various countries, including the United States. There are plans for certification programs to control hazardous substances and proposals for pictograms that deliver non-verbal, language-independent warning messages. There may be Substance Information Exchange Forums under U.N. guidelines, but toxicity data has historically been slow to accumulate and it provokes considerable controversy. These are all attempts to provide harmonized systems to enable shipment of uniform products anywhere in the world, aided by regulatory elimination of barriers to trade.
There are also design safety implications in the forthcoming legislation and government directives as various countries and geopolitical regional blocs attempt to influence international trade. Again, advance awareness is necessary to understand and to prepare necessary product changes on a timely basis. In such near-future directed changes, it is apparent that design safety is a key variable consistent with product simplification, function, and cost.
Communicating to All Parties
Safety communications that target various individuals should have distinct objectives, and such messages have become fairly sophisticated. The basic objective of user warnings is to provide hazard identification and associated risk information in a form that gives users a reasonably fair opportunity to avoid personal harm. The basic objective of owner information is to provide knowledge that could reduce avoidable damage to property, machine functionality, and the attainment of business goals. The basic objective of customer information is to provide pre-purchase information suitable for risk-benefit or risk-assessment purposes and as an implied personal consent to specified risk.
In some cases, even bystanders may benefit from tailored communication. As some international standards, including European Standard EN 60601-1 on medical electrical equipment, indicate, labeling is now considered a critical component of a device or system. Labeling helps to mitigate residual design risks. Reducing such risks reduces the need for warnings and risk communications.
Understandably, warnings are a last resort after diligent efforts to achieve an appropriate level of design safety. They must not be excessively burdensome, inappropriate, ineffective, or vague. The cost and feasibility of compliance should produce obvious benefits that outweigh any burdens. An example of inappropriateness may be too many alarms, which produce human cognitive dissonance or overload, particularly during upsets and distractions, so they may be ignored as repeats or false alarms, or may be mentally suppressed as inconsequential events.
There are safety concepts that can provoke considerable argument and may result in important influences or bias in design safety decision-making. The precautionary principle has been adopted in some countries. Its premise is that a danger may exist whenever there are reasonable grounds for concern over potentially dangerous effects, despite inconclusive or uncertain evidence. In the U.S., the Supreme Court has decided that there must be reliable scientific evidence to prove a harm.
The controversy has been interpreted as a conflict between a potential and actual threat, essentially a premarket vs. postmarket action incentive, or unneeded spending vs. a lean philosophy.
Costs and Benefits
Similarly, there are meaningful conflicts about cost-benefit analysis. This form of risk assessment has its proponents and opponents. Although the estimations or conclusions may sound precise and verifiable, fudge factors and contingency adjustments can make the answers as optimistic, pessimistic, or biased as the analysts who prepared them.
Another example of possible bias in design safety that could have significant adverse consequences is the previously mentioned REACH program. Since 1976, in the United States under the Toxic Substances Control Act (TSCA), only about 200 official requests have been issued to obtain toxicity risk data, out of the thousands of commercial chemicals on the market. The Environmental Protection Agency does have a voluntary effort to generate and make public the toxicity data on 2,500 organic compounds, but has not collected all the data needed and the quality of the data being collected has been criticized.
Congressional hearings on possible revisions to the act have produced testimony that changes are needed so the EPA can deem chemical products safe or unsafe, and there is testimony that the U.S. should not adopt a system like REACH, which is seen as too complex and demanding. The lack of firm safety data, official toxicity classifications, and the conflict of opinion does impose an unfair speculative burden on the design engineer who lacks proper guidance on the safety of chemicals used in the fabrication and use of various products.
In such situations of risk uncertainty, the design engineer may attempt to shift responsibility to some authorized supervisory authority, a design review board, a toxicological consult, or a source of legal opinion. It is an engineering management decision as to the risks to be undertaken, including significant uncertainties, and how to manage, control, or offset the risk consequences. Unfortunately, the design engineer may not understand every bias (choice) of his management or the need to fully identify the design risks and make appropriate tailored decisions.
There are other design safety strategies that could bias a detailed analysis that otherwise may be rigorously performed. What effect should be given to safety certification of components and assemblies, such as the UL, CSA, VDE, and TUV certification to various standards and requirements? For long service life products, what kind of focus and service responsibilities should be cautiously assumed for end-of-life, disposal, and recycling when various political and legal jurisdictions can impose new product safety requirements at any future time? What kind of exculpatory documentation should be retained of safety analysis and reports when they may, in the future, revert to an incriminating and biased status? Where operational teamwork is required on a system, how can operator interactions and cultural analyses be performed to reveal design data without privacy, research method, bias, or other concerns?
All the potential tasks and applications described in this article suggest trends in which design safety has evolved from an easily delegated task into a more specialized and professional endeavor. Even limited domestic brands may find that their products have somehow entered the marketplaces of other countries with differing applicable standards and severe penalty systems. It is now time to consider whether changes are advisable, for the proactive design of various products and systems, in terms of the worldwide trends in design safety. The benefits and burdens of design safety changes, at various stages of the design process, have become a critical engineering management issue.
There has been a worldwide reappraisal of design safety given recent adverse experiences and the pressures originating from various government agency regulations. This has been compounded by increasing international industrial competition, rapid technological advances, and heightened societal expectations.
There are increasing demands for more objective, precise, and in-depth identification of possible hazards, assessment of their associated risks, adherence to practical and realistic risk acceptability limits, and the utilization of more effective remedies and countermeasures.
The role of the engineer, in terms of design safety, has broadened considerably. It now includes responsibility for updated design techniques and some upgraded knowledge of software, multi-language communication, standards and regulations, human error control, and environmental effects.
There are creative opportunities in design safety assurance in terms of alternative designs, advanced safety concepts, material selection, verification, validation, recall, fabrication, field testing, experience retention and utilization, customer expectations, and the reduction of residual risk and uncertainty. This should result in the ever-better products and services needed for leading-edge marketplace success.
For Further Reading
1. ASME Safety Division. An Instructional Aid for Occupational Safety and Health in Mechanical Engineering Design. American Society of Mechanical Engineers, New York, 1984.
2. Peters, George A., and Barbara J. Peters, Human Error: Causes and Control, CRC Press/Taylor and Francis, London, 2006 (Spanish translation, Grupo Editorial Patria, Mexico, 2007).
3. Peters, George A., “Product Liability and Safety,” pp. 20-11 to 20-16, in The CRC Handbook of Mechanical Engineering, 2nd ed. (Frank Kreith and D. Yogi Goswami, eds.), CRC Press, New York, 2005.
4. Peters, George A., and Barbara J. Peters, Medical Error and Patient Safety, CRC Press, Boca Raton, Fla., 2008.
5. Peters, George A., and Barbara J. Peters, Automotive Vehicle Safety, Taylor and Francis, London, 2002.
6. Ericson, Clifton, Hazard Analysis Techniques for System Safety, Wiley, Hoboken, N.J., 2005.
7. European Standard EN 60601-1: 1990, Medical Electrical Equipment, Part 1: General Requirements for Safety, CENELEC, European Committee for Electrotechnical Standardization, Brussels, Belgium, August 1990.
8. European Standard EN 1441: 1998, Medical Devices—Risk Analysis, CEN, European Committee for Standardization, Brussels, Belgium, October 1997.
9. International Standard EN ISO 14971: 2000, Medical Devices—Application of Risk Management to Medical Devices, CEN, European Committee for Standardization, Brussels, Belgium, and Geneva, Switzerland, December 2000.
10. Peters, George A., and Barbara J. Peters, Warnings, Instructions and Technical Communications, L & J Co. Publishing Co., Tucson, Ariz., 1999.
George A. Peters and Barbara J. Peters are consultants in the law firm Peters & Peters based in Santa Monica, Calif. They are the authors of design safety oriented books such as Human Error, Medical Error, Automotive Vehicle Safety, and Warnings and Technical Communications.