Redundancy Is Redundant, Again

Nov. 17, 2011
Making the Most Reliable Component(s) Redundant has Little Impact on Overall Availability
By John Rezabek, Contributing Editor

Several decades into the microprocessor era of automation, redundancy in logic solvers, controllers, I/O and network hardware has become commonplace. Often it doesn't carry much of a premium; many vendors, for example, offer redundant ring solutions for Ethernet.

So should we just apply redundancy across the board? Aren't two (or three) always better than one? Why not opt for better availability, especially if the cost isn't prohibitive?

There's a lesson we learned in redundancy that I frequently reflect upon. Thirty years ago, our company had very successfully applied Modicon 584 PLCs in numerous pumping stations along the Trans-Alaska Pipeline. Our controls group leadership was very pleased with them because they represented our first large-scale application of ladder logic solved by microprocessors. The PLCs—we actually called them PCs at the time—had proven themselves to be very reliable, even though they had no processor, network or I/O redundancy. So when interlocks and dryer-sequencing logic were required for a new petrochemical complex, the choice to use the 584 was considered a no-brainer, even more so when Modicon introduced the J-211 redundancy supervisor. The rugged and reliable 584 could only be better if it were fitted with a hot-backup redundant logic solver, right? We were so confident that we brought virtually every snippet of logic in the complex into the PLC to be solved. The mighty, fast (for their day) and redundant PLCs couldn't possibly be less reliable than a mechanical relay in a panel, could they?

I think Modicon was aiming to make a foray into the continuous process industry, where one can't simply halt the line briefly to replace a fuse or swap a card, or wait for a shift change; the process is expected to run continuously for months or years between planned outages.

But the early J-211 solution created complexity and confusion where there had been clarity. We learned the hard way through numerous spurious and plant-wide trips that the J-211 and the architecture it required had created additional single points of failure.

We were novices, and this was our integrator's first such system as well. We likely would have been better off with a simplex logic solver, or even clunky old relays distributed in local panels. It was exciting to use our new hammer to hit every nail, but in doing so we actually decreased plant availability and increased its vulnerability to spurious and usually self-inflicted trips.
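To put rough numbers on how a redundancy supervisor can leave you worse off than a simplex system, here is a minimal sketch. It assumes the two logic solvers fail independently and that the supervisor sits in series with the pair, so its failure takes down both. The availability figures are invented for illustration; they are not measured values for the 584 or the J-211.

```python
def redundant_pair(a_unit: float) -> float:
    """Availability of two identical, independent units where either one suffices."""
    return 1.0 - (1.0 - a_unit) ** 2


def with_supervisor(a_unit: float, a_supervisor: float) -> float:
    """Redundant pair in series with a switchover device that is a single point of failure."""
    return a_supervisor * redundant_pair(a_unit)


# Hypothetical availabilities, chosen only to illustrate the effect.
A_PLC = 0.999         # one simplex logic solver
A_SUPERVISOR = 0.998  # switchover hardware and the architecture it requires

simplex = A_PLC
hot_backup = with_supervisor(A_PLC, A_SUPERVISOR)

print(f"simplex:    {simplex:.6f}")
print(f"hot backup: {hot_backup:.6f}")
```

Under these assumptions the redundant arrangement can never be more available than the supervisor itself, so a supervisor that is less reliable than the processor it protects drags the whole system below the simplex figure.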

The essential learning was this: Simplicity and distributed control almost always trump complexity. The DCSs that are loved by the process industry cost millions because a lot of brilliant engineers spent their careers making them bullet-proof and user-proof, and because the systems themselves have often matured through lengthy and ugly trials by fire in pioneering end users' plants.

The other important lesson took me a few decades longer to learn: Making the most reliable component(s) redundant has little impact on overall availability. It didn't sink in until the safety instrumented system (SIS) gurus made us look at their pie charts, which aimed to show the contribution of various system components to reliability and spurious trip rate. The logic solver and all its I/O were the tiniest sliver of the pie. Granted, the logic solver was already highly redundant, but improving things like network reliability would never compensate for the much less reliable sensors and valves. For redundancy to have a measurable impact on availability, it has to be applied to the least reliable components.
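The arithmetic behind those pie charts can be sketched in a few lines. This is a simplified model, assuming a loop that needs its sensor, logic solver and valve all working (a series system) and redundant units that fail independently. The availability numbers are hypothetical, picked only to make the logic solver the most reliable element.

```python
from math import prod


def parallel(a: float, n: int = 2) -> float:
    """Availability of n identical, independent units where any one suffices."""
    return 1.0 - (1.0 - a) ** n


def series(availabilities) -> float:
    """Availability of a chain in which every element must work."""
    return prod(availabilities)


# Hypothetical per-component availabilities for one loop.
loop = {"sensor": 0.990, "logic_solver": 0.9999, "valve": 0.980}

baseline = series(loop.values())

# Duplicate the most reliable component.
redundant_solver = series(
    {**loop, "logic_solver": parallel(loop["logic_solver"])}.values()
)

# Duplicate the least reliable component instead.
redundant_valve = series({**loop, "valve": parallel(loop["valve"])}.values())

print(f"baseline:         {baseline:.6f}")
print(f"redundant solver: {redundant_solver:.6f} (gain {redundant_solver - baseline:.6f})")
print(f"redundant valve:  {redundant_valve:.6f} (gain {redundant_valve - baseline:.6f})")
```

With these made-up figures, duplicating the valve improves loop availability by a couple of orders of magnitude more than duplicating the logic solver does. Real numbers will differ, but the shape of the result is the point.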

So we had a discussion the other day about network redundancy. "If you're going to make a network redundant, always employ geographically separate paths" was the advice of our instructor (and fellow IN contributing editor), Ian Verhappen of SAIT. Cables on the same route would both fall victim to the backhoe or the cherry picker, and the redundancy would have bought you nothing.
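One way to see why shared routing defeats redundancy is a simplified beta-factor model for common-cause failures. This is my own illustration, not something presented in that discussion: `q` is the unavailability of a single cable path, and `beta` is the assumed fraction of its failures that would also take out the second path (a backhoe through a shared trench, for instance). Both values below are hypothetical.

```python
def dual_path_unavailability(q: float, beta: float) -> float:
    """Approximate unavailability of two redundant paths under a beta-factor model.

    q    -- unavailability of one path (0..1)
    beta -- fraction of path failures shared by both paths (0..1)
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("q and beta must be between 0 and 1")
    independent = ((1.0 - beta) * q) ** 2  # both paths fail separately
    common = beta * q                      # one event fails both paths
    return independent + common


Q_PATH = 0.01  # hypothetical unavailability of a single cable run

for label, beta in [("same trench", 1.0), ("partly shared", 0.1), ("separate routes", 0.0)]:
    print(f"{label:16s} beta={beta:.1f} -> {dual_path_unavailability(Q_PATH, beta):.6f}")
```

When `beta` is 1, the dual-path figure collapses to that of a single cable, which is the "bought you nothing" case. Only as `beta` approaches 0 does the second path deliver the improvement that was paid for.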

But in the days of point-to-point, were we ever compelled to make a 36-pair cable redundant? Is a single twisted pair or fiber carrying fieldbus or Ethernet any more vulnerable to the vagaries of backhoes and cherry pickers than the brute-force copper network we've traditionally applied?

Having been guilty of reflexive application of redundancy, I'd encourage users to evaluate the likely causes of failure, their likelihood, and the severity of the consequences, to better gauge what redundancy is actually worth. Consider using this analysis, not emotion, to guide your choices.