By Steve Perry, Barber Foods
All is quiet at a sprawling and highly automated production facility. Production lines are running smoothly, there have been no unexpected machine shutdowns, measurements are trending close to setpoints, and processes are in control. It's a good day.
At this particular facility, controllers—PLCs in this case—and graphical control stations are distributed everywhere, communicating with each other via an extensive and modern controls network built using Ethernet, ControlNet, and DeviceNet. Production uses this network to monitor and adjust processes remotely: one control-room operator can control hundreds of machines located throughout this giant building. Maintenance uses it to access controllers remotely in order to troubleshoot and diagnose machine and production-line issues.
The Ethernet portion of the controls network shares the same commercial-grade network switches and switch-to-switch cables as the business network; controls-network traffic is isolated from business-network traffic only virtually, using VLANs. With this shared hardware configuration, most controller-to-controller communication and most controller-to-graphics-station communication is routed through the same Ethernet switches that serve office computers, servers, and printers. Only controllers and graphics stations in close proximity to each other bypass the shared hardware; these close neighbors typically talk directly to each other over ControlNet.
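A shared-switch arrangement like this is typically built with 802.1Q VLAN tagging: access ports assign each device to one VLAN, and the switch-to-switch trunks carry both VLANs over the same cable. A minimal sketch of what that might look like, assuming Cisco IOS-style syntax with hypothetical VLAN IDs and port numbers (not this facility's actual configuration):

```
! Hypothetical example: VLAN 10 = business network, VLAN 20 = controls network
vlan 10
 name BUSINESS
vlan 20
 name CONTROLS
!
! Access port for an office PC (business VLAN)
interface FastEthernet0/1
 switchport mode access
 switchport access vlan 10
!
! Access port for a PLC or graphics station (controls VLAN)
interface FastEthernet0/2
 switchport mode access
 switchport access vlan 20
!
! Switch-to-switch trunk carrying both VLANs on one cable
interface GigabitEthernet0/1
 switchport mode trunk
 switchport trunk allowed vlan 10,20
```

The key point is that the isolation here is logical, not physical: both VLANs ride the same switches and trunks, so one mistake in switch configuration can affect both networks at once.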
Problems with the shared network hardware would likely affect the controls network. Problems could include hardware failures, such as network-switch faults or cable breaks, or configuration errors, such as improper switch setup. The information-services team is responsible for maintaining this shared hardware, but it was not aware that this hardware forms a critical communication backbone for the controls network.
Then one day, the information-services team changed the routing configuration on the shared network switches, inadvertently routing all network traffic, business and controls alike, out through the Internet connection.
Because of this, data flow through both networks became very slow, which caused graphical control stations and maintenance troubleshooting stations to become unresponsive. It also blocked some controller-to-controller communications, preventing the transmission of automatic interlocking signals between some machines: upstream machines could not automatically start or stop based on downstream machine status.
The control room operator discovered that he could not use his graphical control stations to change the behavior or settings of any production machines and immediately reported the problem to the electrical maintenance team. The production machines were flying blind, being controlled by their local controllers running the existing programs and settings. Humans could still stop the machines by using local controls—e-stop buttons or switches—but remote control and setting adjustment was not possible.
The electrical maintenance team used communication software tools on two different computers attached to different parts of the network to troubleshoot the problem and discovered that all of the controllers on the network appeared unresponsive. This symptom indicated a major network failure, so the information-services team was contacted. The information-services team restored the original network switch routing configurations, and the behavior of both networks returned to normal after about one hour of flying blind.
Had the flying-blind situation persisted much longer, it is highly likely that all production lines would have been shut down manually, using local controls, to stop the runaway situation, leading to extensive and costly downtime. Highly automated facilities like this one are staffed for minimal human presence, so operating machinery via local controls is neither easy nor fast.
The lack of controller-to-controller interlocking signals could have led to some major incidents. For example, if one of the factory effluent pumps had stopped, production machinery would not have automatically stopped, likely leading to effluent overflows and backups. As another example, consider the interlocking with the steam and refrigeration systems. Lack of communication with those supply systems could have led to product quality issues due to improper cooking or improper freezing.
The fix: physically separate the controls network from the business network. Minimize the number of connections between the two networks, and implement tight restrictions on the traffic across those connections. On the new physically separated controls network, ensure that network hardware and software changes are always thoroughly tested and evaluated, preferably in a lab environment, prior to implementation. Also, consider implementing a redundant network design, such as a ring structure for switch-to-switch communications.
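Tight restrictions at the few remaining connection points are usually enforced with a firewall or router access list. As a rough sketch, assuming Cisco IOS-style syntax and hypothetical subnets (192.168.10.0/24 for the business network, 192.168.20.0/24 for controls) and a hypothetical data-collection server that is the only business-side host allowed to reach the controls side:

```
! Hypothetical access list on the business-to-controls link:
! permit one data-collection server, deny everything else
! destined for the controls subnet.
access-list 101 permit ip host 192.168.10.50 192.168.20.0 0.0.0.255
access-list 101 deny   ip any 192.168.20.0 0.0.0.255
access-list 101 permit ip any any
!
interface GigabitEthernet0/0
 description Link from business network toward controls network
 ip access-group 101 in
```

With a rule set like this in place, a routing mistake on the business side can no longer sweep controls traffic along with it, because the only path between the two networks is a single, explicitly filtered link.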
Steve Perry is manufacturing engineer at Barber Foods (www.barberfoods.com) in Portland, Maine.