Monitor and Mend Network Health

Design Issues, Operational Upsets and Potential Attacks Require Early Response

By Jim Montague

Industrial Networking 2013 Quarter 1

Just as fruits, vegetables, whole grains, lean protein, regular exercise and routine physical checkups can keep your heart and cardiovascular system up and running, there are many ways to maintain the health of your industrial networks.

Unfortunately, just as with our physical bodies, a lack of awareness and initiative means many solutions for network wellness aren't used as often as they should be, and so users are overwhelmed when real problems show up. Likewise, many control and automation engineers are so focused on the uptime and performance of their process applications and production lines that they don't pay enough attention to the switches, wires and other components bringing them the critical data for those processes.

"We recently helped with a proprietary Ethernet ring configuration that includes two distributed control systems (DCSs), three PLCs and seven or eight SCADA nodes running the steam plant at the University of Virginia at Charlottesville," says Reid Garst, vice president at Sterling Engineering Solutions, an applications engineering and distribution firm in Salem, Va. "This network already had Ethernet switches for about 10 years, but more PLCs and other equipment from multiple vendors were added since then, and these devices were prone to flooding the network with multicast traffic. Unfortunately, multicast has sort of a shotgun approach that tries to put 10 lb of data in a 5 lb bag. However, because the existing Ethernet switches were not fully managed, they couldn't handle all the traffic, and the network was overwhelmed and started generating a lot of communication errors."

Consequently, Sterling used managed Ethernet switches from Moxa that could filter multicast communications to help diagnose the steam plant's network, and found that about 80% of the network had been taken over by multicast. "As a result, we preconfigured and put in four of Moxa's EDS 518A managed Ethernet switches in a simple ring topology, and they filtered the multicast communications, decreased traffic back to an acceptable 20%, and reduced the errors and alarms."

Jim Toepper, product marketing manager for Moxa's industrial Ethernet infrastructure division, adds, "Our efforts on network monitoring, maintenance and health are sort of following on what IT was doing about 10 years ago with HP OpenView and SolarWinds software. Now, managed Ethernet switches have become ubiquitous in control and automation, our networks are much bigger, and they need to be monitored regularly and sometimes controlled with network management software (NMS) tweaked for our industries, such as our MXView, Hirschmann's HiVision or Network Vision's IntraVue, which can tell users at a glance if bandwidth utilization is over 60%, and if they should increase network capacity by adding more Layer 3 switches or establishing another virtual local area network (VLAN) segment."
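
The utilization figure that an NMS displays is typically derived from interface octet counters read at two points in time, for example over SNMP. The following is a minimal sketch of that arithmetic, assuming 32-bit counters that wrap back to zero. The function name and the sample readings are illustrative and are not taken from any of the products named above; only the 60% figure comes from Toepper's comment.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit octet counters wrap back to zero

def link_utilization_pct(octets_start, octets_end, interval_s, link_speed_bps):
    """Percent utilization of a link between two octet-counter readings."""
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Modulo arithmetic absorbs a single counter wrap between readings.
    delta_octets = (octets_end - octets_start) % COUNTER32_MODULUS
    bits_transferred = delta_octets * 8
    return 100.0 * bits_transferred / (interval_s * link_speed_bps)

# Hypothetical readings taken 10 s apart on a 100 Mbit/s port.
util = link_utilization_pct(1_000_000, 81_000_000, 10, 100_000_000)
needs_attention = util > 60.0  # the 60% level mentioned above
```

With these made-up readings, 80 million octets move in 10 s, which works out to 64% of a 100 Mbit/s link. On fast links, short polling intervals or 64-bit counters are usually needed so the counter cannot wrap more than once between readings.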

Let's Have a Look
Likewise, many other tools and methods for network health and optimization are getting easier to use, and more closely integrated with traditional controls equipment and software. For instance, aluminum smelter Qatalum is a joint venture between Qatar Petroleum and Hydro Aluminum of Norway. Its plant in Mesaieed, Qatar, includes two 1.2 km pot lines, a carbon plant, a cast house, a 1,350 MW power plant, a port and storage facilities, and can produce 585,000 tons of aluminum extrusion ingots, foundry alloys and other products per year (Figure 1). Naturally, these applications include thousands of sensors and other components reporting in via hundreds of Ethernet switches and a variety of networks, and so Qatalum recently decided to coordinate all this traffic with Industrial HiVision NMS from Belden's Hirschmann Automation and Controls division, including its MultiConfig tool for configuring multiple devices simultaneously.

"The key elements in our network demand fast, redundant switching, data communications and monitoring, and controls based on fast Ethernet," says Graham Patton, Qatalum's senior network engineer. "Gigabit Ethernet was required in the backbone ring and in the hot-swappable media modules with high port density. These devices were also required to have high predictive hardware lifetimes and lengthy mean time between failures (MTBF). The structure of our network also needed to ensure that no single point of failure can interrupt plant communications, and so all nodes are monitored from a centralized HiVision NMS server. NMS simplifies our daily activities, and makes  resources available for other networking tasks." Other benefits that Qatalum gained from NMS include maximized uptime from its redundant Ethernet ring topology, improved network availability from frequent status updates and threshold-setting functions, auto-topology and discovery functions that show what's connected where in the network, and more efficient plant operations from fault-prediction detection and diagnoses.

"Network management has been a bit of a black hole for many control engineers because they often don't really know where the network is or what's in it," says Mark Cooksley, Hirschmann's product manager for software tools. "Network health requires going back to the beginning, checking the original network design, and making sure it follows recognized design characteristics. You need to ask, 'What's on our network, what applications are using it, what availability do they require, and what do we need to add?' Next, you need to correct redundant links, deploy managed switches where appropriate, and implement NMS to handle switch data. This can include bandwidth utilization, device temperature, power status, link states and errors, and then you need to set thresholds for each. For example, you might want an alert when bandwidth goes above 30%, so you can do some preemptive problem solving. You also need to observe your network over time, and establish baselines for normal performance, which will help identify anomalies when they show up."

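The two checks Cooksley describes, fixed thresholds and a baseline of normal behavior, can be sketched in a few lines. This is an illustration only, not how HiVision or any other NMS implements them. The 30% bandwidth limit comes from his example, while the temperature and error limits, the metric names, the three-sigma rule and the sample readings are all assumptions.

```python
from statistics import mean, stdev

# Hypothetical alert limits; real values depend on the plant and its devices.
THRESHOLDS = {"bandwidth_pct": 30.0, "temperature_c": 70.0, "link_errors": 100}

def threshold_alerts(sample):
    """Return the names of metrics in a sample that exceed their limit."""
    return sorted(name for name, limit in THRESHOLDS.items()
                  if sample.get(name, 0) > limit)

def is_anomaly(history, value, n_sigma=3.0):
    """Flag a reading more than n_sigma standard deviations from the baseline."""
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) > n_sigma * spread

sample = {"bandwidth_pct": 42.0, "temperature_c": 55.0, "link_errors": 3}
alerts = threshold_alerts(sample)          # only bandwidth is over its limit
history = [18.0, 20.0, 22.0, 19.0, 21.0]   # illustrative bandwidth readings, %
spike = is_anomaly(history, 42.0)          # far outside the observed baseline
```

A fixed threshold catches a value that is too high in absolute terms, while the baseline check catches a value that is unusual for this particular network even when it sits below the limit.
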
Everyone Does Diagnostics
Though network health used to be measured mainly by handheld sniffing devices and IT-based protocols, many control systems and intelligent components have added their own diagnostic capabilities. In fact, many managed Ethernet switches can not only gauge their own status and wellbeing, but they're also able to use Modbus, EtherNet/IP, Profinet and other protocols to report directly to SCADA systems being polled and communicating via a combination of Simple Network Management Protocol (SNMP) and OPC methods.

Likewise, one of the main benefits of today's supervisory NMS is that it provides graphical views and topology maps of software functions and network components in a complete graphical user interface (GUI), which can also be dropped right into users' SCADA displays — making it much more likely that they'll use NMS functions.

The new managed Ethernet switches at the University of Virginia's steam plant also have Modbus registers, which means they can communicate directly with the plant's CitectSCADA system, Garst reports. "Previously, it took a lot of translation for managed Ethernet switches to communicate with a SCADA system, but now users can just poll these registers for the status of the switches, or they can be set to report their own bandwidth, power usage or other parameters, which makes them just like any other devices in the control system," he explains. "This helps control engineers a lot because it puts managed Ethernet switches and other network devices in a language and setting that they can understand and use effectively."
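
To give a sense of what polling these registers involves, here is a minimal sketch of a Modbus "read holding registers" exchange. It assumes Modbus TCP framing, which the article does not specify, and the register addresses and their meanings are hypothetical rather than taken from Moxa's documentation. The sketch only builds the request and decodes a canned reply; sending it over a socket is left out.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # standard Modbus function code

def build_read_request(transaction_id, unit_id, start_address, count):
    """Build a Modbus TCP 'read holding registers' request frame."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, count)
    # MBAP header: transaction id, protocol id (always 0),
    # length of what follows (unit id + PDU), then the unit id itself.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

def parse_read_response(frame):
    """Return the 16-bit register values carried in a response frame."""
    function_code = frame[7]  # first byte after the 7-byte MBAP header
    if function_code & 0x80:
        raise ValueError(f"Modbus exception code {frame[8]}")
    if function_code != READ_HOLDING_REGISTERS:
        raise ValueError(f"unexpected function code {function_code}")
    byte_count = frame[8]
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))

# Hypothetical map: register 0 = power status, register 1 = port utilization (%).
request = build_read_request(transaction_id=1, unit_id=1, start_address=0, count=2)
canned_reply = bytes.fromhex("000100000007010304" "0001" "002a")
values = parse_read_response(canned_reply)
```

Because the reply is just a list of 16-bit integers, a SCADA system can treat a switch's status registers the same way it treats tags from a PLC, which is the point Garst makes above.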

Hirschmann's Cooksley agrees. "This makes our old networking black hole a lot less black. We're even developing a managed switch that can be accessed from a smartphone by using an Apple or Android app, and scanning a QR code on the switch to get its status."

Simplicity, Fieldbuses Help, Too
Logically, just designing and building an industrial network that's less complex can make finding and solving its problems faster and simpler, and one of the classic methods for doing this is reducing wiring and components by using fieldbuses, whether Ethernet-based or not. But even simple networks still need to be checked regularly.

"There are two main things users want to know about their networks," says John Wozniak, automation networking specialist at the CC-Link Partner Assn. (CLPA). "Is a cable disconnected or degraded, and are the data packets running on a network good or bad? Fortunately, when disconnects happen and cycle times and errors increase, they can be easily identified by many tools. Even network architecture design software, such as Mitsubishi's GXWorks and Developer, can show the number of devices on a network, and provide reports from each. In addition, CC-Link IE removes a level of complexity by not using switches, and instead employs CC-Link Network Master to perform device diagnostics and maintain overall network health. However, checking millions of individual data packets requires tools like Wireshark and Frontline, which can help determine what's wrong when packets get lost."

Similarly, Joey Stubbs, PE, North American representative of the EtherCAT Technology Group, adds, "We're talking primarily today about Ethernet-based networks. In particular, EtherCAT has a built-in suite of diagnostics that stands out among industrial Ethernet networks. At any time, EtherCAT users can determine each and every node's operating state, each connection's link status, whether it's accessing its memory correctly, the number of nodes on the network, any CRC errors detected by any nodes and their location, as well as any lost frames in the network. With EtherCAT, the user can capitalize on exact localization of faults and a wider suite of diagnostics features than any other network.

"Also, if the user or machine builder needs more diagnostic capabilities, they can use freely available diagnostic tools, such as Wireshark, which can parse the EtherCAT frames to look at commands and responses from a behavioral perspective in the network. These built-in features make diagnosis of errors or issues with exact localization possible without requiring the integration of separate monitoring systems."

Consulting With Dr. IT
Of course, another crucial element of maintaining industrial network health is working closely with IT professionals, who often have more experience dealing with Ethernet networks and the tools and methods for diagnosing and treating them. Chief among these is Wireshark, the free network scanning and data packet evaluation software, along with similar tools.

Phoenix Contact installs cards for using Wireshark in its components, reports Dan Schaffer, network and security business manager for Phoenix Contact. "We use Wireshark in conjunction with our users to check their bandwidth utilization, and determine if and where they need managed Ethernet switches," he says. "It's also important to check what firmware you're running and check your data stream for errors. Later this year, we're releasing our FL View software that uses SNMP on the back end to do many asset management and network visualization tasks, such as looking for switches and IP addresses and learning what they're connected to."

Similarly, system integrator Automated Control Concepts (ACC) in Neptune, N.J., reports it recently repaired a factory automation system that was experiencing network downtime and some system failures. Some of the original PLCs had been replaced with faster models or given new functions, and the HMI was modified to keep up. However, these modifications weren't coordinated correctly, the control network slowed and often shut down, and staff couldn't find the cause.

ACC investigated and found that a 7 s heartbeat-timer timeout was causing the shutdowns. The reset signal for the timer came from a 1.25 s square-wave pulse that another PLC sent when it was cued, and so ACC's engineers decided to check the communication network for the cue using FTS4Control from Frontline Test Equipment. The pulse and timeout were supposed to occur only when a task failed to execute, but ACC's engineers used FTS4Control's filter and frame display to view the network's data at varying granularity, and learned that increased load on the network created a longer token rotation, fewer samples taken off the pulse output, and a mismatch between the square-wave and network data transfer timing. In short, the pulses were generated normally, but the sampling was infrequent enough to increase the possibility that only low-pulse samplings were transmitted during a given 7 s period. So ACC introduced a PLC heartbeat code with an incrementing number, which removed the dependency on asynchronous timing, and allowed the network and automation system to perform smoothly.
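
The failure mode and the fix can be illustrated with a small simulation. This is a sketch under stated assumptions, not ACC's PLC code: it treats 1.25 s as the full period of a 50% duty-cycle square wave, assumes the timer is reset whenever a high sample arrives, and picks a deliberately unlucky sampling interval of 2.5 s to stand in for the loaded network. All function names are illustrative.

```python
PULSE_PERIOD_S = 1.25   # assumed full period of the square wave
TIMEOUT_S = 7.0         # heartbeat-timer timeout from the case above

def square_wave_high(t):
    """True during the first half of each period (50% duty cycle assumed)."""
    return (t % PULSE_PERIOD_S) < PULSE_PERIOD_S / 2

def counter_value(t):
    """Heartbeat counter that increments once per pulse period."""
    return int(t // PULSE_PERIOD_S)

def watchdog_trips(sample_times, alive):
    """Return True if more than TIMEOUT_S passes without an 'alive' sample.

    alive(prev_t, t) decides whether the sample at t proves the sender runs.
    """
    last_alive = sample_times[0]
    for prev_t, t in zip(sample_times, sample_times[1:]):
        if alive(prev_t, t):
            last_alive = t
        elif t - last_alive > TIMEOUT_S:
            return True
    return False

# Loaded network: samples arrive every 2.5 s and always land on a low phase.
loaded_samples = [1.0 + 2.5 * i for i in range(8)]
level_trips = watchdog_trips(
    loaded_samples, lambda prev_t, t: square_wave_high(t))
counter_trips = watchdog_trips(
    loaded_samples, lambda prev_t, t: counter_value(t) != counter_value(prev_t))
```

In this setup the level-based check trips the timeout even though the sender is healthy, because every sample happens to catch the wave low. The counter-based check does not trip, because any two samples taken at least one period apart show different numbers no matter where in the cycle they fall.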

Likewise, thanks to high gold prices, Newmont Mining in Denver says its Asia-Pacific operations are growing quickly, especially in Indonesia, which means the networks supporting its 1,500 production applications, equipment and servers and 5,000 employees and contractors in Sumbawa and Jakarta were struggling with high bandwidth utilization. The company needed better real-time monitoring, reporting and troubleshooting of multiple sites and applications, so Newmont's IT technicians tested and implemented Orion Network Performance Monitor (NPM), Application Performance Monitor and Engineer's Toolset from SolarWinds.

"Our team needed to adopt a solution that could monitor status of our vendor devices, routers and switches," says Samuel Tirayoh, Newmont's IT manager for Indonesia. "Previous solutions limited us to monitoring support on just a small range of products. Adopting NPM helped resolve our high-bandwidth utilization issues, so now we can track and locate devices across locations, which saves time on solving problems and saves money on replacing broken devices."

Despite the healthful benefits of collaborating with IT, Shawn Gold, marketing manager for the Industrial IT Solutions Service at Honeywell Process Solutions, reports there's still a big skills gap around networking in process control settings. "Many control engineers still think of networking as a black box they can ignore, set and forget, or assume IT will take care of, even if the controls side hasn't allowed IT access to it," he explains. "Some process control users are proactive and work closely with their IT departments, but others don't let IT go into or below a certain demilitarized zone (DMZ) on their network, so IT ignores that area and controls might not know what to do to maintain it."

Gold adds it's more productive for engineers to coordinate with IT, jointly check loading on managed Ethernet switches, upgrade those that need it, and set parameters, thresholds and alerts — just as they do with DCSs. "We've used Wireshark and SolarWinds software to troubleshoot network problems, but we found they can have an adverse effect on DCSs, and so we built our Service Node NMS from third-party tools," Gold says. "Service Node works with our DCS without causing loading problems, and it provides a secure remote connection for network troubleshooting, log analyzing and patching, and next year we're going to add a cybersecurity dashboard."