Vibration, voltage and VLANs: How to track factory network faults
Key Highlights
- Identify unique physical layer threats in industrial environments, such as vibration, moisture and EMF, that lead to intermittent network failures.
- Utilize essential command-line tools like ping to check for noise and pathping to pinpoint specific nodes responsible for packet loss.
- Maintain static IP documentation and ensure the troubleshooting laptop is configured to the correct subnet before diagnosing connection issues.
Troubleshooting industrial Ethernet networks can be a daunting task. Most often, you do not have the cable-routing diagrams, and you do not know which switches and routers the devices are connected to. The OT/plant-floor Ethernet network rarely has its own segment, although it should.
When I created the maintenance network at a 1.7 million sq ft distribution center, we had our own switch system creating our own network and cabling layouts. However, we needed to have some of the devices on the IT network, as well, which was handled by connecting those devices to the IT switches and routers. Virtual local area networks (VLANs) were set up to allow communication between the two networks.
When a problem occurred, however, it was important to be able to ascertain where the possible issue was and within which network. We were running a wireless network, too, which made the overall picture somewhat harder for the maintenance staff to understand.
On the OT network we had 14 PCs running a human-machine interface (HMI) application that was configured to be able to reach the 28 mobile automated storage and retrieval system (AS/RS) cranes in the building. The 28 programmable logic controllers (PLCs), located on the AS/RS cranes, talked to the OT and IT networks wirelessly. There were three main PLCs on the IT network that controlled the movement of product throughout the building. Multiple Ethernet-enabled PLCs helped transition product between buildings, and we had to track it all on a server to be sure the product ended up at the right distribution point.
If we lost data at any point, the product would end up at a place that would take an intervention to get it back on track or a manual extraction. Not desirable.
Imagine losing one piece of data: the error would cascade throughout the system and cause all other product data to be incorrect.
So, the integrity of the network was paramount. It is a pain point with multiple points of failure. So, what happens when it fails?
How can it fail? A cabling issue is one way, and it is normally a connector problem due to vibration or moisture in the cabinet. A switch node failure, network port jitter, power supply issues and electromagnetic field (EMF) issues may cause an intermittent problem. You hope it is a fixed, persistent problem, such as a blown fuse, because those are easier to find. It was a closed system, so a flood of denial-of-service (DoS) requests should not be present, but a node constantly requesting data is possible.
So, the server needs to communicate with a PLC more than 600 ft away, and where does the cable go? We are missing data. Network timing can be important, so the messaging has to get to Point B in a certain time frame, which was the case here.
There was a UNIX server upwards of a mile away that read and wrote data to move product, just to further complicate the issue. In all fairness, it was pretty reliable.
When it wasn’t, we had pain points.
Troubleshooting tools now come into play. Can the server see the PLC? This is where we have to know IP addresses, which dictates that all devices on an OT network have static IP addresses. An Excel spreadsheet should document this.
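If the spreadsheet is exported as CSV, a few lines of Python can load it and catch a duplicated static address before it causes trouble on the floor. This is only a sketch: the column names (device, ip, location), the device names and the addresses are made up for illustration and are not from the site described here.

```python
import csv
import io
import ipaddress
from collections import Counter

# Hypothetical inventory exported from the spreadsheet as CSV.
# Device names and addresses are invented for illustration.
INVENTORY_CSV = """device,ip,location
HMI-01,192.168.10.11,Control room
CRANE-PLC-01,192.168.10.101,Aisle 1
CRANE-PLC-02,192.168.10.102,Aisle 2
"""

def load_inventory(text):
    """Return a dict of device name -> IP address.

    Raises ValueError on a malformed address or a duplicated static IP.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    addresses = [ipaddress.ip_address(row["ip"].strip()) for row in rows]
    duplicates = [str(ip) for ip, n in Counter(addresses).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate static IPs: {duplicates}")
    return {row["device"].strip(): ip for row, ip in zip(rows, addresses)}

inventory = load_inventory(INVENTORY_CSV)
print(inventory["CRANE-PLC-01"])
```

In practice the text would come from the exported file rather than an inline string; the inline string just keeps the example self-contained.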
The first line of defense is the Internet Control Message Protocol (ICMP) ping command. The ping command, followed by the IP address of the target, will return either an error or a reply with a round-trip time. If some packets get lost, we may have a noise issue. If there is no response at all, we have a target problem. The PLC may in fact be off-line. Also check the switch port activity lights for data transfer.
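The ping check can be scripted so that the result is sorted into the three outcomes described above: all replies, some loss or no response. The sketch below is a rough illustration, not a finished tool. It assumes English-language ping output, and the function names and result wording are my own. The example line at the bottom uses a canned Windows-style summary so it runs without a network.

```python
import platform
import re
import subprocess

# Matches "20% loss" (Windows) and "20% packet loss" (Linux/macOS).
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% (?:packet )?loss")

def parse_loss_percent(ping_output):
    """Extract the packet-loss percentage from ping's summary, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None

def classify(loss_percent):
    """Map a loss percentage to one of the outcomes described in the text."""
    if loss_percent is None:
        return "unknown: no summary line found"
    if loss_percent >= 100:
        return "no response: check the target, it may be off-line"
    if loss_percent > 0:
        return "partial loss: suspect noise or an intermittent connection"
    return "ok: all replies received"

def ping(host, count=10):
    """Run the system ping command and return its text output."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", flag, str(count), host],
                            capture_output=True, text=True)
    return result.stdout

# Canned Windows-style summary line, so no network is needed here.
sample = "Packets: Sent = 10, Received = 8, Lost = 2 (20% loss),"
print(classify(parse_loss_percent(sample)))
```

Against a live device the call would be classify(parse_loss_percent(ping("192.168.10.100"))), using whatever address your IP listing gives for the target.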
If the device is present on the network, then a pathping command may be in order. It traces the path from your network location to the target device and reports whether any node on the path is losing packets.
The node where this is happening needs to be identified and checked. Pathping can be a replacement for the tracert (traceroute) command.
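Once you have read the per-hop loss figures off the report, picking the suspect node is a small piece of logic. The sketch below assumes you have typed the figures into a list by hand; it does not parse pathping output, and the addresses are invented. It also builds in one assumption of mine: loss that shows up at a middle hop but not at the hops after it is often just that device giving ICMP replies low priority, so the function looks for the first hop where loss starts and continues all the way to the target.

```python
def first_lossy_hop(hops, threshold=1.0):
    """Return the address of the first hop where loss begins and persists.

    hops is a list of (address, loss_percent) tuples in path order.
    A hop qualifies only if it and every later hop show loss at or
    above threshold. Returns None if no hop qualifies.
    """
    for i, (address, _) in enumerate(hops):
        if all(loss >= threshold for _, loss in hops[i:]):
            return address
    return None

# Hypothetical per-hop figures copied by hand from a report.
path = [
    ("192.168.10.1", 0.0),    # maintenance switch
    ("192.168.10.2", 0.0),    # uplink
    ("192.168.20.1", 5.0),    # loss starts here...
    ("192.168.20.50", 6.0),   # ...and carries through to the target
]
print(first_lossy_hop(path))
```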
If you are working from a laptop, you may not be on the same subnet as the devices you are trying to access. If the PLC IP address is 192.168.10.100 with the usual subnet mask of 255.255.255.0 (what used to be called a Class C network), your laptop has to be on the same subnet, which means it must have an IP address of 192.168.10.x.
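The same-subnet rule can be checked with Python's standard ipaddress module. This is a minimal sketch that assumes a subnet mask of 255.255.255.0 unless you pass a different one; the laptop addresses are examples only.

```python
import ipaddress

def same_subnet(laptop_ip, target_ip, netmask="255.255.255.0"):
    """Return True if both addresses fall in the same network under the mask."""
    # strict=False lets us build the network from a host address plus mask.
    network = ipaddress.ip_network(f"{laptop_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(target_ip) in network

print(same_subnet("192.168.10.57", "192.168.10.100"))  # True
print(same_subnet("192.168.1.57", "192.168.10.100"))   # False
```

If your plant uses a different mask, check the documented value rather than assuming 255.255.255.0, because the range of valid laptop addresses changes with it.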
Some laptops and computers get their IP addresses from a Dynamic Host Configuration Protocol (DHCP) server, which may give you an IP in a different address range. On Windows, the ipconfig /all command will show your current IP address and subnet mask. If the address is not in the same range as the target device, then you have to set a static IP in the correct range using the network configuration tool.
Network latency causes many issues in industrial networks. Having a network diagram with an IP listing is paramount when you have to figure things out. If you don’t have one, create one. You will thank me later.
About the Author

Jeremy Pollard
CET
Jeremy Pollard, CET, has been writing about technology and software issues for many years. Pollard has been involved in control system programming and training for more than 25 years.

