Network Diagnostics or Spam — How Do You Get Rid of All the Noise?

How Does One Decide Which Diagnostics Are Useful and Actionable, and Which Can Be Suppressed and Forgotten Forever?

Aug. 9, 2012

About the Author

John Rezabek is a process control specialist for ISP Corp. in Lima, Ohio.

You've heard of the "Internet of Things," right? That's where your dishwasher and your dog's collar, your Volkswagen and your sprinkler system all have microchips in them, and they all have apps on your smartphone. Maybe the dog's collar even integrates with the sprinkler so it shoos him off the flower beds. Wow, that's really handy.

But along with all that network capability, engineers and code writers have also provided a "wealth" of diagnostics — some of which might not be all that useful.

The grandma of questionable diagnostics is possibly the ubiquitous "Check Engine" lamp, like the one on my car's instrument cluster. I think it has to do with some emissions accoutrements — an EGR valve, perhaps. But even if you dutifully changed it at 70,000 miles, you still have to take three-fourths of the car apart to turn the lamp off. So there it glows.

If you define spam as useless messages that you didn't ask for, you might begin viewing the increasing quantity of useless, redundant and "stuck" diagnostic messages in the same light. These messages and indications are spam that you can't turn off or run through a filter. In many cases, they can mask other problems that you wish you knew about.

My plant's distributed control system (DCS) has a wealth of in-depth diagnostics, but they all roll up to a single "controller bad" indication. You can drill down to find out what the specific issue is — that's good. But if it's something you can't immediately fix, the thousands of other potential problems might fail to invoke a "re-annunciation" of the "bad" status. Supposedly, there's a mechanism to table a stuck diagnostic, but I haven't found the right-click context menu that reveals how one might do this. So I stare at the spam for weeks or months or more, and any additional indications — including the ones I might care about — could go unnoticed.
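
To make the re-annunciation problem concrete, here is a minimal Python sketch of a roll-up that keeps one "bad" flag but still announces each new diagnostic, even while an older one is stuck. It is an illustration only, not how any particular DCS works; the class name `ControllerRollup` and the diagnostic codes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerRollup:
    """Hypothetical roll-up of many diagnostics into one 'bad' flag."""

    active: set = field(default_factory=set)        # diagnostics currently active
    acknowledged: set = field(default_factory=set)  # active ones someone has seen

    def raise_diag(self, code: str) -> bool:
        """Activate a diagnostic; return True if it should annunciate.

        A repeat of an already-active (stuck) diagnostic stays quiet,
        but a different diagnostic annunciates even if 'bad' is already set.
        """
        is_new = code not in self.active
        self.active.add(code)
        return is_new

    def acknowledge(self, code: str) -> None:
        """Mark an active diagnostic as seen by a person."""
        if code in self.active:
            self.acknowledged.add(code)

    def clear(self, code: str) -> None:
        """Remove a diagnostic once the underlying problem is fixed."""
        self.active.discard(code)
        self.acknowledged.discard(code)

    @property
    def bad(self) -> bool:
        """The single rolled-up indication: True if anything is active."""
        return bool(self.active)

    @property
    def unacknowledged(self) -> set:
        """Active diagnostics nobody has acknowledged yet."""
        return self.active - self.acknowledged


# Usage with made-up diagnostic codes
rollup = ControllerRollup()
rollup.raise_diag("FAN_SPEED_LOW")      # first occurrence: annunciate
rollup.acknowledge("FAN_SPEED_LOW")     # seen, but not fixable today
rollup.raise_diag("FAN_SPEED_LOW")      # stuck repeat: stays quiet
rollup.raise_diag("IO_CARD_FAULT")      # new problem: annunciate again
```

The design choice is that the rolled-up flag and the annunciation decision are tracked separately, so a stuck item cannot hide a new one.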

The DCS is just the beginning. I have a certain flowmeter that reads near zero most of the time. On a regular basis, though, it somehow detects and indicates reverse flow. We don't know any reason why it should have reverse flow, but should we suppress it?

There are hundreds and hundreds more instruments and valve positioners, all of which have some variety of preconfigured diagnostics. We get spammed by them as well. Every now and then, there's a message that is genuinely actionable, so your thoughts of possibly suppressing the alerts — where possible — are squelched.

How do you get rid of all the noise? Should I suppress "EEPROM Read Failure"? How does one decide which diagnostics are useful and actionable, and which can be suppressed and forgotten forever?

For the past decade or more, we've had futurists among us who envision a brave new world where "intelligent" devices determine their ailments and take actions like sending emails or writing work orders. Why not? Why make a human read the messages and hunt-and-peck their way through the manufacturing execution system (MES)? Just generate the work order automatically, mate. That's right, we can automatically fill up Maximo or SAP with spam.

In reality, the need for the application of a thinking human brain couldn't be greater than it is now. The immensely powerful machine on your desk, which is far more powerful than the Ferrari of smart devices, has had armies of hardware and software geniuses craft ever-more sophisticated diagnostics for it. Does yours ever "phone home"? My email client crashes and sends diagnostics back to the mother ship about five times a day. Thank goodness this monster isn't writing job orders every time it has a hiccup.

The standards set forth by ISA and other organizations for alarm rationalization need to be applied in the universe of device diagnostics as well. Diagnostics should be specific, actionable and not redundant. And if it's spam, noise or chatter, we need a consistent method to filter, suppress or table it. No one should have to disassemble their dashboard, or click through seven layers of slow-loading dialog boxes to tell an alarm to take a break for a fortnight, or forever.
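
A consistent way to tell one diagnostic to "take a break for a fortnight, or forever" could look something like the following Python sketch. It assumes diagnostics are identified by a device tag and a code, which is an assumption for illustration; the class `DiagnosticShelf`, the tag `FT-101` and the code strings are hypothetical and do not come from any real asset management product.

```python
from datetime import datetime, timedelta
from typing import Optional


class DiagnosticShelf:
    """Hypothetical shelf for suppressing diagnostics for a while or forever."""

    def __init__(self) -> None:
        # (device, code) -> expiry time, or None for "until someone unshelves it"
        self._shelved = {}

    def shelve(self, device: str, code: str, now: datetime,
               duration: Optional[timedelta] = None) -> None:
        """Suppress one diagnostic; duration=None means indefinitely."""
        expiry = None if duration is None else now + duration
        self._shelved[(device, code)] = expiry

    def unshelve(self, device: str, code: str) -> None:
        """Bring a shelved diagnostic back into view."""
        self._shelved.pop((device, code), None)

    def should_show(self, device: str, code: str, now: datetime) -> bool:
        """Return True if this diagnostic should reach a human right now."""
        key = (device, code)
        if key not in self._shelved:
            return True
        expiry = self._shelved[key]
        if expiry is None:
            return False              # shelved indefinitely
        if now >= expiry:
            del self._shelved[key]    # shelf period over: show it again
            return True
        return False


# Usage: shelve the puzzling reverse-flow indication for two weeks
shelf = DiagnosticShelf()
start = datetime(2012, 8, 9)
shelf.shelve("FT-101", "REVERSE_FLOW", start, timedelta(days=14))
hidden = not shelf.should_show("FT-101", "REVERSE_FLOW", start + timedelta(days=1))
other_visible = shelf.should_show("FT-101", "EEPROM_READ_FAILURE", start)
```

Shelving is per device and per code, so silencing one nuisance message leaves every other diagnostic on the same instrument visible, and a time-limited shelf brings the message back on its own rather than relying on someone to remember it.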

Just as the DCS is the compendium for all process alarms, and is armed with the right tools for managing them, so should our asset management systems be the clearing house for diagnostics, and provide us the tools to make them meaningful.

No more spam, please.