If your company builds machine or process skids, you’ve probably heard about machine learning. It’s a heavily hyped term, right up there with the now legendary, and bordering on mythical, Internet of Things.
And with a moniker like machine learning, who wouldn’t be interested? Everyone wants their machines to be smarter, and if a machine can learn on its own and somehow become better, wouldn’t that be great? Of course it would, but reality often falls short of the hype.
Machine learning takes all of the data you’ve collected from your machine or process skid, or from a whole group of these items of equipment in a plant, and automatically forges relationships among the data. Automatic learning sounds great, but what might this mean to you in practice?
Let’s say your injection molding machine collects and stores 100 data points a second. You store all of this information in an historian database, and it quickly grows to monstrous proportions as 100 data points per second equates to 360,000 data points per hour. So, you set your machine learning software loose on the data, and within minutes it begins to kick out correlations among the data.
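The arithmetic behind that growth is easy to sketch. A quick back-of-the-envelope calculation (an illustration, not tied to any particular historian product) shows how fast 100 points per second compounds:

```python
# Back-of-the-envelope data volume for a machine logging 100 points/second.
points_per_second = 100
per_hour = points_per_second * 3600      # 360,000 points per hour
per_day = per_hour * 24                  # 8,640,000 points per day
per_year = per_day * 365                 # about 3.15 billion points per year
print(f"{per_hour:,}/hour  {per_day:,}/day  {per_year:,}/year")
```

At these rates, a single machine produces billions of rows a year, which is why manual inspection of raw historian data is a non-starter.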
For example, it will tell you that, when the power to the machine is off, the machine is not producing product, and that the correlation between these two variables is very strong. It will tell you that, when the injection molding heater is on, the temperature inside the molding area of the machine is always above 200 °F.
The machine learning software will automatically generate tens of thousands of these strong data correlations, most of which will be completely useless to you. You were drowning in data before, and now your machine learning software has thrown you the anchor of way too many strong data correlations, leaving it to you to find out which ones contain useful information.
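A minimal sketch shows what undirected correlation mining looks like in practice. This uses pandas on simulated data (the signal names and values are invented for illustration; the article does not specify any tooling):

```python
import numpy as np
import pandas as pd

# Simulated historian data with hypothetical signal names.
rng = np.random.default_rng(0)
n = 10_000
power_on = rng.integers(0, 2, n)
df = pd.DataFrame({
    "power_on": power_on,
    "producing": power_on,                               # trivially tied to power
    "heater_temp_f": 70 + power_on * (150 + rng.normal(0, 5, n)),
    "ambient_f": 68 + rng.normal(0, 2, n),               # unrelated noise
})

# Undirected mining: compute every pairwise correlation, keep the strong ones.
corr = df.corr().abs()
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)    # each pair once
strong = corr.where(upper).stack()
strong = strong[strong > 0.8].sort_values(ascending=False)
print(strong)  # mostly "discoveries" an engineer already knew
```

Even on four signals, the strong correlations that surface (power vs. producing, power vs. heater temperature) are exactly the obvious relationships an operator could recite from memory; on thousands of tags, the list of such findings becomes the anchor described above.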
Of course, machine learning software can be directed, so you can tell the software to only return certain data correlations—let’s say, those related to machine output in parts per minute, which your machine is measuring continuously. So, you might tell the software to show you which variables are correlated to a 20% or more decrease in machine output.
The software will now tell you that, when power to the machine is off, the output drops by more than 20%. It will tell you, when the injection molding temperature is less than 200 °F, output is down sharply. Instead of tens of thousands of mostly worthless correlations, you now have thousands. A smaller anchor to be sure, but still little or no solace to someone drowning in data and charged by management to make sense of it all and improve machine, process skid and/or plant operation.
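Directing the search amounts to ranking variables against one target instead of mining every pair. A rough sketch of that idea, again on invented data with hypothetical signal names:

```python
import numpy as np
import pandas as pd

# Hypothetical historian extract; names and values invented for illustration.
rng = np.random.default_rng(1)
n = 5_000
power_on = rng.integers(0, 2, n)
temp_f = 70.0 + power_on * (150 + rng.normal(0, 10, n))
# Output collapses with power off and sags when the mold runs under 200 °F.
output_ppm = (power_on * 30.0
              - np.clip(200 - temp_f, 0, None) * 0.1
              + rng.normal(0, 1, n))
df = pd.DataFrame({"power_on": power_on, "temp_f": temp_f,
                   "output_ppm": output_ppm})

# Directed question: rank every variable by its correlation with output,
# rather than mining all pairwise correlations blindly.
ranked = df.corr()["output_ppm"].drop("output_ppm").abs()
print(ranked.sort_values(ascending=False))
```

The ranking is smaller and focused on the question asked, but as the article notes, it still surfaces the obvious drivers (power state, temperature) first, and it still takes an engineer to decide which correlations matter.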
So, if machine learning isn’t the answer for those who are data rich but information poor, then what is (Figure 1)?
Self-directed data analytics
Data analytics software incorporates machine learning technologies to accelerate the efforts of the user, but takes things one critical step further. The main difference is that data analytics software is always used in a directed fashion by an engineer or expert seeking answers or insight into specific questions, instead of just calculating relationships without context or focus.
“Seeq is an application specifically designed to provide faster and richer insights into the time-series data stored in historians,” says Michael Risse, vice-president of Seeq. Like other data analytics software, Seeq works with the data many machine and process skid builders already have stored in a data historian.
“To install Seeq, the machine or skid builder could put it on the same PC housing the historian data or on another PC, in either case using the software’s data connector to automatically link to the historian. Once the data connection is made, an engineer can use the intuitive, Web-based interface to investigate performance attributes or create reports,” explains Risse.
Without a tool like Seeq, the alternative is usually a spreadsheet. Data is exported from the historian to the spreadsheet software, a task which must be performed for each data set of interest. Once the data is in the spreadsheet, it takes someone very well-versed in spreadsheet manipulation and programming to analyze the data and generate reports. This difficult set of tasks causes many to give up because the effort required isn’t worth the potential return.
But, with data analytics software, improvements can be realized in a few hours rather than days or weeks, as the tool is specifically designed for the particular task of analyzing time-series data, as opposed to a general purpose tool like a spreadsheet. The software is also designed for use by engineers, with no programming or IT specialists required.
“In the past 24 months, we have seen two trends in the media,” relates Hans De Leenheer, the vice-president of marketing at TrendMiner. “First, we need to hire more data scientists, and, second, we need them to talk to the process engineer. Only the second part is true, as our software doesn’t require a data scientist. The only reason why data analytics in industry has not evolved as in some other sectors is because the data only makes sense when it is combined with the experience of the engineer or expert; otherwise, they are just numbers.”
The most critical attribute of any machine or process skid is usually availability or uptime, so insights into health and future operating state are paramount. Data analytics software has advanced trending, search and batch analysis tools, so comparisons among product or process runs, or checks on variability in performance over time, batches or production runs, become a matter of minutes rather than hours.
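The kind of run-to-run comparison described here can be sketched in a few lines. This example uses pandas on invented per-run quality numbers (not data from any real machine or from the products named in the article):

```python
import numpy as np
import pandas as pd

# Hypothetical per-run yield measurements, invented for illustration.
rng = np.random.default_rng(2)
runs = pd.DataFrame({
    "run": np.repeat(["A", "B", "C"], 100),
    "yield_pct": np.concatenate([
        rng.normal(95, 0.5, 100),   # tight, healthy run
        rng.normal(95, 3.0, 100),   # same average, far more variable
        rng.normal(90, 0.5, 100),   # tight but shifted lower
    ]),
})

# Compare average and spread across runs in one pass.
summary = runs.groupby("run")["yield_pct"].agg(["mean", "std"])
print(summary)
```

A summary like this makes the two distinct failure modes visible at a glance: run B has the right average but poor consistency, while run C is consistent but off-target, which is exactly the sort of distinction that takes hours to extract from raw spreadsheet exports.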
Optimization is another priority, such as tweaking processes to gain a percent or two on margin, yield or quality attributes. “Our customers say they know what they want to do, but it’s just too hard or takes too long using the same approaches they have used for 20 years: Excel, programming scripts and elbow grease. Seeq changes the investigation paradigm by bringing analytics and insight into an advanced trend viewing application, so visualization, data manipulation, data cleansing and search capabilities are an integral and intuitive part of the user experience,” continues Risse.
The activities described by Risse can improve the operation of individual machines and process skids, but what about the much more complex and often more important task of optimizing an entire plant?
“Our software has the ability to aggregate data from one asset in the context of other data sets or other data sources to investigate issues across a plant. So, the process the asset is a part of can be improved as a whole by integrating machine, process skid and other data sets to enable analytics across the entire production line,” concludes Risse.