Process variables and the art of calibrating instruments

Designing and validating procedures for device calibration.

By Jim McCarty, Optimation


Now plot the sensitivities vs. calibration number for all sensors, or make multiple plots with subsets if that’s easier to visualize. Do the sensitivities of any one sensor jump around more than the others? If you calibrated enough sensors, you may have a dud or two in there to examine. What’s different about their raw data (pressure vs. current) curves? Is there anything way off in the sensitivities, offsets or R² values? If a curve doesn’t look linear, fit a quadratic or cubic to all calibration curves and compare the x² and x³ coefficients of the suspect sensors to the others. They should be larger in magnitude, while the good sensors should have values close to zero for both coefficients. If so, this is something you can use to flag a potentially bad calibration and/or sensor.
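As a minimal sketch of that coefficient comparison, the Python below fits a cubic to a pressure vs. current curve by ordinary least squares and flags the curve when the quadratic or cubic term is large. The function names, the 0.5 psig limit and the data are all assumptions made for illustration; the data is synthetic, not from a real calibration. Current is rescaled to the range -1 to 1 before fitting (a choice made here, not something the procedure requires) so that every coefficient comes out in psig and can be compared directly between sensors.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_cubic(currents_ma, pressures_psig):
    """Least-squares cubic fit; returns [c0, c1, c2, c3] in psig.

    Current is mapped to u in [-1, 1], so c2 and c3 are the pressure
    contributions of the u^2 and u^3 terms at the ends of the range.
    """
    lo, hi = min(currents_ma), max(currents_ma)
    mid, half = (hi + lo) / 2.0, (hi - lo) / 2.0
    us = [(i - mid) / half for i in currents_ma]
    n = 4  # number of coefficients for a cubic
    ata = [[sum(u ** (r + c) for u in us) for c in range(n)] for r in range(n)]
    aty = [sum((u ** r) * p for u, p in zip(us, pressures_psig)) for r in range(n)]
    return solve_linear_system(ata, aty)


def check_nonlinearity(currents_ma, pressures_psig, limit_psig=0.5):
    """Flag a curve whose quadratic or cubic term exceeds a hypothetical limit."""
    _, _, c2, c3 = fit_cubic(currents_ma, pressures_psig)
    suspect = abs(c2) > limit_psig or abs(c3) > limit_psig
    return {"c2": c2, "c3": c3, "suspect": suspect}


# Synthetic example: an ideal 4-20 mA, 0-100 psig line and a bowed copy of it.
currents = [4.0 + 2.0 * k for k in range(9)]
good = [6.25 * (i - 4.0) for i in currents]
bowed = [p + 3.0 * ((i - 12.0) / 8.0) ** 2 for i, p in zip(currents, good)]

good_result = check_nonlinearity(currents, good)
bowed_result = check_nonlinearity(currents, bowed)
print(good_result["suspect"], bowed_result["suspect"])
```

In practice the limit would come from looking at the coefficients of sensors you already trust, not from a fixed number like the one assumed here.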

Are your sensors being routinely calibrated? Are your skids kept in-house, with their instrumentation calibrated or checked every so many days, weeks or months? If so, this takes a lot of guesswork out of the analysis. Running several or more sensors through a calibration and looking at their sensitivities and offsets will give you an idea of the expected normal values for that sensor type, but, in tracking the historical data of an individual sensor, you can expect a far smaller variance in those values. In the previous pressure transducer example, you may see that 90% of all pressure transducers you calibrate have sensitivities within some mean +/-0.8 psig/mA, but that 90% of individual transducers never left a range of +/-0.01 psig/mA over the course of so many years. Hence, if you detect a sharp deviation from a sensor’s own historical data, you can flag it for review, strictly by the numbers alone. Furthermore, although an acceptable transducer may have a sensitivity that falls within, say, 5% of the nominal sensitivity, if one specific transducer jumps from the upper end to the lower end of that range, or vice versa, then it should be flagged for review.

A good starting point for these numbers is the repeatability or hysteresis rating (%) on each sensor’s data sheet, since a sensor that falls outside of these tolerances can be interpreted as having something wrong with it.

An individual sensor’s data can be used to establish its own unique limits, which are more restrictive than the specified accuracy tolerance of the device (Figure 7). Notice how the sensitivity of Transducer 2 is outside of its own hysteresis limits—not “acting like its usual self”—but well within the range of expected values for transducers. This is certainly not definitive proof that the transducer is faulty, but it’s strong evidence suggesting that it could be, so it should be flagged for review.
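The two-tier idea can be sketched in a few lines of Python: one wide limit around the nominal sensitivity for the sensor type, and one narrow limit around the individual sensor’s own history. The function name, the nominal value, the 5% and 0.5% tolerances and the history values are all hypothetical placeholders; the narrow tolerance stands in for whatever repeatability or hysteresis rating the data sheet gives.

```python
from statistics import mean


def review_sensitivity(new_value, history, nominal,
                       type_tol=0.05, own_tol=0.005):
    """Classify a newly calculated sensitivity as 'pass', 'review' or 'fail'.

    type_tol: allowed fractional deviation from the nominal sensitivity
              of the sensor type (hypothetical 5%).
    own_tol:  allowed fractional deviation from this sensor's own
              historical mean (hypothetical 0.5%, e.g. a data-sheet
              hysteresis rating).
    """
    own_mean = mean(history)
    within_type = abs(new_value - nominal) <= type_tol * abs(nominal)
    within_own = abs(new_value - own_mean) <= own_tol * abs(own_mean)
    if not within_type:
        return "fail"      # outside the accuracy tolerance of the device
    if not within_own:
        return "review"    # in spec, but not acting like its usual self
    return "pass"


# Illustrative values only, in psig/mA.
history = [6.30, 6.31, 6.30, 6.29]
print(review_sensitivity(6.31, history, nominal=6.25))
print(review_sensitivity(6.12, history, nominal=6.25))
print(review_sensitivity(6.70, history, nominal=6.25))
```

The middle case is the interesting one: 6.12 psig/mA is comfortably inside the tolerance for the sensor type, yet far from where this particular sensor has always been, so it is sent for review instead of passing quietly.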

Establishing limits on each individual data point sets a considerably higher standard for a passing calibration, as just a single point out of potentially hundreds can cause a failed calibration. However, if the failure is caused by something that’s not indicative of a sensor issue, such as a loose electrical connection between the DUC and DAQ, dumb luck, natural randomness or a fluke, then repeating the calibration procedure and producing a passing result should be no problem. The faulty sensors are the ones likely to repeatedly fail a calibration, so I highly recommend this approach with a three-strikes policy: three consecutive failed calibration attempts flag a DUC for removal.
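One possible way to express the per-point check and the three-strikes policy in software is sketched below in Python. The class and function names, the DUC identifier and the 0.5 psig point limit are assumptions for illustration. A passing calibration resets the count, so only consecutive failures add up.

```python
from collections import defaultdict


def all_points_within_limit(measured, reference, limit):
    """A calibration passes only if every single point is within the limit."""
    return all(abs(m - r) <= limit for m, r in zip(measured, reference))


class StrikeTracker:
    """Counts consecutive failed calibration attempts per DUC."""

    def __init__(self, max_strikes=3):
        self.max_strikes = max_strikes
        self._strikes = defaultdict(int)

    def record(self, duc_id, passed):
        """Return 'ok', 'retry' or 'remove' for this attempt."""
        if passed:
            self._strikes[duc_id] = 0   # a pass clears earlier strikes
            return "ok"
        self._strikes[duc_id] += 1
        if self._strikes[duc_id] >= self.max_strikes:
            return "remove"
        return "retry"


# Illustrative attempt: one point out of five is outside a 0.5 psig limit.
reference = [0.0, 25.0, 50.0, 75.0, 100.0]
measured = [0.1, 25.2, 50.9, 75.1, 99.8]
passed = all_points_within_limit(measured, reference, limit=0.5)

tracker = StrikeTracker()
outcomes = [tracker.record("PT-07", ok)
            for ok in (passed, False, True, False, False, False)]
print(outcomes)
```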

I almost can’t stress data visualization enough, especially in the infancy of this process. Numbers can be misleading. In Figure 8, the measured pressure oscillates in a range of 10 psig—a range that, relative to most sensor performance, is wide enough to land a 747—but its R² value, 0.9969, may appear to be very close to 1, and give the impression that the data is very good. Visualizing the data will yield meaningful interpretations of the numbers: Do you know what you’d call a bad R² value? What if all sensors are performing within spec, but one has a sharp relative deviation, still within the hysteresis limits? The deviation from “the usual” likely wouldn’t have stuck out in a table, as it would still be less than the spec limit, but would be more likely to stick out in a graph. Visualizing data will help you to establish a picture of what “the usual” should look like, so that you can be mindful of deviations.
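To see how a respectable-looking R² can hide a wide oscillation, here is a small Python sketch that fits a line to synthetic data and reports both the R² value and the span of the residuals. The data is invented for illustration (a 4-20 mA ramp over an assumed 0 to 200 psig range, with a sine wave of 5 psig amplitude added); it is not the data behind Figure 8, and the function names are assumptions.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, offset)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_quality(xs, ys):
    """Return R^2 together with the peak-to-peak span of the residuals."""
    slope, offset = linear_fit(xs, ys)
    residuals = [y - (slope * x + offset) for x, y in zip(xs, ys)]
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "r_squared": 1.0 - ss_res / ss_tot,
        "residual_span": max(residuals) - min(residuals),
    }


# Synthetic ramp with an oscillation of about 10 psig peak to peak.
currents = [4.0 + 0.16 * k for k in range(101)]            # 4..20 mA
pressures = [12.5 * (i - 4.0) + 5.0 * math.sin(2 * math.pi * k / 8)
             for k, i in enumerate(currents)]

quality = fit_quality(currents, pressures)
print(f"R^2 = {quality['r_squared']:.4f}, "
      f"residual span = {quality['residual_span']:.1f} psig")
```

Reporting the residual span (or simply plotting the residuals) next to R² makes the oscillation hard to miss, which is the point of looking at the data instead of a single summary number.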

If you investigate a deviation that just doesn’t look quite right and then find that the sensor is malfunctioning, you may later ask: Why didn’t the calibration R²/sensitivity/offset/raw data fail some sort of limit comparison? You may determine that the limit was set too liberally, allowing bad sensor calibrations to pass. Conversely, if you are seemingly flagging sensors for review left and right, your limits may be too tight. If you can verify that a flagged sensor’s raw calibration curves and historically calculated sensitivities and offsets appear normal, and that there is nothing abnormal about results from parts tested with that sensor, that indicates the limits for the parameter the sensor was flagged on are too tight.

Natural randomness is expected in calibration data and in calculated sensitivities and offsets, but it is all expected to stay within a relatively small range of variation, meaning that a considerable variation in any of these predicted behaviors is indicative of a potentially faulty sensor. Conversely, when a sensor calibration is within all limits set for the discussed metrics, there is a very low probability that either the calibration was performed incorrectly or the sensor is malfunctioning (your escape probability is very low). Visualizing historical calibration data, especially in the infancy of this calibration process, is highly valuable in determining optimum statistical-process-control (SPC) limits in development and serves as an aid for troubleshooting for the life of the test systems.
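As one possible starting point for SPC limits, the Python sketch below derives control limits for a single sensor from its own calibration history using the conventional mean plus or minus three standard deviations. That rule is an assumption here, not something this procedure prescribes, and the function names and history values are illustrative only. The multiplier k is the knob to adjust if sensors are being flagged left and right or bad calibrations are slipping through.

```python
from statistics import mean, stdev


def control_limits(history, k=3.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def out_of_control(value, history, k=3.0):
    """True if a new value falls outside the limits built from history."""
    lower, upper = control_limits(history, k)
    return not (lower <= value <= upper)


# Illustrative sensitivity history for one transducer, in psig/mA.
history = [6.30, 6.31, 6.29, 6.30, 6.31, 6.29, 6.30]
lower, upper = control_limits(history)
print(f"limits: {lower:.4f} to {upper:.4f} psig/mA")
print(out_of_control(6.31, history), out_of_control(6.40, history))
```

With only a handful of historical calibrations the standard deviation is a rough estimate, so limits computed this way are best treated as provisional and revisited as more history accumulates.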


These aren’t the only requirements of a good calibration, but in my line of work I’ve seen calibration processes done many different ways at many different places, and this is my recollection of what I’ve seen work out best, as well as areas to look into when it comes to designing and validating calibration procedures for your DUCs and reference units.

Building automated or semi-automated calibration and SPC capabilities into software is well worth the repeatability and reproducibility it buys you, and it is by far the best way to enforce the traceability required in a medical device environment. Development costs can be justified by comparing them to the cost of extra labor spent troubleshooting test systems or, depending on the application, escapes and subsequent product recalls. Both the calibration and SPC software capabilities can be built in such a way that doesn’t require re-validation of existing code.


Homepage image courtesy of renjith krishnan at
