Michael Wagner, CEO and co-founder of Edge Case Research, will be giving a presentation on the challenges in testing and validation of collaborative robotic technologies, as part of the Collaborative Robots: Safety Implications session on April 4 from 3 pm to 5 pm at Automate 2017 in Chicago. Wagner has almost two decades of experience developing advanced robotic systems for industry, the Department of Defense and NASA. He was the project manager of the Automated Stress Testing for Autonomy Architectures (ASTAA) project, and he served as the system-safety lead on the Autonomous Platform Demonstrator (APD) vehicle, both at Carnegie Mellon University. In 2000, Wagner was awarded the Antarctica Service Medal, issued by the U.S. Department of Defense, for his field work demonstrating a robot able to perform autonomous geology. Contact him at [email protected].
The term “collaborative robotics” describes the flexible, yet safe, arrangement of robots and humans together in a workspace to perform various tasks. Collaborative robotics contrasts with traditional work cells, which strictly isolate robots from people. Such strict isolation makes some tasks slow or impossible to perform. ISO’s TS 15066 technical specification, along with the ISO 10218 standard, defines various acceptable methods for building a safe collaborative robotic application.
Implementing the method that TS 15066 refers to as “speed and separation monitoring” creates some potential challenges. In such a setup, robots and humans can move concurrently in the working area, as long as the robot maintains a protective separation distance from all humans. One can implement speed and separation monitoring conservatively (for example, with zone monitoring) and still enable substantial productivity gains. However, the natural conclusion of research into speed and separation monitoring technology is an extremely sophisticated, yet safe, system that accurately senses the location, speed and posture of all personnel in a workspace and then precisely controls robot motion to maintain the smallest possible protective separation distance. Such technology will maximize productivity while opening up new kinds of tasks that are impossible with discretized safety zones, or with force-limited “inherently safe” robots. Preliminary progress has been shown by several efforts, including the Hybrid Safety System at Carnegie Mellon University.
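As a rough illustration of the check at the heart of speed and separation monitoring, the sketch below adds up the contributions to a protective separation distance under a simplified constant-speed assumption and compares the result with a measured separation. The structure loosely follows the kind of sum described in TS 15066, but the class, function and parameter names are hypothetical, and the numbers are placeholders rather than values from the specification or from any real work cell.

```python
from dataclasses import dataclass


@dataclass
class SsmParams:
    """Illustrative inputs; every value used with this class is an assumption."""
    human_speed: float            # m/s, assumed worst-case speed of a person toward the robot
    robot_speed: float            # m/s, robot speed toward the person
    reaction_time: float          # s, time to detect the person and command a stop
    stopping_time: float          # s, time for the robot to come to rest
    stopping_distance: float      # m, distance the robot covers while stopping
    intrusion_distance: float     # m, reach of a body part into the field before detection
    human_pos_uncertainty: float  # m, error in the sensed position of the person
    robot_pos_uncertainty: float  # m, error in the known position of the robot


def protective_separation_distance(p: SsmParams) -> float:
    """Sum the distance contributions, assuming constant speeds."""
    # The person keeps approaching while the robot reacts and while it stops.
    human_travel = p.human_speed * (p.reaction_time + p.stopping_time)
    # The robot keeps moving at full speed until the stop command takes effect.
    robot_travel_reacting = p.robot_speed * p.reaction_time
    return (human_travel + robot_travel_reacting + p.stopping_distance
            + p.intrusion_distance + p.human_pos_uncertainty
            + p.robot_pos_uncertainty)


def must_stop(measured_separation: float, p: SsmParams) -> bool:
    """True when the measured separation is below the protective distance."""
    return measured_separation < protective_separation_distance(p)


if __name__ == "__main__":
    params = SsmParams(1.6, 1.0, 0.1, 0.3, 0.2, 0.1, 0.05, 0.02)
    print(protective_separation_distance(params))
    print(must_stop(1.0, params), must_stop(1.5, params))
```

A zone-based implementation effectively fixes this distance at a conservative constant. The more sophisticated systems described above would recompute it continuously from sensed speeds and positions.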
Capabilities required by such systems include perception and modeling algorithms, along with high-dimensional motion planning. Achieving these capabilities has been the focus of a great deal of robotics research over the past few decades. We argue that the next challenge is to learn how to build these capabilities safely.
Motion planning is the process of designing a sequence of discrete robot movements that achieve a desired objective, such as transferring a work piece while safely moving around personnel. Planning motion is essentially a search task. Planning the motions of a dexterous robot through complex environments is an extremely high-dimensional search task. Because of this, motion-planning algorithms may be unable to guarantee finding the single optimal solution; instead, they use non-deterministic Monte Carlo sampling to find plans that are good enough.
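The idea of searching by random sampling can be shown with a deliberately tiny example. The sketch below plans a path for a point in a two-dimensional unit square around a single circular keep-out zone (standing in for a person), by drawing random via-points until one yields a collision-free path. It is a toy under stated assumptions, not any particular published planner: real planners work in far higher-dimensional spaces, and all function names here are hypothetical.

```python
import math
import random


def segment_clear(a, b, center, radius, steps=50):
    """True if the straight segment a->b stays outside the circular keep-out zone.

    The segment is checked at evenly spaced points, which is an approximation.
    """
    for i in range(steps + 1):
        t = i / steps
        x = a[0] + t * (b[0] - a[0])
        y = a[1] + t * (b[1] - a[1])
        if math.hypot(x - center[0], y - center[1]) < radius:
            return False
    return True


def plan(start, goal, center, radius, rng, max_samples=1000):
    """Return the first feasible path found, or None if sampling gives up.

    The result is merely good enough; nothing here searches for the shortest path.
    """
    if segment_clear(start, goal, center, radius):
        return [start, goal]
    for _ in range(max_samples):
        # Monte Carlo step: propose a random via-point in the unit square.
        via = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
        if (segment_clear(start, via, center, radius)
                and segment_clear(via, goal, center, radius)):
            return [start, via, goal]
    return None


if __name__ == "__main__":
    path = plan((0.1, 0.5), (0.9, 0.5), (0.5, 0.5), 0.15, random.Random())
    print(path)
```

Running the script repeatedly will usually print a different via-point each time, because the random generator is unseeded. Every one of those paths is acceptable, yet none is guaranteed to be the best.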
The non-determinism inherent to motion planning has implications for testing and validation. While techniques such as using a reproducible pseudo-random number stream in unit tests can be helpful, it may be impractical to create completely deterministic behavior in an integrated system, especially if small changes in initial conditions lead to diverging system behaviors. This means that every system-level test could potentially result in a different outcome, despite attempts to exercise nominally identical test cases, and also that specific edge cases can be difficult to investigate.
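At the unit-test level, a reproducible pseudo-random stream can be as simple as passing an explicitly seeded generator into the component under test. The sketch below uses Python's standard random module; sample_waypoints is a hypothetical stand-in for a sampling-based component, not code from any real planner.

```python
import random


def sample_waypoints(rng, n=5):
    """Stand-in for a sampling-based component: draw n random 2-D waypoints."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def test_same_seed_reproduces_run():
    # Two generators built from the same seed yield identical streams,
    # so a failing case can be replayed exactly.
    assert sample_waypoints(random.Random(42)) == sample_waypoints(random.Random(42))


def test_different_seeds_diverge():
    # Different seeds exercise different samples, which broadens coverage.
    assert sample_waypoints(random.Random(1)) != sample_waypoints(random.Random(2))


if __name__ == "__main__":
    test_same_seed_reproduces_run()
    test_different_seeds_diverge()
    print("ok")
```

Passing the generator in as an argument, rather than relying on a global one, is what makes replay possible here. In an integrated system, timing, threading and sensor noise can still break this reproducibility even when every seed is fixed.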
Probabilistic system behaviors present a similar challenge to validation, because passing a test once does not mean that the test will be passed every time. In fact, with a probabilistic behavior it might be expected that at least some types of tests will fail some fraction of the time. Therefore, testing might be oriented not toward determining whether behaviors are correct, but rather toward validating that the statistical characteristics of the behavior are accurately specified: for example, that the false-negative detection rate is no greater than the rate assumed in an accompanying safety argument. This is likely to take a great many more tests than simple functional validation, especially if the behavior in question is safety-critical and expected to have an extremely low failure rate.
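A back-of-the-envelope calculation shows why the number of tests grows so quickly. Assuming independent, identically distributed trials and zero observed failures (both simplifying assumptions), n passes support the claim that the per-trial failure rate is at most p with confidence C once (1 - p)^n <= 1 - C, that is, once n >= ln(1 - C) / ln(1 - p). The function name below is hypothetical.

```python
import math


def failure_free_trials_needed(max_failure_rate, confidence):
    """Smallest n with (1 - max_failure_rate) ** n <= 1 - confidence.

    Assumes independent trials, each with the same failure probability,
    and that every one of the n trials passes.
    """
    if not (0.0 < max_failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("rate and confidence must lie strictly between 0 and 1")
    # log1p(-p) computes ln(1 - p) accurately even for very small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_failure_rate))


if __name__ == "__main__":
    for rate in (1e-2, 1e-4, 1e-6):
        print(rate, failure_free_trials_needed(rate, 0.95))
```

The required number of trials scales roughly as 3/p at 95 percent confidence, so each tenfold reduction in the claimed failure rate costs about ten times as many failure-free tests.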
Perception and modeling algorithms are critical for sensing people and robots within a collaborative space. Many perception algorithms perform classification, that is, determining whether objects of particular classes are present in a scene. One difficulty, from a safety perspective, is that we do not know a priori how to fully define these classes. What, exactly, tells us if a person is in an image? Lacking these definitions, today’s most powerful perception algorithms use machine learning to figure out how to discriminate classes by using huge numbers of example images. A big advance over the past five or so years has been the rise of what is called “deep learning.” What distinguishes deep learning from what might be called shallow learning is the concept of features. A feature can be thought of as one of the criteria used to discern objects of various classes. This abstract idea is important for safety analysts to grasp, since the use of defined features gives a traditional safety-critical development effort at least something approaching traditional design artifacts. Deep learning algorithms now make up the most effective computer-vision techniques published in the academic literature.
To illustrate the concept of deep learning, consider a visual system for recognizing people in images. Such systems are in use in automotive driver-warning systems today. Tomorrow, more powerful versions of these systems will be needed for collaborative robotics to squeeze out performance on difficult tasks that involve, for instance, accurately locating the positions of hands and fingers near a work piece. In a traditional approach, we might imagine using a statistical version of something like a stick-figure model of a human body. Training examples would provide us with statistical distributions of features such as body shape, arrangement of limbs on a body and postures. By comparison, deep learning needs no such definitions of underlying features. With massive numbers of training images, the deep learner on its own defines a useful feature space and then learns how to differentiate classes within this space. Missing from this picture is how engineers can place or verify requirements on such a system. One can imagine describing the desired behavior of the overall system (for example, prevent risks involving hands being pinched by the machinery). Unlike in the traditional, safety-critical “V” process model, however, we cannot decompose these objectives into lower-level requirements inside the murky convolutional neural network (CNN).
Architectural tactics can be used to ameliorate this problem, especially in the short term. Ultimately, however, the economic forces driving adoption of sophisticated collaborative work cells will demand strong arguments about the correctness and safety of deep-learning systems themselves.
We believe that several changes in how testing and validation are performed today are required to achieve this. First, we need standards by which we carefully evaluate the completeness and provenance of data used for training machine-learning systems. These data represent one form of requirements on such a system, so validating them is essential. Furthermore, we would like so-called “adversarial” techniques that aggressively try to punch holes in systems, even those previously deemed to be built correctly. Automated software robustness testing is one such technique, which we have used for 20 years in a variety of applications. It has proven effective on collaborative-robotics technologies such as motion planning and perception. Finally, new analysis techniques are called for that can investigate the degree to which deep learners have fully grasped key concepts on which safety depends.
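To give a flavor of what automated robustness testing can look like, the sketch below throws a fixed set of exceptional numeric inputs at a component and records any case where it neither rejects the input cleanly nor returns a sane value. This is a generic illustration of the idea, not the tooling referred to above. The component speed_limit, the harness robustness_failures and the chosen input values are all hypothetical.

```python
import math

# Inputs that well-behaved callers rarely send but faulty sensors or upstream bugs might.
EXCEPTIONAL_VALUES = [0.0, -0.0, -1.0, 1e-308, 1e308, math.inf, -math.inf, math.nan]


def speed_limit(separation_m, max_speed=1.5, stop_distance=0.5):
    """Hypothetical component under test: allowed robot speed for a measured separation."""
    if not math.isfinite(separation_m) or separation_m < 0.0:
        raise ValueError("invalid separation measurement")
    if separation_m <= stop_distance:
        return 0.0
    return min(max_speed, separation_m - stop_distance)


def robustness_failures(func, values):
    """Return (input, symptom) pairs where func crashed or returned a non-sane value."""
    failures = []
    for v in values:
        try:
            out = func(v)
        except ValueError:
            continue  # explicitly rejecting bad input counts as robust behavior
        except Exception as exc:
            failures.append((v, repr(exc)))  # any other exception is an unhandled crash
            continue
        if not (isinstance(out, float) and math.isfinite(out) and out >= 0.0):
            failures.append((v, out))
    return failures


if __name__ == "__main__":
    print(robustness_failures(speed_limit, EXCEPTIONAL_VALUES))
    # A naive alternative with no input checking, for contrast.
    print(robustness_failures(lambda s: 1.0 / s, EXCEPTIONAL_VALUES))
```

The harness needs no knowledge of what the correct speed is. It checks only that the component never crashes or emits a nonsensical value, which is why this style of test can be automated and applied even to components previously deemed correct.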