In Vivo Testing – Programming Systems Laboratory

Software products released into the field typically contain residual defects that either were not detected or could not have been detected during pre-deployment testing. For many large, complex software systems, it is infeasible in terms of time and cost to reliably test all configuration options before release using unit test virtualization, test suite minimization, or any other known approach. For example, Microsoft Internet Explorer has over 19 trillion possible combinations of configuration settings. Even given infinite time and resources to test an application and all its configurations, other software on which a software product depends or with which it interacts (e.g., sensor networks, libraries, virtual machines, etc.) are often updated after the product’s release; it is impossible to test with these dependencies prior to the application’s release, because they did not exist yet. Further, as multi-processor and multi-core systems become more prevalent, multi-threaded applications that had only been tested on single- or dual-processor/core machines are more likely to reveal concurrency bugs.

We are investigating a testing methodology that we call “in vivo” testing, in which tests are continuously executed in the deployment environment. This requires a new type of test case, called in vivo tests, which are designed to run from within the executing application in the states achieved during normal end-user operation rather than in a re-initialized or artificial pre-test state. These tests focus on aspects of the program that should hold true regardless of what state the system is in, but differ from conventional assertion checking, since assertions are prohibited from introducing side-effects: in vivo tests may indeed and typically do have side-effects on the application’s in-memory state, external files, I/O, etc. but these are all “hidden” from users by cloning the executing application to run the test cases in the same kind of sandbox often aimed to address security concerns. The in vivo approach can be used for detecting concurrency, security or robustness issues, as well as conventional flaws that may not have appeared in a testing lab (the “in vitro” environment). Our most recent research concerns how to reduce the overhead of such deployment-time testing as well as automatic generation of some of the in vivo test cases from traditional pre-existing unit tests.

In Fall 2007, we developed a prototype framework called Invite, which is described in our tech report and was presented as a poster at ISSTA 2008 (a variant of this paper was presented at ICST 2009, and is available here). This implementation uses an AspectJ component to instrument selected classes in a Java application, such that each method call in those classes has some chance (configurable on a per-method basis) of executing the method’s corresponding unit test. When a test is run, Invite forks off a new process in which to run the test, and the results are logged.

We also developed a distributed version of Invite, which seeks to amortize the testing load across a community of applications; a paper was published in the student track of ICST 2008. This version currently uses only one global value for the probability of running a test, instead of one per method, however. That value is set by a central server, depending on the size of the “application community”.

In Spring 2008, we looked at various mechanisms for reducing the performance impact of Invite, e.g. by assigning tests to different cores/processors on multi-core/multi-processor machines, or by limiting the number of concurrent tests that may be run. We also looked at ways of balancing testing load across members of a community so that instances under light load pick up more of the testing. Lastly, we created a modified JDK that allows Invite to create copies of files so that in vivo tests do not alter the “real” file system.

In Fall 2008, we ported the Invite framework to C and evaluated more efficient mechanisms for injecting the instrumentation and executing the tests. We also investigated fault localization techniques, which collect data from failed program executions and attempt to discover what caused the failure.

Recently we have investigated ways to make the technique more efficient by only running tests in application states it hasn’t seen before. This cuts down on the number of redundant states that are tested, thus reducing the performance overhead. This work has potential application to domains like model checking and dynamic analysis and was presented in a workshop paper at AST 2010.

Currently we are looking at ways to apply the In Vivo approach to the domain of security testing. Specifically, we devised an approach called Configuration Fuzzing in which the In Vivo tests make slight changes to the application configuration and then check “security invariants” to see if there are any vulnerabilities that are configuration-related. This work was presented at the 2010 Workshop on Secure Software Engineering.

In 2012-2013, we are investigating techniques to efficiently isolate the state of the tests, so as to avoid the effect of the tests on external systems.

Open research questions include:

Can the overhead be reduced by offloading test processes to other machines? This is especially important when the application is running on a single-core machine.
What sorts of defects are most likely to be detected with such an approach? How can we objectively measure the approach’s effectiveness at detecting defects?
How can the tests be “sandboxed” so that they do not affect external entities like databases? We currently assure that there are no changes to the in-process memory or to the file system, but what about external systems?

This is an older project, where we recently revived the main technique for our more recent work on dynamic code similarity.

Contact Mike Su (mikefhsu@su.columbia.edu) for further information about the recent effort.

Team Members

Faculty
Gail Kaiser

Graduate Students
Fang-Hsiang (“Mike”) Su

Former Graduate Students
Chris Murphy
Jonathan Bell
Matt Chu
Waseem Ilahi
Moses Vaughan

Former Undergraduate Students
Ian Vo

Links

Publications
Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan. Identifying Functionally Similar Code in Complex Codebases. 24th IEEE International Conference on Program Comprehension (ICPC), May 2016, pp. 1-10.

Christian Murphy, Moses Vaughan, Waseem Ilahi and Gail Kaiser. Automatic Detection of Previously-Unseen Application States for Deployment Environment Testing and Analysis. 5th International Workshop on the Automation of Software Test, May 2010, pp. 16-23.
Christian Murphy, Gail Kaiser, Ian Vo and Matt Chu. Quality Assurance of Software Applications Using the In Vivo Testing Approach. 2nd IEEE International Conference on Software Testing, Verification and Validation (ICST), April 2009, pp. 111-120.
Matt Chu, Christian Murphy and Gail Kaiser. Distributed In Vivo Testing of Software Applications. 1st IEEE International Conference on Software Testing, Verification, and Validation, April 2008, pp. 509-512.

Software
Invite