Finding latent code errors via machine learning over program executions

Yuriy Brun
Laboratory for Molecular Science
University of Southern California
Los Angeles, CA 90089 USA
[email protected]

Michael D. Ernst
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
Cambridge, MA 02139 USA
[email protected]

Abstract

This paper proposes a technique for identifying program properties that indicate errors. The technique generates machine learning models of program properties known to result from errors, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. Given a set of properties produced by the program analysis, the technique selects a subset of properties that are most likely to reveal an error.

An implementation, the Fault Invariant Classifier, demonstrates the efficacy of the technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. In our experimental evaluation, the technique increases the relevance (the concentration of fault-revealing properties) by a factor of 50 on average for the C programs, and 4.8 for the Java programs. Preliminary experience suggests that most of the fault-revealing properties do lead a programmer to an error.

1 Introduction

Programmers typically use test suites to detect faults in program executions, and thereby to discover errors in program source code. Once a program passes all the tests in its test suite, testing no longer leads programmers to errors. However, the program is still likely to contain latent errors, and it may be difficult or expensive to generate new test cases that reveal additional faults. Even if new tests can be generated, it may be expensive to compute and verify an oracle that represents the desired behavior of the program.

The technique presented in this paper can lead programmers to latent code errors. The technique does not require a test suite for the target program that separates succeeding from failing runs, so it is particularly applicable to programs whose executions are expensive to verify. The expense may result from difficulty in generating tests, from difficulty in verifying intermediate results, or from difficulty in verifying visible behavior (as is often the case for interactive or graphical user interface programs).

The new technique takes as input a set of program properties for a given program, and outputs a subset of those properties that are more likely than average to indicate errors in the program. The program properties may be generated by an arbitrary program analysis; the experiments reported in this paper use a dynamic analysis, but the technique is equally applicable to static analyses....
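
The relevance measure mentioned in the abstract (the concentration of fault-revealing properties) and the idea of selecting a higher-relevance subset can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ProgramProperty` record, the `score` field (standing in for the output of some trained classifier), the `select_top` helper, and the sample data are hypothetical and are not taken from the Fault Invariant Classifier or from the paper's experiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProgramProperty:
    """One property reported by a program analysis (hypothetical record)."""
    text: str
    score: float           # assumed classifier confidence that the property is fault-revealing
    fault_revealing: bool  # ground-truth label, available only during evaluation


def relevance(props: List[ProgramProperty]) -> float:
    """Fraction of the given properties that are fault-revealing."""
    if not props:
        return 0.0
    return sum(p.fault_revealing for p in props) / len(props)


def select_top(props: List[ProgramProperty], k: int) -> List[ProgramProperty]:
    """Rank by descending score and keep the k highest-ranked properties."""
    return sorted(props, key=lambda p: p.score, reverse=True)[:k]


# Made-up data for illustration only.
all_props = [
    ProgramProperty("x >= 0", 0.10, False),
    ProgramProperty("i < len(a)", 0.90, True),
    ProgramProperty("p != null", 0.20, False),
    ProgramProperty("sum == a + b", 0.30, False),
]
selected = select_top(all_props, 1)

# Improvement factor: relevance of the selected subset over relevance of the full set.
improvement = relevance(selected) / relevance(all_props)
print(relevance(all_props), relevance(selected), improvement)
```

In this toy setting, an "increase in relevance by a factor of N" corresponds to the ratio computed in `improvement`. The ground-truth labels are needed only to evaluate a selection, not to produce it.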