Seeking a new PhD student, as well as MS thesis students and graduate/undergraduate project students.
Most software is developed by teams of engineers, not by a single individual working alone. Further, a software product is rarely developed once and for all, with no bug fixes or features added after the original deployment. Original team members leave the company or the project, and new members join. Thus most engineers need to understand code originally written by others; indeed, studies have found that engineers spend more of their time trying to understand unfamiliar code than creating or modifying code. The engineer might not know what a piece of code is supposed to do, how it works, how it is supposed to work, or how to fix it to make it work. Documentation might be poor, outdated, or non-existent, and knowledgeable co-workers are not always available.
This project investigates automated tools and techniques to help engineers more quickly and more completely understand the code they are tasked to work with, to improve their productivity and the quality of their work. With less time spent understanding existing code, engineers have more time to spend on modifying that code, fixing bugs and adding features desired by their users, and creating new software benefiting the public.
More specifically, this project investigates dynamic analysis approaches to identifying behavioral similarities among code elements in the same or different programs, particularly code that behaves similarly during execution but does not look similar, and so would be difficult or impossible to detect using static analysis (code clone detection). While code clone technology is fairly mature, tools for detecting behavioral similarities are relatively primitive. The primary objective is to improve and shape behavioral similarity analysis for practical use cases, concentrating on finding similar code in the same or other codebases that might help developers understand, debug, and add features to the unfamiliar code they are tasked to work with.
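As a rough illustration of the distinction between looking similar and behaving similarly, the sketch below compares functions only by the outputs they produce on the same inputs. It is a minimal sketch, not the project's method: the function names, the input range, and the input/output matching measure are all assumptions made for this example, and real dynamic analyses would typically observe much richer execution behavior than return values.

```python
from typing import Callable, Iterable


def sum_loop(n: int) -> int:
    """Sum 1..n with an explicit loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    """Sum 1..n with the closed form; textually unlike sum_loop."""
    return n * (n + 1) // 2


def sum_squares(n: int) -> int:
    """Sum of squares 1..n; a loop like sum_loop, but different behavior."""
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


def io_similarity(f: Callable[[int], int],
                  g: Callable[[int], int],
                  inputs: Iterable[int]) -> float:
    """Fraction of inputs on which f and g return equal values.

    Hypothetical, deliberately crude measure: it only compares outputs.
    """
    cases = list(inputs)
    if not cases:
        return 0.0
    matches = sum(1 for x in cases if f(x) == g(x))
    return matches / len(cases)


# Assumed input set for the illustration: small non-negative integers.
test_inputs = range(0, 50)

same_behavior = io_similarity(sum_loop, sum_formula, test_inputs)
different_behavior = io_similarity(sum_loop, sum_squares, test_inputs)
print(same_behavior, different_behavior)
```

Here `sum_loop` and `sum_formula` share almost no text yet agree on every input, while `sum_loop` and `sum_squares` look alike but agree only on the trivial inputs 0 and 1. A purely textual clone detector would tend to pair the wrong two functions.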
The project seeks to advance knowledge about what it means for code to be behaviorally similar, how dynamic analyses can identify behavioral code similarities, how to drive the executions necessary for these analyses, and how to leverage code whose behavior is reported as highly similar to the code at hand to achieve common software engineering tasks that may be ill-suited to representational code similarities (code clones). The research investigates the utility and scalability of dynamic analyses seeking behavioral similarities in corresponding representations of code executions; guiding input case generation techniques to produce test executions useful for comparing and contrasting code behaviors for particular use cases; and filtering and weighting schemes for adapting the "preponderance of the evidence" metaphor to choosing the most convincing similarities for the software engineering task.
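One possible reading of the filtering and weighting idea is sketched below. It is an assumption-laden illustration rather than the project's actual scheme: the `Evidence` fields, the evidence kinds, the weights, the candidate names, and the 0.5 cutoff (borrowed from the "more likely than not" sense of preponderance of the evidence) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Evidence:
    kind: str      # hypothetical label, e.g. "io_match" or "call_trace"
    score: float   # similarity in [0, 1] reported by one analysis
    weight: float  # assumed relative importance for the task at hand


def weighted_score(evidence: List[Evidence]) -> float:
    """Weighted average of the evidence scores (0.0 if there is no weight)."""
    total_weight = sum(e.weight for e in evidence)
    if total_weight == 0:
        return 0.0
    return sum(e.score * e.weight for e in evidence) / total_weight


def rank_candidates(candidates: Dict[str, List[Evidence]],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep candidates whose weighted score exceeds the threshold,
    ordered from most to least convincing."""
    scored = [(name, weighted_score(ev)) for name, ev in candidates.items()]
    kept = [(name, s) for name, s in scored if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates that might resemble the code at hand.
candidates = {
    "parse_config_v2": [Evidence("io_match", 0.9, 3.0),
                        Evidence("call_trace", 0.6, 1.0)],
    "legacy_loader": [Evidence("io_match", 0.2, 3.0),
                      Evidence("call_trace", 0.8, 1.0)],
}

ranked = rank_candidates(candidates)
print(ranked)
```

In this sketch the weights would be chosen per task, so the same raw evidence could rank candidates differently for debugging than for, say, finding reusable code.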
To learn more, please contact Prof. Kaiser, kaiser@cs.columbia.edu.