Home » Projects

Category Archives: Projects

Improving rr record/replay support for Java

rr (see http://rr-project.org/) is a widely-used open-source C/C++ debugging tool for Linux that enhances gdb with record/replay capabilities.  PSL is seeking several project students (to work in a team) for spring 2018 to adapt rr to record/replay Java applications on top of the JVM.  rr already works with Java/JVM recording/replaying all system calls.  The goal is to modify rr to record/replay only those system calls specific to the Java application recorded/replayed, not the system calls due to the JVM’s internal mechanisms, to improve performance of Java application recording and replaying. (Eventually we plan to modify rr further, to support mutable replay, but that will be a later semester.) Prospective project students should have strong Java/JVM, C/C++ and Linux skills, and preferably have completed 4115 and 4118.

Contact Prof. Kaiser at kaiser@cs.columbia.edu.

Record/Replay Bug Reproduction for Java

There will inevitably continue to be bugs that are not detected by any testing approach, but eventually impact users who then file bug reports. Reproducing field failures in the development environment can be difficult, however, especially in the case of software that behaves non-deterministically, relies on remote resources, or has complex reproduction steps (the users may not even know what led up to triggering the flaw, particularly in the case of software interacting with external devices, databases, etc. in addition to human users). So a record/replay approach is used to capture the state of the system just before a bug is encountered, so the steps leading up to this state can be replayed later in the lab. The naive approach of constant logging in anticipation of a defect tends to produce unacceptably high overheads (reaching 2,000+ %) in the deployed application. Novel solutions that lower this overhead typically limit the depth of information recorded (e.g., to use only a stack trace, rather than a complete state history) or the breadth of information recorded (e.g., to only log information during execution of a particular subsystem that a developer identifies as potentially buggy). But limiting the depth of information gathered may fail to reproduce an error if the defect does not present itself immediately and limiting logging to a specific subcomponent of an application makes it only possible to reproduce the bug if it occurred within that subcomponent.

Our new technique, called “Chronicler”, instead captures program execution in a manner that allows for deterministic replay in the lab with very low overhead. The key insight is to log sources of non-determinism only at the library level – allowing for a lightweight recording process while still supporting a complete replay for debugging purposes (programs with no sources of non-determinism, e.g., no user interactions, are trivial to replay – just provide the same inputs). When a failure occurs, Chronicler automatically generates a test case that consists of the inputs (e.g., file or network I/O, user inputs, random numbers, etc.) that caused the system to fail. This general approach can be applied to any “managed” language that runs in a language virtual machine (for instance, JVM or Microsoft’s .NET CLR), requiring no modifications to the interpreter or environment, and thus addresses a different class of programs than related work for non-managed languages like C and C++.

We expect to extend and use this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)



Jonathan Bell, Nikhil Sarda and Gail Kaiser. Chronicler: Lightweight Recording to Reproduce Field Failures. 35th International Conference on Software Engineering, May 2013, pp. 362-371. See teaser video at https://www.youtube.com/watch?v=4IYGfdDnAJg.


Download ChroniclerJ.

Gameful Computational Thinking

Inspired by CS for All?  The Programming Systems Lab, led by Professor Gail Kaiser, is building a collaborative game-based learning and assessment system that infuses computational thinking in grade 6-8 curricula.  We’re recruiting students interested in collaborating as a Scrum team to operationalize an MVP which includes several features:
  • A gameful direct instruction system that embeds Parson’s Programming Puzzles in Scratch
  • A gameful constructionism mode that integrates scoring systems for play and automated formative assessment through Blockly
  • A gameful intelligent tutor that provides hints on-demand and just-in-time via clustering, sequential pattern mining, and behavior detection algorithms
  • A gameful affinity space that enables curricula assimilation with quest management and learning metrics using Node.js and Angular
Our projects with students from previous semesters have resulted in the development of a rich architecture and we anticipate a productive period ahead with opportunities for co-authorship of publications that document results from field studies with grade 6-8 students.  Participants will register for 3 credits of COMS E6901, section 028.  CVN students are welcome.
To learn more, please contact Jeff Bender, jeffrey.bender@columbia.edu.
SAGE | Social Addictive Gameful Engineering

Mutable Replay

This project is heavily influenced by previous work from Prof. Jason Nieh in this area, in his case for
x86 for Linux whereas we work with JVM-based managed languages. We are also collaborating with
Prof. Jonathan Bell of George Mason University.  Record/replay records all non-deterministic
inputs received during deployed execution (in a customer environment) so the execution can later be replayed
deterministically in the development lab by reusing those inputs, e.g., to reproduce a reported bug or examine
anomalous behavior for intrusions and other security exploits. Non-deterministic “inputs” here include responses from
system and library calls, e.g., random number generators, network traffic, file system and database accesses, etc., not
just end-user inputs. Conventional record/replay only works when the code replayed in the lab is exactly the same as
the code that had been recorded in the field, or adds only very limited debugging instrumentation that does not effect
the non-deterministic inputs or their dependencies. But it is desirable to replay candidate bug fixes to check that the
code change (mutation) indeed fixes the bug, even though a correct bug fix will almost always require some
new/different inputs and/or produce some new/different outputs, and replay may need to skip over or replace some
previously recorded inputs that are invalidated by changed dependencies. We are investigating static and dynamic
program analysis techniques aimed to determine how to transform the recorded log to enable replay to proceed past
changed portions of the code. This problem is undecidable in the general case (imagine inserting an infinite loop), but
preliminary results show is feasible in many practical cases when the developer’s version repository is available for
analysis (which was not leveraged for Prof. Nieh’s earlier system) and the managed language platform hides
operating system level complexities such as memory mapping and interrupts that may reduce performance.

Contact Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Graduate Students
Anthony Saeiva Narin

Former Graduate Students
Jonathan Bell
Kenny Harvey   


Dynamic Information Flow Analysis

We are investigating an approach to runtime information flow analysis for managed languages
that tracks metadata about data values through the execution of a program. We first considered
metadata that propagates labels representing the originating source of each data value, e.g.,
sensitive data from the address book or GPS of a mobile device that should only be accessed on a
need-to-know basis, or potentially suspect data input by end-users or external systems that
should be sanitized before including in database queries, collectively termed “taint tracking”.
We developed and made available open-source the first general purpose implementation of taint
tracking that operates with minimal performance overhead on commodity Java Virtual Machine
implementations (e.g., from Oracle and OpenJDK), by storing the derived metadata “next to” the
corresponding data values in memory, achieved via bytecode rewriting that does not require
access to source code or any changes to the underlying platform. Previous approaches required
changes to the source code, the language interpreter, the language runtime, the operating system
and/or the hardware, or added unacceptable overhead by storing the metadata separately in a
hashmap. Our system has also been applied to Android, where it required changes in 13 lines of
code, contrasted to the state of the art TaintDroid which added 32,000 lines of code. We are
currently investigating tracking the path conditions constructed during dynamic symbolic
execution of programs, which record the constraints on data values that have reached a given
point in execution (e.g., taking the true or false branch of a series of conditionals). We plan to
use the more sophisticated but slower symbolic execution version as part of several prospective

We expect to extend and use this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Jonathan Bell



Jonathan Bell and Gail Kaiser. Phosphor: Illuminating Dynamic Data Flow in the JVM. Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), October 2014,pp. 83-101. Artifact accepted as meeting reviewer expectations.

Jonathan Bell and Gail Kaiser. Dynamic Taint Tracking for Java with PhosphorInternational Symposium on Software Testing and Analysis (ISSTA), July 2015, pp. 409-413.


Download Phosphor.

Download Knarr.


genSpace Web Services

The core functionalities of genSpace web services can be categorized into 7 facades in the following:

  1. UserFacade: Retrieve corresponding information about the current user in genSpace. The selected web methods are listed as following:
    • getProfile: Retrieve the profile of current user
    • updateUser: Update the information, such as password, for current user
    • getMyNetworks: Retrieve all networks that the current user is in
  2. UsageInformation: Retrieve information about tools and workflows of geWorkbench. The selected web methods are listed as following:
    • getTool: Retrieve the corresponding tool by too ID
    • getWorkflow: Retrieve the corresponding workflow by workflow ID
    • getMyNotes: Retrieve the note from user for a specific analysis event. An analysis event includes an invocation of geWorkbench tool
    • saveNote: Save the public note from user for a specific analysis event. This note can be seen by user and user’s friends
    • savePriv_Note: Save the private note from user for a specific analysis event. This note can be seen by user only
  3. ToolUsageInformation: Retrieve statistic data about tools and workflows of geWorkbench. The selected web methods are listed as following:
    • getToolsByPopularity: Retrieve the data of using times for each tool in geWorkbench
    • getWorkflowsByPopularity: Retrieve the data of using times for each work flow in geWorkbench
    • getToolSuggestion: Retrieve the suggestion for next tool based on current tool
    • sendUsageEvent: Store the logged analysis event from geWorkbench in genSpace
  4. FriendFacade: Retrieve information about friends of current user in genSpace. The selected web methods are listed as following:
    • getFriendRequest: Inform user that a new friend in genSpace who sends a friend-invitation
    • addFriend: Accept friend-invitation from a new friend
    • getFriends: Retrieve all users who are friends of user in genSpace
    • getMyFriendsEvents: Retrieve the analysis events from friends of user in genSpace
  5. PublicFacade: Responsible to register new users. The selected web methods are listed as following:
    • register: Register new users for genSpace. This method is now synchronized with the registration function in geWorkbench. When a new user registers in geWorkbench, she/he will also be registered in genSpace immediately
    • userExists: A guarding method to check if the registered username has been used by a former user
  6. NetworkFacade: Retrieve information about corresponding networks of user in genSpace. the selected web methods are listed as following:
    • createNetwork: Allow user to create a new network in genSpace
    • joinNetwork: Allow user to request joining an existing network in genSpace
    • leaveNetwork: Allow user to leave a network
  7. WorkflowRepository: A managing facade for users to store workflows created by them or their friends in geWorkbench
    • addWorkflow: Add a new workflow in the workflow repository
    • addComment: Add comment to a specific workflow in the workflow repository
    • sendWorkflow: Send an existing workflow in the workflow repository to friends

Sound Build Acceleration

Sound Build Acceleration: Our empirical studies found that the bulk of the clock time during the builds of the ~2000 largest and most popular Java open source software applications is spent running test cases, so we seek to speed up large builds by reducing testing time. This is an important problem because real-world industry builds often take many hours, so developers cannot be informed of any errors introduced by their changes while still in context – as needed for continuous integration (best practice). The consequent lack of attention to failed tests is one of the major reasons that software is deployed with so many security vulnerabilities and other severe bugs. Prior work reduces testing time by running only subsets of the test suite, chosen using various selection criteria. But this model inherently reduces failure detection, and may be unsound because remaining test cases may have dependencies on removed test cases (causing false positives and false negatives). We thought out of the box to substantially reduce measured testing time without removing any test cases at all, thus no reduction in failure detection. For example, we developed tools that use static and dynamic analyses to determine exactly which portion of the state written by previous test cases will be read by the next test case, and instrument the bytecode to just-in-time reinitialize only that dependent portion of the state, rather than restarting the JVM between separate test cases, a common industry practice. Some dependencies are unintentional, so our tools also inform developers so they can re-engineer the code to remove those dependencies. Other dependencies are necessary, because series of tests are needed to build up and check each step of complex usage scenarios; for these our tools bundle dependent test cases and distinguish independent sets of test cases to enable sound parallelization of large test suites.

We expect to use components of this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Jonathan Bell



Jonathan Bell, Gail Kaiser, Eric Melski and Mohan Dattatreya. Efficient Dependency Detection for Safe Java Test Acceleration. 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Aug-Sep 2015, pp. 770-781.

Jonathan Bell, Eric Melski, Gail Kaiser and Mohan Dattatreya. Accelerating Maven by Delaying Dependencies.3rd International Workshop on Release Engineering (RelEng), May 2015, p. 28.

Jonathan Bell, Eric Melski, Mohan Dattatreya and Gail Kaiser. Vroom: Faster Build Processes for Java.IEEE Software, 32(2):97-104, Mar/Apr 2015.

Jonathan Bell and Gail Kaiser. Unit Test Virtualization with VMVM. 36th International Conference on Software Engineering (ICSE), June 2014, pp. 550-561. (ACM SIGSOFT Distinguished Paper Award)

Jonathan Bell and Gail Kaiser. Unit Test Virtualization: Optimizing Testing Time. 2nd International Workshop on Release Engineering (RelEng), April 2014.

Jonathan Bell and Gail Kaiser. VMVM: Unit Test Virtualization for Java. ICSE 2014 Formal Demonstrations Track, Companion Proceedings of 36th International Conference on Software Engineering (ICSE), June 2014, pp. 576-579. Video at https://www.youtube.com/watch?v=sRpqF3rJERI.


Download VmVm.

CS/SE Education

About CS/SE Education

We are exploring new techniques and approaches to improve the teaching of computer science and software engineering. Our recent projects and papers are listed below.

Contact: Swapneel Sheth (swapneel@cs.columbia.edu)

Team Members


Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Phd Students

Swapneel Sheth, swapneel [at] cs.columbia.edu

Former PhD students

Jonathan Bell, jbell [at] cs.columbia.edu
Chris Murphy
, cmurphy [at] cs.columbia.edu

See the Software Project Management project listed on our project student advertisements page.



HALO (Highly Addictive, sociaLly Optimized) Software Engineering




Swapneel Sheth, Jonathan Bell, Gail Kaiser. A Competitive-Collaborative Approach for Introducing Software Engineering in a CS2 Class. 26th Conference on Software Engineering Education and Training (CSEE&T), San Francisco CA, pages 41-50, May 2013

Jonathan Bell, Swapneel Sheth, Gail Kaiser. Secret Ninja Testing with HALO Software Engineering. 4th International Workshop on Social Software Engineering Workshop (SSE), Szeged, Hungary, pages 43-47, September 2011

Christian Murphy, Gail Kaiser, Kristin Loveland, Sahar Hasan. Retina: Helping Students and Instructors Based on Observed Programming Activities. 40th ACM SIGCSE Technical Symposium on Computer Science Education, Chattanooga TN, pages 178-182, March 2009

Christian Murphy, Dan Phung, and Gail Kaiser. A Distance Learning Approach to Teaching eXtreme Programming. 13th Annual ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE), Madrid, Spain, pages 199-203, June 2008

C. Murphy, E. Kim, G. Kaiser, A. Cannon. Backstop: A Tool for Debugging Runtime Errors. 39th ACM SIGCSE Technical Symposium on Computer Science Education, Portland OR, pages 173-177, March 2008

Tech Reports

Kunal Swaroop Mishra, Gail Kaiser. Effectiveness of Teaching Metamorphic Testing.Technical Report CUCS-020-12, Dept. of Computer Science, Columbia University, November 2012

Fine-Grained Data Management Abstractions

We participated in developing novel technology that leverages the storage abstractions of
modern operating systems (e.g., the relational databases and object-relational mappings of
Android) to automatically detect fragments strewn across memory, files and databases that is part
of the same logical application object, such as an email and its attachments, without requiring
source code or any cooperation on the part of application developers. This substrate enabled the development of our prototype tools to check that application-level deletions in fact actually delete all the data fragments related to, say, a document or a photo; to hide (and later unhide) sensitive data, e.g., to protect business data at international border crossings; and to detect when an application collects more data than required by its functionality. In our case study, our system worked correctly on 42 out of 50 real-world applications, and lead to publication of “best practices” rules of thumb required for the approach to work on future applications — e.g., fully declare database schemas, use the database to index file storage, use standard storage libraries, which are admittedly obvious to anyone with the software engineering training that some “app” developers sadly lack.

Contact Professor Roxana Geambasu (roxana@cs.columbia.edu) for further information.

Team Members

Roxana Geambasu
Gail Kaiser

Graduate Students
Riley Spahn

Former Graduate Students
Jonathan Bell



Riley Spahn,  Jonathan Bell, Michael Z. Lee, Sravan Bhamidipati, Roxana Geambasu and Gail Kaiser. Pebbles: Fine-Grained Data Management Abstractions for Modern Operating Systems. 11th USENIX Symposium on Operating Systems Design and Implementation, October 2014, pp. 113-129.

Code Similarity

Dynamic Code Similarity: This is a multi-disciplinary project joint with Profs. Simha Sethumadhavan and Tony Jebara. “Code clones” are statically similar code fragments that usually arise via copy/paste or independently writing lookalike code; best practice removes clones (refactoring) or tracks them (e.g., to ensure bugs fixed in one clone are also fixed in others). This part of the project instead studies dynamically similar code for two different similarity models. One model is functional similarity, finding code fragments that exhibit similar input/output behavior during execution. Our other dynamic similarity model is the novel notion of behavioral similarity, which we call “code relatives”. Two or more code fragments are deemed code relatives if their executions are similar. We model this as finding similarities among the dynamic data dependency graphs representing instruction-level execution traces. We used machine learning techniques to devise a (relatively) fast inexact subgraph isomorphism algorithm to cluster these execution-level similarities. Our experiments show that both of our tools find most of the same “similar” code as the best static code clone detectors but also find many others they can’t, because the code looks very different even though functionally and/or behaviorally similar; however, dynamic detection will not necessarily find all static code clones because lookalike code involving polymorphism need not exhibit the same function/behavior. Our behavioral and functional similarity detectors do not always find the same similarities, because two or more code fragments may compute the same function using very different algorithms. Thus these kinds of techniques complement each other. Beyond the conventional applications of static code clone detection, dynamic similarity detection also addresses malware detection, program understanding, re-engineering legacy software to use modern APIs, and informing design of hardware accelerators and compiler optimizations.

Static Code Similarity: We also investigate of static similarity detection to augment our similarity detection toolkit. This work is joint with Prof. Baishakhi Ray of the University of Virginia and Prof. Jonathan Bell of George Mason University. Unlike most other static code clone research, we look for similarities at the instruction level rather than in the source code, so our techniques can work even on obfuscated executables where no source code is available and thus conventional static detectors cannot be applied. This situation arises for both malware and misappropriated intellectual property. We exploit the increasingly popular notion of “big code”, i.e., training from open-source repositories, using features that combine instruction-level call graph analysis and topic modeling (an NLP-based machine learning technique). We believe we can effectively deobfuscate most suspect code by finding similarities within a corpus consisting of known code and its obfuscated counterparts. Our approach handles control flow transformations and introduction of extraneous methods, not just method names.

Contact Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Fang-Hsiang (“Mike”) Su
Jonathan Bell
Kenny Harvey   
Apoorv Patwardhan



Fang-Hsiang Su, Jonathan Bell, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser and Tony Jebara. Code Relatives: Detecting Similarly Behaving Software. 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), November 2016. Artifact accepted as platinum.

Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan. Identifying Functionally Similar Code in Complex Codebases. 24th IEEE International Conference on Program Comprehension (ICPC), May 2016, pp. 1-10. (ACM SIGSOFT Distinguished Paper Award)

Fang-Hsiang Su, Jonathan Bell, and Gail Kaiser. Challenges in Behavioral Code Clone Detection (Position Paper). 10th International Workshop on Software Clones (IWSC), affiliated with IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), March 2016, volume 3, pp. 21-22. (People’s Choice Award for Best Position Paper)


Download DyCLink from github.

Download HitoshiIO from github.

Download Code Similarity Experiments toolkit from github.