Home » Projects

Category Archives: Projects

Record/Replay Bug Reproduction for Java

There will inevitably continue to be bugs that are not detected by any testing approach, but eventually impact users who then file bug reports. Reproducing field failures in the development environment can be difficult, however, especially in the case of software that behaves non-deterministically, relies on remote resources, or has complex reproduction steps (the users may not even know what led up to triggering the flaw, particularly in the case of software interacting with external devices, databases, etc. in addition to human users). So a record/replay approach is used to capture the state of the system just before a bug is encountered, so the steps leading up to this state can be replayed later in the lab. The naive approach of constant logging in anticipation of a defect tends to produce unacceptably high overheads (reaching 2,000+ %) in the deployed application. Novel solutions that lower this overhead typically limit the depth of information recorded (e.g., to use only a stack trace, rather than a complete state history) or the breadth of information recorded (e.g., to only log information during execution of a particular subsystem that a developer identifies as potentially buggy). But limiting the depth of information gathered may fail to reproduce an error if the defect does not present itself immediately and limiting logging to a specific subcomponent of an application makes it only possible to reproduce the bug if it occurred within that subcomponent.

Our new technique, called “Chronicler”, instead captures program execution in a manner that allows for deterministic replay in the lab with very low overhead. The key insight is to log sources of non-determinism only at the library level – allowing for a lightweight recording process while still supporting a complete replay for debugging purposes (programs with no sources of non-determinism, e.g., no user interactions, are trivial to replay – just provide the same inputs). When a failure occurs, Chronicler automatically generates a test case that consists of the inputs (e.g., file or network I/O, user inputs, random numbers, etc.) that caused the system to fail. This general approach can be applied to any “managed” language that runs in a language virtual machine (for instance, JVM or Microsoft’s .NET CLR), requiring no modifications to the interpreter or environment, and thus addresses a different class of programs than related work for non-managed languages like C and C++.

We expect to extend and use this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)



Jonathan Bell, Nikhil Sarda and Gail Kaiser. Chronicler: Lightweight Recording to Reproduce Field Failures. 35th International Conference on Software Engineering, May 2013, pp. 362-371. See teaser video at https://www.youtube.com/watch?v=4IYGfdDnAJg.


Download ChroniclerJ.

Gameful Computational Thinking

Inspired by CS for All?  The Programming Systems Lab, led by Professor Gail Kaiser, is building a game-based learning and assessment system that infuses computational thinking in grade 6-8 curricula.  For the Fall 2017 semester, we’re recruiting students interested in collaborating as a Scrum team to complete an MVP expected to include several features:
  • A gameful direct instruction system that embeds Parson’s Programming Puzzles in Scratch
  • A gameful constructionism mode that integrates scoring systems for play and automated assessment through Blockly
  • A gameful intelligent tutor that provides hints on-demand and just-in-time via clustering and sequential pattern mining algorithms
  • A gameful affinity space that enables curricula assimilation with quest management and learning metrics using Node.js
Our projects with students from previous semesters have resulted in the development of a rich architecture and we anticipate a productive period ahead with opportunities for co-authorship of publications that document results from experiments with grade 6-8 students.  Participants will register for 3 credits of COMS E6901, section 014.  CVN students are welcome.  To learn more, please contact Jeff Bender, jeffrey.bender@columbia.edu.
SAGE | Social Addictive Gameful Engineering

Toward Trustworthy Mutable Replay for Security Patches

Society is increasingly reliant on software, but deployed software contains security vulnerabilities and other bugs that can threaten privacy, property and even human lives. When a security vulnerability or other severe defect is discovered, a software patch is issued to attempt to fix the problem – but patches themselves can be incorrect, inadequate, and break mission-critical functionality. This project investigates the full workflow for the developer to rapidly diagnose the root cause of the bug, for the developer to test that a prospective patch indeed completely removes the bug without introducing new errors, and for user organizations to check the issued patch on their own configurations and production workloads before adopting the patch.

This project explores “mutable replay” technology to help reproduce, diagnose, and fix software bugs. A low-overhead recorder embedded in the application records the execution of software in the user environment in case a failure or exploit occurs, allowing the developer to later replay the recorded log – with exactly the same version of the code – to reproduce the problem. Such deterministic record/replay technology is reasonably well-understood. Mutable replay extends record/replay to enable logs recorded with the buggy version to be replayed after the modest code changes typical of critical patches, to show that patches work correctly – or, perhaps more significantly, do not work correctly and further debugging is needed.

We plan to leverage semantic information readily available to the developer, e.g., from the version repository, to conduct well-understood static and dynamic analyses to inform transformations to the recorded log, to reuse the previously recorded responses from interface calls when they match the semantics of the modified code and “go live” to obtain new inputs when they don’t. For example, the recorded log simply ends when a crash occurred during the original recording but “go live” enables the application to continue running beyond the end of the log if the code changes removed the cause of the crash.  In the case where an exploit was injected during the original recording, the modified code that blocks the exploit would “go live” temporarily during the part of the execution where the exploit occurred, but the application may be able to continue execution thereafter using the recorded log for the data that was not tainted by the exploit.

This research involves many interesting problems in program analysis, software testing and debugging, program understanding, and software similarity analysis. The results of this research will benefit society and individuals by simplifying and hastening both generation and validation of patches, ultimately making software more reliable and secure.

Transparent Mutable Replay for Multicore Debugging and Patch Validation describes a proof-of-concept implementation at the Linux operating system level developed several years ago in Prof. Nieh’s lab, which used a simple minimal edit distance metric to guide trial-and-error mutation of the recorded log during replay with a modified version of the code. This works very well in some cases, but cannot handle many common code changes.  We now seek to develop a new prototype interfacing instead at the Java Virtual Machine level, to leverage the higher level semantics available, and guiding replay mutation using static analyses of the modified source code and dynamic analyses of the modified byte code execution.  We also plan to enhance the Linux implementation with analogous analyses.

This is a large effort with numerous subparts, expected to progress over the next three or four years. We are seeking new students at all levels: PhD, MS and undergraduate.

Prospective new PhD students should have, or be able to quickly acquire, deep understanding of the JVM and/or Linux kernel, record/replay technology, and static and dynamic program analyses such as program slicing and taint tracking.

For new undergraduate and MS project students, we prefer students who would like to participate for two or more consecutive semesters. This project is most suited for students who have completed both 4115 and 4156 (or equivalents), or are taking concurrently. A team of collaborating students would be ideal, but individual projects are also possible.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Dynamic Information Flow Analysis

We are investigating an approach to runtime information flow analysis for managed languages
that tracks metadata about data values through the execution of a program. We first considered
metadata that propagates labels representing the originating source of each data value, e.g.,
sensitive data from the address book or GPS of a mobile device that should only be accessed on a
need-to-know basis, or potentially suspect data input by end-users or external systems that
should be sanitized before including in database queries, collectively termed “taint tracking”.
We developed and made available open-source the first general purpose implementation of taint
tracking that operates with minimal performance overhead on commodity Java Virtual Machine
implementations (e.g., from Oracle and OpenJDK), by storing the derived metadata “next to” the
corresponding data values in memory, achieved via bytecode rewriting that does not require
access to source code or any changes to the underlying platform. Previous approaches required
changes to the source code, the language interpreter, the language runtime, the operating system
and/or the hardware, or added unacceptable overhead by storing the metadata separately in a
hashmap. Our system has also been applied to Android, where it required changes in 13 lines of
code, contrasted to the state of the art TaintDroid which added 32,000 lines of code. We are
currently investigating tracking the path conditions constructed during dynamic symbolic
execution of programs, which record the constraints on data values that have reached a given
point in execution (e.g., taking the true or false branch of a series of conditionals). We plan to
use the more sophisticated but slower symbolic execution version as part of several prospective

We expect to extend and use this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Jonathan Bell



Jonathan Bell and Gail Kaiser. Phosphor: Illuminating Dynamic Data Flow in the JVM. Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), October 2014,pp. 83-101. Artifact accepted as meeting reviewer expectations.

Jonathan Bell and Gail Kaiser. Dynamic Taint Tracking for Java with PhosphorInternational Symposium on Software Testing and Analysis (ISSTA), July 2015, pp. 409-413.


Download Phosphor.

Download Knarr.


genSpace Web Services

The core functionalities of genSpace web services can be categorized into 7 facades in the following:

  1. UserFacade: Retrieve corresponding information about the current user in genSpace. The selected web methods are listed as following:
    • getProfile: Retrieve the profile of current user
    • updateUser: Update the information, such as password, for current user
    • getMyNetworks: Retrieve all networks that the current user is in
  2. UsageInformation: Retrieve information about tools and workflows of geWorkbench. The selected web methods are listed as following:
    • getTool: Retrieve the corresponding tool by too ID
    • getWorkflow: Retrieve the corresponding workflow by workflow ID
    • getMyNotes: Retrieve the note from user for a specific analysis event. An analysis event includes an invocation of geWorkbench tool
    • saveNote: Save the public note from user for a specific analysis event. This note can be seen by user and user’s friends
    • savePriv_Note: Save the private note from user for a specific analysis event. This note can be seen by user only
  3. ToolUsageInformation: Retrieve statistic data about tools and workflows of geWorkbench. The selected web methods are listed as following:
    • getToolsByPopularity: Retrieve the data of using times for each tool in geWorkbench
    • getWorkflowsByPopularity: Retrieve the data of using times for each work flow in geWorkbench
    • getToolSuggestion: Retrieve the suggestion for next tool based on current tool
    • sendUsageEvent: Store the logged analysis event from geWorkbench in genSpace
  4. FriendFacade: Retrieve information about friends of current user in genSpace. The selected web methods are listed as following:
    • getFriendRequest: Inform user that a new friend in genSpace who sends a friend-invitation
    • addFriend: Accept friend-invitation from a new friend
    • getFriends: Retrieve all users who are friends of user in genSpace
    • getMyFriendsEvents: Retrieve the analysis events from friends of user in genSpace
  5. PublicFacade: Responsible to register new users. The selected web methods are listed as following:
    • register: Register new users for genSpace. This method is now synchronized with the registration function in geWorkbench. When a new user registers in geWorkbench, she/he will also be registered in genSpace immediately
    • userExists: A guarding method to check if the registered username has been used by a former user
  6. NetworkFacade: Retrieve information about corresponding networks of user in genSpace. the selected web methods are listed as following:
    • createNetwork: Allow user to create a new network in genSpace
    • joinNetwork: Allow user to request joining an existing network in genSpace
    • leaveNetwork: Allow user to leave a network
  7. WorkflowRepository: A managing facade for users to store workflows created by them or their friends in geWorkbench
    • addWorkflow: Add a new workflow in the workflow repository
    • addComment: Add comment to a specific workflow in the workflow repository
    • sendWorkflow: Send an existing workflow in the workflow repository to friends

Sound Build Acceleration

Sound Build Acceleration: Our empirical studies found that the bulk of the clock time during the builds of the ~2000 largest and most popular Java open source software applications is spent running test cases, so we seek to speed up large builds by reducing testing time. This is an important problem because real-world industry builds often take many hours, so developers cannot be informed of any errors introduced by their changes while still in context – as needed for continuous integration (best practice). The consequent lack of attention to failed tests is one of the major reasons that software is deployed with so many security vulnerabilities and other severe bugs. Prior work reduces testing time by running only subsets of the test suite, chosen using various selection criteria. But this model inherently reduces failure detection, and may be unsound because remaining test cases may have dependencies on removed test cases (causing false positives and false negatives). We thought out of the box to substantially reduce measured testing time without removing any test cases at all, thus no reduction in failure detection. For example, we developed tools that use static and dynamic analyses to determine exactly which portion of the state written by previous test cases will be read by the next test case, and instrument the bytecode to just-in-time reinitialize only that dependent portion of the state, rather than restarting the JVM between separate test cases, a common industry practice. Some dependencies are unintentional, so our tools also inform developers so they can re-engineer the code to remove those dependencies. Other dependencies are necessary, because series of tests are needed to build up and check each step of complex usage scenarios; for these our tools bundle dependent test cases and distinguish independent sets of test cases to enable sound parallelization of large test suites.

We expect to use components of this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Jonathan Bell



Jonathan Bell, Gail Kaiser, Eric Melski and Mohan Dattatreya. Efficient Dependency Detection for Safe Java Test Acceleration. 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Aug-Sep 2015, pp. 770-781.

Jonathan Bell, Eric Melski, Gail Kaiser and Mohan Dattatreya. Accelerating Maven by Delaying Dependencies.3rd International Workshop on Release Engineering (RelEng), May 2015, p. 28.

Jonathan Bell, Eric Melski, Mohan Dattatreya and Gail Kaiser. Vroom: Faster Build Processes for Java.IEEE Software, 32(2):97-104, Mar/Apr 2015.

Jonathan Bell and Gail Kaiser. Unit Test Virtualization with VMVM. 36th International Conference on Software Engineering (ICSE), June 2014, pp. 550-561. (ACM SIGSOFT Distinguished Paper Award)

Jonathan Bell and Gail Kaiser. Unit Test Virtualization: Optimizing Testing Time. 2nd International Workshop on Release Engineering (RelEng), April 2014.

Jonathan Bell and Gail Kaiser. VMVM: Unit Test Virtualization for Java. ICSE 2014 Formal Demonstrations Track, Companion Proceedings of 36th International Conference on Software Engineering (ICSE), June 2014, pp. 576-579. Video at https://www.youtube.com/watch?v=sRpqF3rJERI.


Download VmVm.

CS/SE Education

About CS/SE Education

We are exploring new techniques and approaches to improve the teaching of computer science and software engineering. Our recent projects and papers are listed below.

Contact: Swapneel Sheth (swapneel@cs.columbia.edu)

Team Members


Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Phd Students

Swapneel Sheth, swapneel [at] cs.columbia.edu

Former PhD students

Jonathan Bell, jbell [at] cs.columbia.edu
Chris Murphy
, cmurphy [at] cs.columbia.edu

See the Software Project Management project listed on our project student advertisements page.



HALO (Highly Addictive, sociaLly Optimized) Software Engineering




Swapneel Sheth, Jonathan Bell, Gail Kaiser. A Competitive-Collaborative Approach for Introducing Software Engineering in a CS2 Class. 26th Conference on Software Engineering Education and Training (CSEE&T), San Francisco CA, pages 41-50, May 2013

Jonathan Bell, Swapneel Sheth, Gail Kaiser. Secret Ninja Testing with HALO Software Engineering. 4th International Workshop on Social Software Engineering Workshop (SSE), Szeged, Hungary, pages 43-47, September 2011

Christian Murphy, Gail Kaiser, Kristin Loveland, Sahar Hasan. Retina: Helping Students and Instructors Based on Observed Programming Activities. 40th ACM SIGCSE Technical Symposium on Computer Science Education, Chattanooga TN, pages 178-182, March 2009

Christian Murphy, Dan Phung, and Gail Kaiser. A Distance Learning Approach to Teaching eXtreme Programming. 13th Annual ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE), Madrid, Spain, pages 199-203, June 2008

C. Murphy, E. Kim, G. Kaiser, A. Cannon. Backstop: A Tool for Debugging Runtime Errors. 39th ACM SIGCSE Technical Symposium on Computer Science Education, Portland OR, pages 173-177, March 2008

Tech Reports

Kunal Swaroop Mishra, Gail Kaiser. Effectiveness of Teaching Metamorphic Testing.Technical Report CUCS-020-12, Dept. of Computer Science, Columbia University, November 2012

Fine-Grained Data Management Abstractions

We participated in developing novel technology that leverages the storage abstractions of
modern operating systems (e.g., the relational databases and object-relational mappings of
Android) to automatically detect fragments strewn across memory, files and databases that is part
of the same logical application object, such as an email and its attachments, without requiring
source code or any cooperation on the part of application developers. This substrate enabled the development of our prototype tools to check that application-level deletions in fact actually delete all the data fragments related to, say, a document or a photo; to hide (and later unhide) sensitive data, e.g., to protect business data at international border crossings; and to detect when an application collects more data than required by its functionality. In our case study, our system worked correctly on 42 out of 50 real-world applications, and lead to publication of “best practices” rules of thumb required for the approach to work on future applications — e.g., fully declare database schemas, use the database to index file storage, use standard storage libraries, which are admittedly obvious to anyone with the software engineering training that some “app” developers sadly lack.

Contact Professor Roxana Geambasu (roxana@cs.columbia.edu) for further information.

Team Members

Roxana Geambasu
Gail Kaiser

Graduate Students
Riley Spahn

Former Graduate Students
Jonathan Bell



Riley Spahn,  Jonathan Bell, Michael Z. Lee, Sravan Bhamidipati, Roxana Geambasu and Gail Kaiser. Pebbles: Fine-Grained Data Management Abstractions for Modern Operating Systems. 11th USENIX Symposium on Operating Systems Design and Implementation, October 2014, pp. 113-129.

Dynamic Code Similarity

“Code clones” are statically similar code fragments dispersed via copy/paste or independently writing lookalike code; best practice removes clones (refactoring) or tracks them (e.g., to ensure bugs fixed in one clone are also fixed in others). We instead study dynamically similar code, for two different similarity models.

One model is functional similarity, finding code fragments that exhibit similar input/output behavior. Other researchers reported success with C but failure for object-oriented languages, presenting the many challenges to executing OO code fragments in isolation. We side-stepped the isolation problem by adapting in-vivo testing, previously developed by our lab for another purpose, where unit test cases are run in the application states constructed during full system test cases.

Our other dynamic similarity model is the novel notion of behavioral similarity: two code fragments are deemed similar if matching subgraphs can be found in the dynamic data dependency graphs representing the instruction-level execution traces. We developed a fast subgraph isomorphism algorithm to recognize and cluster these execution-level similarities, using PageRank to prioritize the “most important” instructions on which to pivot subgraph comparisons.

Our empirical studies show that our tools find most of the same “similar” code as the best static code clone detectors but also find many others they can’t, because the code looks very different even though functionally and/or behaviorally similar (dynamic detection will not necessarily find all static code clones because lookalike code involving polymorphism need not exhibit the same function/behavior).

We are investigating various applications of statically and dynamically similar code detection, including code search, program understanding, detecting malware, and re-engineering legacy software to use modern APIs.

All project student slots have already been filled for Fall 2016, but there may be openings for Spring 2017.

Contact Mike Su (mikefhsu@cs.columbia.edu)

Team Members

Gail Kaiser
Simha Sethumadhavan
Tony Jebara

Graduate Students
Fang-Hsiang (“Mike”) Su
Apoorv Patwardhan

Former Graduate Students
Jonathan Bell
Kenny Harvey



Fang-Hsiang Su, Jonathan Bell, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser and Tony Jebara. Code Relatives: Detecting Similarly Behaving Software. 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), November 2016. Artifact accepted as platinum. (To appear, earlier technical report available.)

Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan. Identifying Functionally Similar Code in Complex Codebases. 24th IEEE International Conference on Program Comprehension (ICPC), May 2016, pp. 1-10. (ACM SIGSOFT Distinguished Paper Award)

Fang-Hsiang Su, Jonathan Bell, and Gail Kaiser. Challenges in Behavioral Code Clone Detection (Position Paper). 10th International Workshop on Software Clones (IWSC), affiliated with IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), March 2016, volume 3, pp. 21-22. (People’s Choice Award for Best Position Paper)


Download DyCLink from github.

Download HitoshiIO from github.

Download Code Similarity Experiments toolkit from github.

An Open Software Framework for the Emulation and Verification of Drosophila Brain Models on Multiple GPUs

We are working with Prof. Aurel Lazar’s Bionet Lab (http://www.bionet.ee.columbia.edu/) to design, implement and experimentally evaluate an open software framework called the Neurokernel that will enable the isolated and integrated emulation of fly brain model neural ciits and their connectivity patterns (e.g., sensory and locomotion systems)  and other parts of the fly’s nervous system on clusters of GPUs, and support the in vivo functional identification of neural circuits.  (Note this is NOT the same meaning of “in vivo” as PSL’s In Vivo Testing project.)

The Neurokernel will:

  1. Enable computational/systems neuroscientists to exploit new connectome data by directing emulation efforts at interoperable local processing units (LPUs), functional subdivisions of the brain that serve as its computational substrate;
  2. Capitalize on the representation of stimuli in the time domain to enable the development of novel asynchronous algorithms for processing spikes with neural circuits;
  3. Serve as an extended machine that will provide abstractions and interfaces for scalably leveraging a powerful commodity parallel computing hardware platform to study a tractable neural system;
  4. Serve as a resource allocator that will enable researchers to transparently take advantage of future improvements in this hardware platform;
  5. Enable testing of models, both by easing the detection and localization of programming errors and by operationally verifying the models’ designs against time-encoded signals to/from live fly brains in real-time;
  6. Accelerate the research community’s progress in developing new brain circuit model by facilitating the sharing and refinement of novel and/or improved models of LPUs and their constituent circuits by different  groups.

To ease its use by the neuroscience community and enable synergy with existing computational tools and packages, we are developing our software framework in Python, a high-level language that has enjoyed great popularity amongst computational neuroscientists.

As we enhance Neurokernel to model new regions of the fly brain, there may be a negative effect on previous models for other regions.  As the fly brain model(s) will be developed in iterative software development cycles, it will be imperative to ensure that each iteration re-verifies the platform and its individual LPUs against the actual fly brain neuropils.  We would like these tests on the Python code to be conducted automatically, without requiring the use of our fly interface equipment — which is manually intensive to operate.  We are constructing a tool to simulate the fly brain interface for software testing purposes that will capture the stimuli provided to the fly along with its responses.  From these sets of inputs and outputs, the tool will automatically generate test cases that recreate the same experiment without the need for repeated interfacing with the fly. This tool will also be used to automatically generate regression tests for the Neurokernel software that depend on other external factors.

Additional information is available on the Bionet website.

Contact Professor Aurel Lazar (aurel@ee.columbia.edu) for further information.

Team Members

Aurel Lazar
Gail Kaiser

Former PSL Graduate Students
Nikhil Sarda