Home » Projects

Category Archives: Projects

SEmantic Security bug detection with Pseudo-Oracles (SESPO)

Semantic security bugs cause serious vulnerabilities across a wide range of software, e.g., bypassing security checks, gaining access to sensitive information, and escalating privileges.  Automatically detecting these bugs is hard because unlike crashing bugs they may not show any obvious side-effects.  The safety and security specifications violated by semantic security bugs are rarely written in a form easy to express as test oracles that define precisely what the behavior should be for every possible input sequence.

Pseudo-oracle testing is one popular technique to identify non-crashing buggy behaviors when true test oracles are unavailable:  Two or more executions are compared, as oracles for each other, to find discrepancies.  The most common pseudo-oracle approach used in security research is differential testing, where multiple independent implementations of the same functionality are compared to see if they agree on, e.g., whether a given untrusted input is valid.   But that only works when multiple independent implementations are available to the developers.

An alternative approach that requires only the one implementation at hand is metamorphic testing, which checks domain-specific metamorphic relations that are defined across sets of input/output pairs from multiple executions of the same test program. For example, a sample metamorphic property for program p adding two inputs a and b can be p(a, b) = p(a, 0) + p(b, 0).  For a program r that sorts an array of inputs, r( [original array] ) should produce the same sorted result as r( [permutation of original array] ).  A machine learning classifier should also produce the same results when the order of the training set is shuffled.

However, it is challenging to apply metamorphic relations to detecting security vulnerabilities, as security properties require richer semantics and often cannot be expressed as simple relationships among input-output pairs.  This project investigates whether there is a comprehensive range of semantically expressive metamorphic relations that can successfully detect semantic security vulnerabilities.

We are seeking undergraduate and MS project students to participate in a pilot study: 1. Create a vulnerability database of known semantic security bugs, their corresponding patches, and the test cases needed to reproduce those bugs.  2. Manually classify the bugs into different sub-categories, e.g., incorrect error handling, missing input validation, API abuse, etc.  3. Determine which properties change across the buggy and fixed versions that can be expressed as violations of metamorphic relations.  4. Inspect other execution paths to check whether these relations are satisfied during benign executions.  5. Find out whether these metamorphic relations are also applicable to similar bugs in other programs.  Students should be interested in security, software testing and program analysis.

Contact: Professor Gail Kaiser, kaiser@cs.columbia.edu. Please put “sespo” in the subject of your email.

Preponderance of the Evidence for SimilAr BEhavioR (SABER)

This project investigates dynamic analysis approaches to identifying behavioral similarities among code elements in the same or different programs, particularly for code that behaves similarly during execution but does not look similar so would be difficult or impossible to detect using static analysis (code clones). While code clone technology is fairly mature, detecting behavioral similarities is challenging.

Dynamic analysis involves executing the candidate programs, and portions of programs such as classes and methods, which means we need to identify program and method inputs that will be useful for identifying code that expert humans would deem similar — without numerous false positives.  For instance, many methods given null or zero inputs produce null or zero outputs and/or execute similar instruction sequences, even though they behave entirely differently with other inputs.  Some methods might coincidentally happen to behave similarity for one or a small number of more complex inputs, but still behave quite differently on the majority of the input space.

We plan to adapt the “preponderance of the evidence” metaphor as to whether a given piece of code is sufficiently similar to the code at hand.  This requires constructing test suites (collections of inputs) suitable for weighing similarities and differences, and devising machine learning or analytical approaches to classifying the corresponding collections of executions as similar or different.  We will seek to combine metrics from both functional input/output vectors and instruction trace (e.g., data dependency graph) representations of executions.

Our initial use case is program understanding:  Most software is developed by teams of engineers, not a single individual working alone. A software product is rarely developed once and for all with no bug fixes or features added after original deployment. Original team members leave the company or the project and new members join. Thus most engineers need to understand code originally written by other persons, indeed studies have found that engineers spend more of their time trying to understand unfamiliar code than creating or modifying code. The engineer might not know what a piece of code is supposed to do, how it works, how it is supposed to work, how to fix it to make it work; documentation might be poor, outdated, or non-existent, and knowledgeable co-workers are not always available.  We hypothesize that showing developers code that behaves similarly to the code at hand may be helpful, particularly if the similar code is better documented, better structured or simply easier to understand.

We are seeking undergraduate and MS project students interested in program understanding, program analysis, compilers and language runtime environments, software testing and machine learning.  Possibly also security since we may apply to behavioral similarities to vulnerability detection and identifying malware.

Contact: Professor Gail Kaiser, kaiser@cs.columbia.edu. Please put “saber” in the subject of your email.

Toward Trustworthy MUtable RePLAY for Security Patches (MUPLAY)

Society is increasingly reliant on software, but deployed software contains security vulnerabilities and other bugs that can threaten privacy, property and even human lives. When a security vulnerability or critical error is discovered, a software patch is issued to attempt to fix the problem, but patches themselves can be incorrect, inadequate, and break necessarily functionality. This project investigates the full workflow for the developer to rapidly diagnose the root cause of the vulnerability or error, for the developer to test that a prospective patch indeed completely removes the defect, and for users to check the issued patch on their own configurations and workloads before adopting the patch.

Record/replay is an emerging technique for low-overhead recording of deployed software, in case a failure or exploit occurs in the field, so the user can transmit the recorded log to the developer to deterministically reproduce the bug. But existing record/replay systems can replay only with the exact same version of the software as recorded, possibly with insertion of debugging instrumentation that does not affect application state. This project investigates “mutable replay” to help reproduce, diagnose, and fix software bugs. Mutable replay will enable logs recorded with the buggy version to be replayed after the modest code changes typical of critical patches, which do impact application state, to show that patches work correctly (or not!) to resolve detected problems.

This project leverages semantic information readily available to the developer (e.g., from their version control repository) to conduct static and dynamic analyses to correctly transform the recorded log to enable mutable replay. For example, the replayer could skip over no longer valid log entries and temporarily go-live when new data is needed.

We are currently modifying rr (https://rr-project.org/) to support mutable replay, since rr is an actively supported open-source record/replay system in wide use and integrated with gdb. Its recorded log includes details of the instruction locations in memory, register contents, and other low-level details rather than just the non-deterministic inputs to the application recorded by some other record/replay systems.  This makes it impossible to use rr’s replayer to run the new version of the executable while feeding from a log recorded with a previous version of the executable.  Instead, we need to patch the old executable to jump into the new executable where they diverge, run that modified code “live”, then jump back if/when they converge.

We would also like to investigate mutable replay for Java/JVM, but first need to find a practical record/replay system to leverage — or build our own.

We are seeking undergraduate and MS project students interested in compilers and language runtime environments, debuggers, program analysis, software testing, software reliability, security and systems (Linux kernel and JVM).

Contact: Anthony Narin, ant@cs.columbia.edu, cc Prof. Gail Kaiser, kaiser@cs.columbia.edu.  Please put “muplay” in the subject of your email.

Gameful Computational Thinking (SAGE)

Seeking undergraduate and graduate students in Computer Science and related fields. Also see separate project description for students with formal background in education.

Inspired by CS for All?  The Programming Systems Lab, led by Professor Gail Kaiser, is building a collaborative game-based learning and assessment system intended to help teachers infuse computational thinking in grade 6-8 curricula.  We’re recruiting students interested in collaborating as a Scrum team to continue development of several features:

  • A gameful direct instruction system that embeds Parson’s Programming Puzzles in Scratch
  • A gameful constructionism mode that integrates scoring systems for play and automated formative assessment through Blockly
  • A gameful intelligent tutor that provides hints on-demand and just-in-time via clustering, sequential pattern mining, and behavior detection algorithms
  • A gameful affinity space that enables curricula assimilation with quest management and learning metrics using Node.js and Angular

Participants should register for 3 credits of COMS E6901 (graduate students) or COMS W3998 (undergraduates).  CVN students are welcome.

To learn more, please contact Jeff Bender, jeffrey.bender@columbia.edu.

SAGE | Social Addictive Gameful Engineering


Gameful Computational Thinking: seeking students from Teachers College

Inspired by CS for All?  The Programming Systems Lab, led by Professor Gail Kaiser, is building a collaborative game-based learning and assessment system intended to help teachers infuse computational thinking in grade 6-8 curricula.  We are seeking students with backgrounds in teaching (anything) to middle-school age children, to help us transition to field studies with our partner school and possibly other educational programs.  Software development skills are not needed for these positions, but an interest in applying theories from the learning sciences within educational technology is.  Appropriate academic credit can be arranged on a case by case basis.

To learn more, please contact Jeff Bender, jeffrey.bender@columbia.edu.

SAGE | Social Addictive Gameful Engineering



Record/Replay Bug Reproduction for Java

There will inevitably continue to be bugs that are not detected by any testing approach, but eventually impact users who then file bug reports. Reproducing field failures in the development environment can be difficult, however, especially in the case of software that behaves non-deterministically, relies on remote resources, or has complex reproduction steps (the users may not even know what led up to triggering the flaw, particularly in the case of software interacting with external devices, databases, etc. in addition to human users). So a record/replay approach is used to capture the state of the system just before a bug is encountered, so the steps leading up to this state can be replayed later in the lab. The naive approach of constant logging in anticipation of a defect tends to produce unacceptably high overheads (reaching 2,000+ %) in the deployed application. Novel solutions that lower this overhead typically limit the depth of information recorded (e.g., to use only a stack trace, rather than a complete state history) or the breadth of information recorded (e.g., to only log information during execution of a particular subsystem that a developer identifies as potentially buggy). But limiting the depth of information gathered may fail to reproduce an error if the defect does not present itself immediately and limiting logging to a specific subcomponent of an application makes it only possible to reproduce the bug if it occurred within that subcomponent.

Our new technique, called “Chronicler”, instead captures program execution in a manner that allows for deterministic replay in the lab with very low overhead. The key insight is to log sources of non-determinism only at the library level – allowing for a lightweight recording process while still supporting a complete replay for debugging purposes (programs with no sources of non-determinism, e.g., no user interactions, are trivial to replay – just provide the same inputs). When a failure occurs, Chronicler automatically generates a test case that consists of the inputs (e.g., file or network I/O, user inputs, random numbers, etc.) that caused the system to fail. This general approach can be applied to any “managed” language that runs in a language virtual machine (for instance, JVM or Microsoft’s .NET CLR), requiring no modifications to the interpreter or environment, and thus addresses a different class of programs than related work for non-managed languages like C and C++.

We expect to extend and use this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)



Jonathan Bell, Nikhil Sarda and Gail Kaiser. Chronicler: Lightweight Recording to Reproduce Field Failures. 35th International Conference on Software Engineering, May 2013, pp. 362-371. See teaser video at https://www.youtube.com/watch?v=4IYGfdDnAJg.


Download ChroniclerJ.

Dynamic Information Flow Analysis

We are investigating an approach to runtime information flow analysis for managed languages
that tracks metadata about data values through the execution of a program. We first considered
metadata that propagates labels representing the originating source of each data value, e.g.,
sensitive data from the address book or GPS of a mobile device that should only be accessed on a
need-to-know basis, or potentially suspect data input by end-users or external systems that
should be sanitized before including in database queries, collectively termed “taint tracking”.
We developed and made available open-source the first general purpose implementation of taint
tracking that operates with minimal performance overhead on commodity Java Virtual Machine
implementations (e.g., from Oracle and OpenJDK), by storing the derived metadata “next to” the
corresponding data values in memory, achieved via bytecode rewriting that does not require
access to source code or any changes to the underlying platform. Previous approaches required
changes to the source code, the language interpreter, the language runtime, the operating system
and/or the hardware, or added unacceptable overhead by storing the metadata separately in a
hashmap. Our system has also been applied to Android, where it required changes in 13 lines of
code, contrasted to the state of the art TaintDroid which added 32,000 lines of code. We are
currently investigating tracking the path conditions constructed during dynamic symbolic
execution of programs, which record the constraints on data values that have reached a given
point in execution (e.g., taking the true or false branch of a series of conditionals). We plan to
use the more sophisticated but slower symbolic execution version as part of several prospective

We expect to extend and use this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Jonathan Bell



Jonathan Bell and Gail Kaiser. Phosphor: Illuminating Dynamic Data Flow in the JVM. Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), October 2014,pp. 83-101. Artifact accepted as meeting reviewer expectations.

Jonathan Bell and Gail Kaiser. Dynamic Taint Tracking for Java with PhosphorInternational Symposium on Software Testing and Analysis (ISSTA), July 2015, pp. 409-413.


Download Phosphor.

Download Knarr.


genSpace Web Services

The core functionalities of genSpace web services can be categorized into 7 facades in the following:

  1. UserFacade: Retrieve corresponding information about the current user in genSpace. The selected web methods are listed as following:
    • getProfile: Retrieve the profile of current user
    • updateUser: Update the information, such as password, for current user
    • getMyNetworks: Retrieve all networks that the current user is in
  2. UsageInformation: Retrieve information about tools and workflows of geWorkbench. The selected web methods are listed as following:
    • getTool: Retrieve the corresponding tool by too ID
    • getWorkflow: Retrieve the corresponding workflow by workflow ID
    • getMyNotes: Retrieve the note from user for a specific analysis event. An analysis event includes an invocation of geWorkbench tool
    • saveNote: Save the public note from user for a specific analysis event. This note can be seen by user and user’s friends
    • savePriv_Note: Save the private note from user for a specific analysis event. This note can be seen by user only
  3. ToolUsageInformation: Retrieve statistic data about tools and workflows of geWorkbench. The selected web methods are listed as following:
    • getToolsByPopularity: Retrieve the data of using times for each tool in geWorkbench
    • getWorkflowsByPopularity: Retrieve the data of using times for each work flow in geWorkbench
    • getToolSuggestion: Retrieve the suggestion for next tool based on current tool
    • sendUsageEvent: Store the logged analysis event from geWorkbench in genSpace
  4. FriendFacade: Retrieve information about friends of current user in genSpace. The selected web methods are listed as following:
    • getFriendRequest: Inform user that a new friend in genSpace who sends a friend-invitation
    • addFriend: Accept friend-invitation from a new friend
    • getFriends: Retrieve all users who are friends of user in genSpace
    • getMyFriendsEvents: Retrieve the analysis events from friends of user in genSpace
  5. PublicFacade: Responsible to register new users. The selected web methods are listed as following:
    • register: Register new users for genSpace. This method is now synchronized with the registration function in geWorkbench. When a new user registers in geWorkbench, she/he will also be registered in genSpace immediately
    • userExists: A guarding method to check if the registered username has been used by a former user
  6. NetworkFacade: Retrieve information about corresponding networks of user in genSpace. the selected web methods are listed as following:
    • createNetwork: Allow user to create a new network in genSpace
    • joinNetwork: Allow user to request joining an existing network in genSpace
    • leaveNetwork: Allow user to leave a network
  7. WorkflowRepository: A managing facade for users to store workflows created by them or their friends in geWorkbench
    • addWorkflow: Add a new workflow in the workflow repository
    • addComment: Add comment to a specific workflow in the workflow repository
    • sendWorkflow: Send an existing workflow in the workflow repository to friends

Sound Build Acceleration

Sound Build Acceleration: Our empirical studies found that the bulk of the clock time during the builds of the ~2000 largest and most popular Java open source software applications is spent running test cases, so we seek to speed up large builds by reducing testing time. This is an important problem because real-world industry builds often take many hours, so developers cannot be informed of any errors introduced by their changes while still in context – as needed for continuous integration (best practice). The consequent lack of attention to failed tests is one of the major reasons that software is deployed with so many security vulnerabilities and other severe bugs. Prior work reduces testing time by running only subsets of the test suite, chosen using various selection criteria. But this model inherently reduces failure detection, and may be unsound because remaining test cases may have dependencies on removed test cases (causing false positives and false negatives). We thought out of the box to substantially reduce measured testing time without removing any test cases at all, thus no reduction in failure detection. For example, we developed tools that use static and dynamic analyses to determine exactly which portion of the state written by previous test cases will be read by the next test case, and instrument the bytecode to just-in-time reinitialize only that dependent portion of the state, rather than restarting the JVM between separate test cases, a common industry practice. Some dependencies are unintentional, so our tools also inform developers so they can re-engineer the code to remove those dependencies. Other dependencies are necessary, because series of tests are needed to build up and check each step of complex usage scenarios; for these our tools bundle dependent test cases and distinguish independent sets of test cases to enable sound parallelization of large test suites.

We expect to use components of this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Jonathan Bell



Jonathan Bell, Gail Kaiser, Eric Melski and Mohan Dattatreya. Efficient Dependency Detection for Safe Java Test Acceleration. 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Aug-Sep 2015, pp. 770-781.

Jonathan Bell, Eric Melski, Gail Kaiser and Mohan Dattatreya. Accelerating Maven by Delaying Dependencies.3rd International Workshop on Release Engineering (RelEng), May 2015, p. 28.

Jonathan Bell, Eric Melski, Mohan Dattatreya and Gail Kaiser. Vroom: Faster Build Processes for Java.IEEE Software, 32(2):97-104, Mar/Apr 2015.

Jonathan Bell and Gail Kaiser. Unit Test Virtualization with VMVM. 36th International Conference on Software Engineering (ICSE), June 2014, pp. 550-561. (ACM SIGSOFT Distinguished Paper Award)

Jonathan Bell and Gail Kaiser. Unit Test Virtualization: Optimizing Testing Time. 2nd International Workshop on Release Engineering (RelEng), April 2014.

Jonathan Bell and Gail Kaiser. VMVM: Unit Test Virtualization for Java. ICSE 2014 Formal Demonstrations Track, Companion Proceedings of 36th International Conference on Software Engineering (ICSE), June 2014, pp. 576-579. Video at https://www.youtube.com/watch?v=sRpqF3rJERI.


Download VmVm.

CS/SE Education

About CS/SE Education

We are exploring new techniques and approaches to improve the teaching of computer science and software engineering. Our recent projects and papers are listed below.

Contact: Swapneel Sheth (swapneel@cs.columbia.edu)

Team Members


Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Phd Students

Swapneel Sheth, swapneel [at] cs.columbia.edu

Former PhD students

Jonathan Bell, jbell [at] cs.columbia.edu
Chris Murphy
, cmurphy [at] cs.columbia.edu

See the Software Project Management project listed on our project student advertisements page.



HALO (Highly Addictive, sociaLly Optimized) Software Engineering




Swapneel Sheth, Jonathan Bell, Gail Kaiser. A Competitive-Collaborative Approach for Introducing Software Engineering in a CS2 Class. 26th Conference on Software Engineering Education and Training (CSEE&T), San Francisco CA, pages 41-50, May 2013

Jonathan Bell, Swapneel Sheth, Gail Kaiser. Secret Ninja Testing with HALO Software Engineering. 4th International Workshop on Social Software Engineering Workshop (SSE), Szeged, Hungary, pages 43-47, September 2011

Christian Murphy, Gail Kaiser, Kristin Loveland, Sahar Hasan. Retina: Helping Students and Instructors Based on Observed Programming Activities. 40th ACM SIGCSE Technical Symposium on Computer Science Education, Chattanooga TN, pages 178-182, March 2009

Christian Murphy, Dan Phung, and Gail Kaiser. A Distance Learning Approach to Teaching eXtreme Programming. 13th Annual ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE), Madrid, Spain, pages 199-203, June 2008

C. Murphy, E. Kim, G. Kaiser, A. Cannon. Backstop: A Tool for Debugging Runtime Errors. 39th ACM SIGCSE Technical Symposium on Computer Science Education, Portland OR, pages 173-177, March 2008

Tech Reports

Kunal Swaroop Mishra, Gail Kaiser. Effectiveness of Teaching Metamorphic Testing.Technical Report CUCS-020-12, Dept. of Computer Science, Columbia University, November 2012