Home » Projects » Active Projects

Category Archives: Active Projects

SEmantic Security bug detection with Pseudo-Oracles (SESPO)

Semantic security bugs cause serious vulnerabilities across a wide range of software, e.g., bypassing security checks, gaining access to sensitive information, and escalating privileges.  Automatically detecting these bugs is hard because unlike crashing bugs they may not show any obvious side-effects.  The safety and security specifications violated by semantic security bugs are rarely written in a form easy to express as test oracles that define precisely what the behavior should be for every possible input sequence.

Pseudo-oracle testing is one popular technique to identify non-crashing buggy behaviors when true test oracles are unavailable:  Two or more executions are compared, as oracles for each other, to find discrepancies.  The most common pseudo-oracle approach used in security research is differential testing, where multiple independent implementations of the same functionality are compared to see if they agree on, e.g., whether a given untrusted input is valid.   But that only works when multiple independent implementations are available to the developers.

An alternative approach that requires only the one implementation at hand is metamorphic testing, which checks domain-specific metamorphic relations that are defined across sets of input/output pairs from multiple executions of the same test program. For example, a sample metamorphic property for program p adding two inputs a and b can be p(a, b) = p(a, 0) + p(b, 0).  For a program r that sorts an array of inputs, r( [original array] ) should produce the same sorted result as r( [permutation of original array] ).  A machine learning classifier should also produce the same results when the order of the training set is shuffled.

However, it is challenging to apply metamorphic relations to detecting security vulnerabilities, as security properties require richer semantics and often cannot be expressed as simple relationships among input-output pairs.  This project investigates whether there is a comprehensive range of semantically expressive metamorphic relations that can successfully detect semantic security vulnerabilities.

We are seeking undergraduate and MS project students to participate in a pilot study: 1. Create a vulnerability database of known semantic security bugs, their corresponding patches, and the test cases needed to reproduce those bugs.  2. Manually classify the bugs into different sub-categories, e.g., incorrect error handling, missing input validation, API abuse, etc.  3. Determine which properties change across the buggy and fixed versions that can be expressed as violations of metamorphic relations.  4. Inspect other execution paths to check whether these relations are satisfied during benign executions.  5. Find out whether these metamorphic relations are also applicable to similar bugs in other programs.  Students should be interested in security, software testing and program analysis.

Contact: Professor Gail Kaiser, kaiser@cs.columbia.edu. Please put “sespo” in the subject of your email.

Preponderance of the Evidence for SimilAr BEhavioR (SABER)

This project investigates dynamic analysis approaches to identifying behavioral similarities among code elements in the same or different programs, particularly for code that behaves similarly during execution but does not look similar so would be difficult or impossible to detect using static analysis (code clones). While code clone technology is fairly mature, detecting behavioral similarities is challenging.

Dynamic analysis involves executing the candidate programs, and portions of programs such as classes and methods, which means we need to identify program and method inputs that will be useful for identifying code that expert humans would deem similar — without numerous false positives.  For instance, many methods given null or zero inputs produce null or zero outputs and/or execute similar instruction sequences, even though they behave entirely differently with other inputs.  Some methods might coincidentally happen to behave similarity for one or a small number of more complex inputs, but still behave quite differently on the majority of the input space.

We plan to adapt the “preponderance of the evidence” metaphor as to whether a given piece of code is sufficiently similar to the code at hand.  This requires constructing test suites (collections of inputs) suitable for weighing similarities and differences, and devising machine learning or analytical approaches to classifying the corresponding collections of executions as similar or different.  We will seek to combine metrics from both functional input/output vectors and instruction trace (e.g., data dependency graph) representations of executions.

Our initial use case is program understanding:  Most software is developed by teams of engineers, not a single individual working alone. A software product is rarely developed once and for all with no bug fixes or features added after original deployment. Original team members leave the company or the project and new members join. Thus most engineers need to understand code originally written by other persons, indeed studies have found that engineers spend more of their time trying to understand unfamiliar code than creating or modifying code. The engineer might not know what a piece of code is supposed to do, how it works, how it is supposed to work, how to fix it to make it work; documentation might be poor, outdated, or non-existent, and knowledgeable co-workers are not always available.  We hypothesize that showing developers code that behaves similarly to the code at hand may be helpful, particularly if the similar code is better documented, better structured or simply easier to understand.

We are seeking undergraduate and MS project students interested in program understanding, program analysis, compilers and language runtime environments, software testing and machine learning.  Possibly also security since we may apply to behavioral similarities to vulnerability detection and identifying malware.

Contact: Professor Gail Kaiser, kaiser@cs.columbia.edu. Please put “saber” in the subject of your email.

Toward Trustworthy MUtable RePLAY for Security Patches (MUPLAY)

Society is increasingly reliant on software, but deployed software contains security vulnerabilities and other bugs that can threaten privacy, property and even human lives. When a security vulnerability or critical error is discovered, a software patch is issued to attempt to fix the problem, but patches themselves can be incorrect, inadequate, and break necessarily functionality. This project investigates the full workflow for the developer to rapidly diagnose the root cause of the vulnerability or error, for the developer to test that a prospective patch indeed completely removes the defect, and for users to check the issued patch on their own configurations and workloads before adopting the patch.

Record/replay is an emerging technique for low-overhead recording of deployed software, in case a failure or exploit occurs in the field, so the user can transmit the recorded log to the developer to deterministically reproduce the bug. But existing record/replay systems can replay only with the exact same version of the software as recorded, possibly with insertion of debugging instrumentation that does not affect application state. This project investigates “mutable replay” to help reproduce, diagnose, and fix software bugs. Mutable replay will enable logs recorded with the buggy version to be replayed after the modest code changes typical of critical patches, which do impact application state, to show that patches work correctly (or not!) to resolve detected problems.

This project leverages semantic information readily available to the developer (e.g., from their version control repository) to conduct static and dynamic analyses to correctly transform the recorded log to enable mutable replay. For example, the replayer could skip over no longer valid log entries and temporarily go-live when new data is needed.

We are currently modifying rr (https://rr-project.org/) to support mutable replay, since rr is an actively supported open-source record/replay system in wide use and integrated with gdb. Its recorded log includes details of the instruction locations in memory, register contents, and other low-level details rather than just the non-deterministic inputs to the application recorded by some other record/replay systems.  This makes it impossible to use rr’s replayer to run the new version of the executable while feeding from a log recorded with a previous version of the executable.  Instead, we need to patch the old executable to jump into the new executable where they diverge, run that modified code “live”, then jump back if/when they converge.

We would also like to investigate mutable replay for Java/JVM, but first need to find a practical record/replay system to leverage — or build our own.

We are seeking undergraduate and MS project students interested in compilers and language runtime environments, debuggers, program analysis, software testing, software reliability, security and systems (Linux kernel and JVM).

Contact: Anthony Narin, ant@cs.columbia.edu, cc Prof. Gail Kaiser, kaiser@cs.columbia.edu.  Please put “muplay” in the subject of your email.

Gameful Computational Thinking (SAGE)

Seeking undergraduate and graduate students in Computer Science and related fields. Also see separate project description for students with formal background in education.

Inspired by CS for All?  The Programming Systems Lab, led by Professor Gail Kaiser, is building a collaborative game-based learning and assessment system intended to help teachers infuse computational thinking in grade 6-8 curricula.  We’re recruiting students interested in collaborating as a Scrum team to continue development of several features:

  • A gameful direct instruction system that embeds Parson’s Programming Puzzles in Scratch
  • A gameful constructionism mode that integrates scoring systems for play and automated formative assessment through Blockly
  • A gameful intelligent tutor that provides hints on-demand and just-in-time via clustering, sequential pattern mining, and behavior detection algorithms
  • A gameful affinity space that enables curricula assimilation with quest management and learning metrics using Node.js and Angular

Participants should register for 3 credits of COMS E6901 (graduate students) or COMS W3998 (undergraduates).  CVN students are welcome.

To learn more, please contact Jeff Bender, jeffrey.bender@columbia.edu.

SAGE | Social Addictive Gameful Engineering


Gameful Computational Thinking: seeking students from Teachers College

Inspired by CS for All?  The Programming Systems Lab, led by Professor Gail Kaiser, is building a collaborative game-based learning and assessment system intended to help teachers infuse computational thinking in grade 6-8 curricula.  We are seeking students with backgrounds in teaching (anything) to middle-school age children, to help us transition to field studies with our partner school and possibly other educational programs.  Software development skills are not needed for these positions, but an interest in applying theories from the learning sciences within educational technology is.  Appropriate academic credit can be arranged on a case by case basis.

To learn more, please contact Jeff Bender, jeffrey.bender@columbia.edu.

SAGE | Social Addictive Gameful Engineering



Code Similarity

Dynamic Code Similarity: This is a multi-disciplinary project joint with Profs. Simha Sethumadhavan and Tony Jebara. “Code clones” are statically similar code fragments that usually arise via copy/paste or independently writing lookalike code; best practice removes clones (refactoring) or tracks them (e.g., to ensure bugs fixed in one clone are also fixed in others). This part of the project instead studies dynamically similar code for two different similarity models. One model is functional similarity, finding code fragments that exhibit similar input/output behavior during execution. Our other dynamic similarity model is the novel notion of behavioral similarity, which we call “code relatives”. Two or more code fragments are deemed code relatives if their executions are similar. We model this as finding similarities among the dynamic data dependency graphs representing instruction-level execution traces. We used machine learning techniques to devise a (relatively) fast inexact subgraph isomorphism algorithm to cluster these execution-level similarities. Our experiments show that both of our tools find most of the same “similar” code as the best static code clone detectors but also find many others they can’t, because the code looks very different even though functionally and/or behaviorally similar; however, dynamic detection will not necessarily find all static code clones because lookalike code involving polymorphism need not exhibit the same function/behavior. Our behavioral and functional similarity detectors do not always find the same similarities, because two or more code fragments may compute the same function using very different algorithms. Thus these kinds of techniques complement each other. Beyond the conventional applications of static code clone detection, dynamic similarity detection also addresses malware detection, program understanding, re-engineering legacy software to use modern APIs, and informing design of hardware accelerators and compiler optimizations.

Static Code Similarity: We also investigate of static similarity detection to augment our similarity detection toolkit. This work is joint with Prof. Baishakhi Ray of the University of Virginia and Prof. Jonathan Bell of George Mason University. Unlike most other static code clone research, we look for similarities at the instruction level rather than in the source code, so our techniques can work even on obfuscated executables where no source code is available and thus conventional static detectors cannot be applied. This situation arises for both malware and misappropriated intellectual property. We exploit the increasingly popular notion of “big code”, i.e., training from open-source repositories, using features that combine instruction-level call graph analysis and topic modeling (an NLP-based machine learning technique). We believe we can effectively deobfuscate most suspect code by finding similarities within a corpus consisting of known code and its obfuscated counterparts. Our approach handles control flow transformations and introduction of extraneous methods, not just method names.

Contact Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Gail Kaiser

Former Graduate Students
Fang-Hsiang (“Mike”) Su
Jonathan Bell
Kenny Harvey   
Apoorv Patwardhan



Fang-Hsiang Su, Jonathan Bell, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser and Tony Jebara. Code Relatives: Detecting Similarly Behaving Software. 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), November 2016. Artifact accepted as platinum.

Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan. Identifying Functionally Similar Code in Complex Codebases. 24th IEEE International Conference on Program Comprehension (ICPC), May 2016, pp. 1-10. (ACM SIGSOFT Distinguished Paper Award)

Fang-Hsiang Su, Jonathan Bell, and Gail Kaiser. Challenges in Behavioral Code Clone Detection (Position Paper). 10th International Workshop on Software Clones (IWSC), affiliated with IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), March 2016, volume 3, pp. 21-22. (People’s Choice Award for Best Position Paper)


Download DyCLink from github.

Download HitoshiIO from github.

Download Code Similarity Experiments toolkit from github.