Home » Projects (Page 2)

Category Archives: Projects

An Open Software Framework for the Emulation and Verification of Drosophila Brain Models on Multiple GPUs

We are working with Prof. Aurel Lazar’s Bionet Lab (http://www.bionet.ee.columbia.edu/) to design, implement and experimentally evaluate an open software framework called the Neurokernel that will enable the isolated and integrated emulation of fly brain model neural ciits and their connectivity patterns (e.g., sensory and locomotion systems)  and other parts of the fly’s nervous system on clusters of GPUs, and support the in vivo functional identification of neural circuits.  (Note this is NOT the same meaning of “in vivo” as PSL’s In Vivo Testing project.)

The Neurokernel will:

  1. Enable computational/systems neuroscientists to exploit new connectome data by directing emulation efforts at interoperable local processing units (LPUs), functional subdivisions of the brain that serve as its computational substrate;
  2. Capitalize on the representation of stimuli in the time domain to enable the development of novel asynchronous algorithms for processing spikes with neural circuits;
  3. Serve as an extended machine that will provide abstractions and interfaces for scalably leveraging a powerful commodity parallel computing hardware platform to study a tractable neural system;
  4. Serve as a resource allocator that will enable researchers to transparently take advantage of future improvements in this hardware platform;
  5. Enable testing of models, both by easing the detection and localization of programming errors and by operationally verifying the models’ designs against time-encoded signals to/from live fly brains in real-time;
  6. Accelerate the research community’s progress in developing new brain circuit model by facilitating the sharing and refinement of novel and/or improved models of LPUs and their constituent circuits by different  groups.

To ease its use by the neuroscience community and enable synergy with existing computational tools and packages, we are developing our software framework in Python, a high-level language that has enjoyed great popularity amongst computational neuroscientists.

As we enhance Neurokernel to model new regions of the fly brain, there may be a negative effect on previous models for other regions.  As the fly brain model(s) will be developed in iterative software development cycles, it will be imperative to ensure that each iteration re-verifies the platform and its individual LPUs against the actual fly brain neuropils.  We would like these tests on the Python code to be conducted automatically, without requiring the use of our fly interface equipment — which is manually intensive to operate.  We are constructing a tool to simulate the fly brain interface for software testing purposes that will capture the stimuli provided to the fly along with its responses.  From these sets of inputs and outputs, the tool will automatically generate test cases that recreate the same experiment without the need for repeated interfacing with the fly. This tool will also be used to automatically generate regression tests for the Neurokernel software that depend on other external factors.

Additional information is available on the Bionet website.

Contact Professor Aurel Lazar (aurel@ee.columbia.edu) for further information.

Team Members

Aurel Lazar
Gail Kaiser

Former PSL Graduate Students
Nikhil Sarda

Metamorphic Testing

Metamorphic testing was originally developed, by others, as an approach to deriving new test
cases from an existing test suite, seeking to find additional bugs not found by the original tests.
Given a known execution function(input) produces output, the metamorphic properties of a
function (or of an entire application) enable automatic derivation of a new input’ from input such
that the expected output’ can be predicted from output. If the actual output’’ is different from
output’, then there is a flaw in the code or its documentation. We expanded metamorphic testing
in several ways, initially to apply to “non-testable programs”, where there is no test oracle; that
is, metamorphic testing can detect bugs even when we do not know whether output is correct for
input (so conventional testing techniques may not be useful). This problem arises for the
machine learning, data mining, search, simulation and optimization applications prevalent in “big
data” analysis. For example, if a machine learning program generates clusters from a set of
examples, one would expect it to produce the same clusters when the order of the input examples
is permuted; however, we have found anomalies in several widely used machine learning
libraries (e.g., Weka) where the result is different from expected when the set of input examples
is modified in some simple way. We are investigating how to extend the notion of metamorphic
properties to before and after state, beyond just input/output parameters, to find bugs that affect
the internal state but are not evident from input/output. Most recently we developed a tool for
automatically discovering candidate metamorphic properties from execution profiling that
performs better than student subjects; the state of the art is for a human domain expert to
manually define the properties, a tedious, error-prone and expensive process.

Team Members

Gail Kaiser

Graduate Students
Fang-Hsiang (“Mike”) Su

Former Graduate Students
Chris Murphy
Jonathan Bell



Fang-Hsiang Su, Jonathan Bell, Christian Murphy and Gail Kaiser. Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing. 10th IEEE/ACM International Workshop on Automation of Software Test (AST), May 2015, pp. 55-59.

Jonathan Bell, Christian Murphy and Gail Kaiser. Metamorphic Runtime Checking of Applications Without Test Oracles. Crosstalk the Journal of Defense Software Engineering, 28(2):9-13, Mar/Apr 2015.

Christian Murphy, M. S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup Lee, Oleg Sokolsky, Lori Clarke, Leon Osterweil. On Effective Testing of Health Care Simulation Software. 3rd International Workshop on Software Engineering in Health Care (SEHC), May 2011, pp. 40-47.

Xiaoyuan Xie, Joshua W. K. Ho, Christian Murphy, Gail Kaiser, Baowen Xu and Tsong Yueh Chen.  Testing and Validating Machine Learning Classifiers by Metamorphic Testing.  Journal of Systems and Software (JSS), Elsevier, 84(4):544-558, April 2011.

Christian Murphy, Kuang Shen and Gail Kaiser. Automatic System Testing of Programs without Test Oracles. International Symposium on Software Testing and Analysis (ISSTA), July 2009, pp. 189-200.

Christian Murphy, Kuang Shen and Gail Kaiser. Using JML Runtime Assertion Checking to Perform Metamorphic Testing in Applications without Test Oracles. 2nd IEEE International Conference on Software Testing, Verification and Validation (ICST), April 2009, pp. 436-445.

Christian Murphy, Gail Kaiser, Lifeng Hu and Leon Wu. Properties of Machine Learning Applications for Use in Metamorphic Testing. 20th International Conference on Software Engineering and Knowledge Engineering (SEKE), July 2008, pp. 867-872.

Christian Murphy, Gail Kaiser and Marta Arias. Parameterizing Random Test Data According to Equivalence Classes. 2nd ACM International Workshop on Random Testing (RT), November 2007, pp.38-41.

Christian Murphy, Gail Kaiser and Marta Arias. An Approach to Software Testing of Machine Learning Applications. 19th International Conference on Software Engineering and Knowledge Engineering (SEKE), July 2007, pp. 167-172.


Download <a href="http://” target=”_blank”>Kabu.

In Vivo Testing

Software products released into the field typically contain residual defects that either were not detected or could not have been detected during pre-deployment testing. For many large, complex software systems, it is infeasible in terms of time and cost to reliably test all configuration options before release using unit test virtualization, test suite minimization, or any other known approach. For example, Microsoft Internet Explorer has over 19 trillion possible combinations of configuration settings. Even given infinite time and resources to test an application and all its configurations, other software on which a software product depends or with which it interacts (e.g., sensor networks, libraries, virtual machines, etc.) are often updated after the product’s release; it is impossible to test with these dependencies prior to the application’s release, because they did not exist yet. Further, as multi-processor and multi-core systems become more prevalent, multi-threaded applications that had only been tested on single- or dual-processor/core machines are more likely to reveal concurrency bugs.

We are investigating a testing methodology that we call “in vivo” testing, in which tests are continuously executed in the deployment environment. This requires a new type of test case, called in vivo tests, which are designed to run from within the executing application in the states achieved during normal end-user operation rather than in a re-initialized or artificial pre-test state. These tests focus on aspects of the program that should hold true regardless of what state the system is in, but differ from conventional assertion checking, since assertions are prohibited from introducing side-effects: in vivo tests may indeed and typically do have side-effects on the application’s in-memory state, external files, I/O, etc. but these are all “hidden” from users by cloning the executing application to run the test cases in the same kind of sandbox often aimed to address security concerns. The in vivo approach can be used for detecting concurrency, security or robustness issues, as well as conventional flaws that may not have appeared in a testing lab (the “in vitro” environment). Our most recent research concerns how to reduce the overhead of such deployment-time testing as well as automatic generation of some of the in vivo test cases from traditional pre-existing unit tests.

In Fall 2007, we developed a prototype framework called Invite, which is described in our tech report and was presented as a poster at ISSTA 2008 (a variant of this paper was presented at ICST 2009, and is available here). This implementation uses an AspectJ component to instrument selected classes in a Java application, such that each method call in those classes has some chance (configurable on a per-method basis) of executing the method’s corresponding unit test. When a test is run, Invite forks off a new process in which to run the test, and the results are logged.

We also developed a distributed version of Invite, which seeks to amortize the testing load across a community of applications; a paper was published in the student track of ICST 2008. This version currently uses only one global value for the probability of running a test, instead of one per method, however. That value is set by a central server, depending on the size of the “application community”.

In Spring 2008, we looked at various mechanisms for reducing the performance impact of Invite, e.g. by assigning tests to different cores/processors on multi-core/multi-processor machines, or by limiting the number of concurrent tests that may be run. We also looked at ways of balancing testing load across members of a community so that instances under light load pick up more of the testing. Lastly, we created a modified JDK that allows Invite to create copies of files so that in vivo tests do not alter the “real” file system.

In Fall 2008, we ported the Invite framework to C and evaluated more efficient mechanisms for injecting the instrumentation and executing the tests. We also investigated fault localization techniques, which collect data from failed program executions and attempt to discover what caused the failure.

Recently we have investigated ways to make the technique more efficient by only running tests in application states it hasn’t seen before. This cuts down on the number of redundant states that are tested, thus reducing the performance overhead. This work has potential application to domains like model checking and dynamic analysis and was presented in a workshop paper at AST 2010.

Currently we are looking at ways to apply the In Vivo approach to the domain of security testing. Specifically, we devised an approach called Configuration Fuzzing in which the In Vivo tests make slight changes to the application configuration and then check “security invariants” to see if there are any vulnerabilities that are configuration-related. This work was presented at the 2010 Workshop on Secure Software Engineering.

In 2012-2013, we are investigating techniques to efficiently isolate the state of the tests, so as to avoid the effect of the tests on external systems.

Open research questions include:

  • Can the overhead be reduced by offloading test processes to other machines? This is especially important when the application is running on a single-core machine.
  • What sorts of defects are most likely to be detected with such an approach? How can we objectively measure the approach’s effectiveness at detecting defects?
  • How can the tests be “sandboxed” so that they do not affect external entities like databases? We currently assure that there are no changes to the in-process memory or to the file system, but what about external systems?

This is an older project, where we recently revived the main technique for our more recent work on dynamic code similarity.

Contact Mike Su (mikefhsu@su.columbia.edu) for further information about the recent effort.

Team Members

Gail Kaiser

Graduate Students
Fang-Hsiang (“Mike”) Su

Former Graduate Students
Chris Murphy
Jonathan Bell
Matt Chu
Waseem Ilahi
Moses Vaughan

Former Undergraduate Students
Ian Vo


Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan. Identifying Functionally Similar Code in Complex Codebases. 24th IEEE International Conference on Program Comprehension (ICPC), May 2016, pp. 1-10.

Christian Murphy, Moses Vaughan, Waseem Ilahi and Gail Kaiser. Automatic Detection of Previously-Unseen Application States for Deployment Environment Testing and Analysis. 5th International Workshop on the Automation of Software Test, May 2010, pp. 16-23.
Christian Murphy, Gail Kaiser, Ian Vo and Matt Chu. Quality Assurance of Software Applications Using the In Vivo Testing Approach. 2nd IEEE International Conference on Software Testing, Verification and Validation (ICST), April 2009, pp. 111-120.
Matt Chu, Christian Murphy and Gail Kaiser. Distributed In Vivo Testing of Software Applications. 1st IEEE International Conference on Software Testing, Verification, and Validation, April 2008, pp. 509-512.


Societal Computing

Societal Computing research is concerned with the impact of computational tradeoffs on societal issues and focuses on aspects of computer science that address significant issues and concerns facing the society as a whole such as Privacy, Climate Change, Green Computing, Sustainability, and Cultural Differences. In particular, Societal Computing research will focus on the research challenges that arise due to the tradeoffs among these areas.

As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regards to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing,
another major area of societal concern.

Societal Computing focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole.

We studied how software developers vs. end-users perceive data privacy requirements (e.g.,
Facebook), and which concrete measures would mitigate privacy concerns. We conducted a
survey with closed and open questions and collected over 400 valid responses. We found that
end-users often imagine that imposing privacy laws and policies is sufficient, whereas
developers clearly prefer technical measures; it is not terribly surprising that developers familiar
with how software works do not think merely passing a law will be effective. We also found that
both users and developers from Europe and Asia/Pacific are much more concerned about the
possibility of privacy breaches that those from North America.

Team Members

Gail Kaiser

Former Graduate Students
Swapneel Sheth



Swapneel Sheth, Gail Kaiser and Walid Maalej. Us and Them — A Study of Privacy Requirements Across North America, Asia, and Europe. 36th International Conference on Software Engineering (ICSE), pp. 859-870, June 2014.

Swapneel Sheth and Gail Kaiser. The Tradeoffs of Societal Computing. Onward!: ACM Symposium on New Ideas in Programming and Reflections on Software, October 2011, pp. 149-156.



System reliability is a fundamental requirement of Cyber-Physical System (CPS), i.e., a system featuring a tight combination of, and coordination between, the systems computational and physical elements. Cyber-physical system includes systems ranging from the critical infrastructure such as power grid and transportation system to the health and biomedical devices. An unreliable system often leads to disruption of service, financial cost and even loss of human life. In this work, we aim to improve system reliability for cyber-physical systems that meet following criteria: processing large amount of data; employing software as a system component; running online continuously; having operator-in-the-loop because of human judgment and accountability requirement for safety critical systems. The reason that we limit the system scope to this type of cyber-physical system is that this type of cyber-physical systems are important and becoming more prevalent.

To improve system reliability for this type of cyber-physical systems, we employ a novel system evaluation approach named automated online evaluation. It works in parallel with the cyber- physical system to conduct automated evaluation at the multiple stages along the workflow of the system continuously and provide operator-in-the-loop feedback on reliability improvement. It is an approach whereby data from cyber-physical system is evaluated. For example, abnormal input and output data can be detected and flagged through data quality analysis. As a result, alerts can be sent to the operator-in-the-loop. The operator can then take actions and make changes to the system based on the alerts in order to achieve minimal system downtime and higher system reliability. To implement the approach, we design a system architecture named ARIS (Autonomic Reliability Improvement System).

One technique used by the approach is data quality analysis using computational intelligence that applies computational intelligence in evaluating data quality in some automated and efficient way to ensure data quality and make sure the running system to perform as expected reliably. The computational intelligence is enabled by machine learning, data mining, statistical and probabilistic analysis, and other intelligent techniques. In a cyber-physical system, the data collected from the system, e.g., software bug reports, system status logs and error reports, are stored in some databases. In our approach, these data are analyzed via data mining and other intelligent techniques so that useful information on system reliability including erroneous data and abnormal system state can be concluded. These reliability related information are directed to operators so that proper actions can be taken, sometimes proactively based on the predictive results, to ensure the proper and reliable execution of the system.

Another technique used by the approach is self-tuning that automatically self-manages and self-configures the evaluation system to ensure it adapts itself based on the changes in the system and feedback from the operator. The self-tuning adapts the evaluation system to ensure its proper functioning, which leads to a more robust evaluation system and improved system reliability.


Project Members

Faculty: Prof. Gail Kaiser

PhD Candidate: Leon Wu



Leon Wu and Gail Kaiser. FARE: A Framework for Benchmarking Reliability of Cyber-Physical Systems. In Proceedings of the 9th Annual IEEE Long Island Systems, Applications and Technology Conference (LISAT), May 2013.

Leon Wu and Gail Kaiser. An Autonomic Reliability Improvement System for Cyber-Physical Systems. In Proceedings of the IEEE 14th International Symposium on High-Assurance Systems Engineering (HASE), October 2012.

Leon Wu, Gail Kaiser, David Solomon, Rebecca Winter, Albert Boulanger, and Roger Anderson. Improving Efficiency and Reliability of Building Systems Using Machine Learning and Automated Online Evaluation. In the 8th Annual IEEE Long Island Systems, Applications and Technology Conference (LISAT), May 2012.

Rebecca Winter, David Solomon, Albert Boulanger, Leon Wu, and Roger Anderson. Using Support Vector Machine to Forecast Energy Usage of a Manhattan Skyscraper. In New York Academy of Science Sixth Annual Machine Learning Symposium, New York, NY, USA, October 2011.

Leon Wu, Gail Kaiser, Cynthia Rudin, and Roger Anderson. Data Quality Assurance and Performance Measurement of Data Mining for Preventive Maintenance of Power Grid. In Proceedings of the ACM SIGKDD 2011 Workshop on Data Mining for Service and Maintenance, August 2011.

Leon Wu and Gail Kaiser. Constructing Subtle Concurrency Bugs Using Synchronization-Centric Second-Order Mutation Operators. In Proceedings of the 23th International Conference on Software Engineering and Knowledge Engineering (SEKE), July 2011.

Leon Wu, Boyi Xie, Gail Kaiser, and Rebecca Passonneau. BugMiner: Software Reliability Analysis Via Data Mining of Bug Reports. In Proceedings of the 23th International Conference on Software Engineering and Knowledge Engineering (SEKE), July 2011.

Leon Wu, Gail Kaiser, Cynthia Rudin, David Waltz, Roger Anderson, Albert Boulanger, Ansaf Salleb-Aouissi, Haimonti Dutta, and Manoj Pooleery. Evaluating Machine Learning for Improving Power Grid Reliability. In ICML 2011 Workshop on Machine Learning for Global Challenges, July 2011.

Leon Wu, Timothy Teräväinen, Gail Kaiser, Roger Anderson, Albert Boulanger, and Cynthia Rudin. Estimation of System Reliability Using a Semiparametric Model. In Proceedings of the IEEE EnergyTech 2011 (EnergyTech), May 2011.

Cynthia Rudin, David Waltz, Roger Anderson, Albert Boulanger, Ansaf Salleb-Aouissi, Maggie Chow, Haimonti Dutta, Phil Gross, Bert Huang, Steve Ierome, Delfina Isaac, Artie Kressner, Rebecca Passonneau, Axinia Radeva, and Leon Wu. Machine Learning for the New York City Power Grid. IEEE Transactions on Pattern Analysis and Machine Intelligence, May 2011.


Gameful Approaches to Computer Science Education

HALO, or Highly Addictive sociaLly Optimized Software Engineering, represents a new and social approach to software engineering. Using various engaging and addictive properties of collaborative computer games such as World of Warcraft, HALO’s goal is to make all aspects of software engineering more fun, increasing developer productivity and satisfaction.

HALO represents software engineering tasks as quests and uses a storyline to bind multiple quests together – users must complete quests in order to advance the plot. Quests can either be individual, requiring a developer to work alone, or group, requiring a developer to form a team and work collaboratively towards their objective.

This approach follows a growing trend to “gamify” everyday life (that is, bring game-like qualities to it), and has been popularized by alternate reality game proponents such as Jane McGonigal.

These engaging qualities can be found in even the simplest games, from chess to tetris, and result in deep levels of player immersion. Gamification has also been studied in education, where teachers use the engaging properties of games to help students focus.

We leverage the inherently competitive-collaborative nature of software engineering in HALO by providing developers with social rewards. These social rewards harness operant conditioning – a model that rewards players for good behavior and encourages repeat behavior. Operant conditioning is a technique commonly harnessed in games to retain players.

Multi-user games typically use peer recognition as the highest reward for successful players. Simple social rewards in HALO can include titles – prefixes or suffixes for players’ names – and levels, both of which showcase players’ successes in the game world. For instance, a developer who successfully closes over 500 bugs may receive the suffix “The Bugslayer.” For completing quests, players also receive experience points that accumulate causing them to “level up” in recognition of their ongoing work. HALO is also designed to create an immersive environment that helps developers to achieve a flow state, a technique that has been found to lead to increased engagement and addiction.

Team Members


Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Graduate Students

Jon Bell, jbell [at] cs.columbia.edu
Swapneel Sheth, swapneel [at] cs.columbia.edu




Jonathan Bell, Swapneel Sheth and Gail Kaiser. A Gameful Approach to Teaching Software Testing. Kendra Cooper and Walt Scacchi (eds.), Computer Games and Software Engineering, CRC, 2015.

At SSE 2011

At GAS 2011


World of Warcraft Massive Dataset


About genSpace


geWorkbench (genomics Workbench) is a Java-based open-source platform for integrated genomics. Using a component architecture it allows individually developed plug-ins to be configured into complex bioinformatic applications. At present there are more than 70 available plug-ins supporting the visualization and analysis of gene expression and sequence data. Example use cases include:

  • loading data from local or remote data sources.
  • visualizing gene expression, molecular interaction networks, protein sequence and protein structure data in a variety of ways.
  • providing access to client- and server-side computational analysis tools such as t-test analysis, hierarchical clustering, self organizing maps, regulatory neworks reconstruction, BLAST searches, pattern/motif discovery, etc.
  • validating computational hypothesis through the integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis.

genSpace is a suite of collaboration plugins to geWorkbench aimed to support knowledge sharing among computational biologists based on popular social networking motifs.  genSpace logs all user activities to a backend server, and data mines this information to recommends tools and workflows (sequences of analysis and visualization tools) in “people like you” style.  It also supports Facebook-like friends (direct collaborators) and networks (colleagues from same lab, institution or community), presence facilities including available/away/offline and live activity feed, and a shared research notebook that documents the details of all analyses. The introduction of genSpace web services can be found here.

This research is in collaboration with the Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet) on the Columbia University Health Sciences campus, which is funded by NIH and NCI.

Team Members


Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

PhD Students
Fang-Hsiang (Mike) Su, mikefhsu [at] cs.columbia.edu

Former PhD Students and MS GRAs
Jon Bell, jbell [at] cs.columbia.edu
Swapneel Sheth, swapneel [at] cs.columbia.edu
Chris Murphy, cmurphy [at] cs.columbia.edu
Nikhil Sarda, ns2847 [at] columbia.edu

Project Students
John Murphy, jvm2108@columbia.edu
Abhaar Gupta, ag3468@columbia.edu

Former project students
Yu Wang
Ami Kumar
Huimin Sun
Diana Chang
Anureet Dhillon
Gowri Kanugovi
Mayur Lodha
Koichiro Matsunaga
Lakshmi Nadig
Joshua Nankin
Cheng Niu
Gaurav Pandey
Hyuksoo Seo
Yuan Wang
Eric Schmidt
Nan Luo
Danielle Cauthen
Flavio Antonelli
Ning Yu
Jason Halpern
Evgeny Fedetov
Aditya Bir
Alison Yang


Papers, Presentations, etc.

C2B2 retreat poster and slides, May 2013
C2B2 retreat poster and slides, May 2012
DEIT 2011 paper and slides – “Towards using Cached Data Mining for Large Scale Recommender Systems”
RSSE 2010 paper and poster – “The weHelp Reference Architecture for Community-Driven Recommender Systems”
C2B2 retreat posters (1 and 2), April 2010
SSE 2010 paper and workshop presentation – “weHelp: A Reference Architecture for Social Recommender Systems”
C2B2 retreat presentation and poster, March 2009
SoSEA 2008 paper and workshop presentation – “genSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work”
C2B2 retreat presentation and poster, April 2008

genSpace wiki
geWorkbench wiki
C2B2 project management wiki

Source Code
geWorkbench repository (login required)


Contact: Fang-hsiang (Mike) Su



As the Internet has grown in popularity, security vulnerability detecting and testing are undoubtedly becoming crucial parts for commercial software, especially for web service applications. Vulnerability scanners, both commercial and open-source (i.e., SAINT, eEye, Nessus, etc.), were developed to achieve this goal. However, the absence of a well defined assessment benchmark makes the efficient evaluation of these scanners nearly impossible. With ongoing researches on new vulnerability scanners, the demand for such an assessment benchmark is urgent. We are working on developing VULCANA, a set of open-source web service applications with systematically injected vulnerabilities. The idea is that different vulnerability scanners can be used to scan the benchmark, and the percentage of detected vulnerabilities together with the resource consumption are used to provide reasonable evaluation.

In Spring 2009, we developed a prototype framework called Baseline, which is described in our tech report. The idea of BaseLine is that we tried to coach the users to pick the right Web Vulnerability Scanner by letting them set up a baseline for potential qualified scanners. We can then test the scanner with the baseline, revealing its effectiveness and efficiency in detecting the user’s most “care-about” vulnerabilities.

Brief Introduction of Baseline
Most of existing benchmarks use the scanners to scan a manually crafted website with a number of known vulnerabilities, and rate the scanners based on the percentage of successful detection. These benchmarks are only capable of judging which scanner is better in the matter of how well the scanners can detect the fixed set of vulnerabilities the benchmarks picked with static selection criteria. They suffer from drawbacks by neglecting the critical questions: Does the benchmark properly reflect the user’s security requirements; does it reflect the user’s actual deployment environment? In helping the users choose the right scanners, answering these questions is as crucial as evaluating the effectiveness and efficiency of the scanners. In this paper, we propose an approach called Baseline that addresses all of these problems: We implement a ranking system for dynamically generating the most suitable selection of weaknesses based on the user’s needs, which serves as the baseline that a qualified scanner should reach/detect. Then we pair the ranking system with a testing framework for generating test suites according to the selection of weaknesses. This framework maps a weakness into an FSM (Finite State Machine) with multiple end states that represent different types/mutations of exploitations of the weakness and each transition from state to state determined by scanner behavior, the framework then combines the FSMs of the selected weaknesses into a mimicked vulnerable website. When a scanner scans the “vulnerable” website, the transitions between the states are recorded and thus we are able to evaluate the scanner by looking at which end states were visited (effectiveness), in how much time, and over how many transitions(efficiency).

Currently we are looking at methods of measuring assorted aspects of the web vulnerability scanners. Specifically, the ability of bypassing client-side validation, the crawling coverage and the capability of scanning auto-generated pages.

Open research questions include:

  • Currently, Baseline framework uses Regular expression to determine the transition between two states. Can we extend Baseline with more sophisticated validation methods?
  • Client-side validation seems to be neglected by most (if not all) existing scanners. Are there any drawbacks for the scanners to omit them?
  • There are no existing web vulnerabilty repository, can we create one?

Team Members

Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Graduate Students
Huning Dai, hdd2210 [at] columbia.edu
Shreemanth Hosahalli, sh2959 [at] columbia.edu

Former Members
Michael Glass
Anshul Mittal


About CloudView

CloudView is a project that enables detection and diagnosis of network faults using a peer to peer architecture. Consider the following scenario. A user is trying to log into an IM server, but she is not able to. There could be a variety of reasons for the failure. Some plausible causes for this failure are the IM server is temporarily unavailable, the ISP is down, or the user’s password is not correct.

CloudView can be used to diagnose this problem, proceeding as follows: CloudView tries to contact other peers who are part of its network, which will then run probes to try to isolate the problem. Examples of probes could be trying to log in from another node, using a different username and password, and trying to ping the server. The results of these probes will be returned to the original node, and using the rule book it would try to find the cause of the problem. This entire process is automated and the group of peers runs a set of analysis tests and depending on the results on these tests, we can diagnose the problem.

Our system is based on the DYSWIS system. While DYSWIS is focused on the detection and diagnosis of network and transport level faults, CloudView is aimed towards the detection and diagnosis of faults at the application level.

In previous semesters, we have focused on the XMPP/Jabber Chat Protocol and developed a proof-of-concept implementation. We have also extended our system to the Samba (SMB) protocol.

We are looking to add more functionality to the current implementation and also extend this into other domains, which include BitTorrent, Cloud Computing, Email, Web Browsing, and Games.

Team Members

Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Graduate Students
Swapneel Sheth, swapneel [at] cs.columbia.edu

Former members
Rajat Dixit
Palak Baid
Somenath Das
Siming Sun
Jau-Yuan Chen


DYSWIS – http://www.cs.columbia.edu/irt/project/dyswis/
XMPP – http://xmpp.org/
SMB – http://samba.org/



COMPASS is a Community-driven Parallelization Advisor for Sequential Software. It provides advice to programmers while they reengineer their code for parallelism and provides a platform and an extensible framework for sharing human expertise about code parallelization. COMPASS aims to enable rapid propagation of knowledge about code parallelization in the context of the actual parallelization reengineering, and thus continue to extend the benefits of Moore’s law scaling to science and society.

Team Members

Prof. Simha Sethumadhavan, simha [at] cs.columbia.edu
Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Graduate Students
Nipun Arora, na2271 [at] columbia.edu


Official project website
Original paper from International Workshop on Multicore Software Engineering (IWMSE)


Contact: Nipun Arora