
Category Archives: Projects


About This Project

In this project, we sought to learn about introductory-level students’ programming habits by observing their behavior when they use an IDE such as Eclipse. We do this by capturing data such as compilation errors and the amount of time spent on an assignment, then reporting the data back to a central repository where it can be mined and analyzed. This will help us create reports for the instructor, and also allow us to create ad hoc social networks of students who have similar programming styles and habits. We also want the system to be able to provide helpful hints to the students, based on their programming styles. We believe that this will enrich the students’ experience and make them better programmers.

In Fall 2007, we built the basic infrastructure for capturing compilation errors and storing them in a database, as well as a prototype UI for instructors’ reports, and an IM-based user interface with which students can “chat”.

In Spring 2008, we collected data from some students in COMS W1004, added new reports and analysis to the instructor’s UI, began the creation of ad hoc user communities (social networks), and created a “help” feature that suggests ways that students can improve their code.

In Summer 2008, we analyzed the data that we collected and tried to determine any correlations between when students start their homework, how much time they spend on it, how many errors they make, what time of day they work on it, and what grades they receive. We also developed the student-view UI and implemented the real-time recommendations. Last, we wrote a paper, which was presented at SIGCSE 2009.
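As an illustration of the kind of analysis described above, the following minimal sketch computes a Pearson correlation coefficient between two per-student measurements. The data values and the names hours_before_deadline and grades are invented for illustration and are not Retina data; the statistical method the project actually used is not stated here, so treat this as one plausible approach.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: how many hours before the deadline each student
# started the assignment, and the grade that student received.
hours_before_deadline = [72, 48, 30, 12, 6]
grades = [95, 88, 84, 70, 61]

r = pearson(hours_before_deadline, grades)
print(f"correlation between start time and grade: {r:.2f}")
```

A value of r near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests none; a correlation alone does not show that starting early causes better grades.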

Unfortunately, the Retina project is no longer active, but if you are interested in working on it and reviving it, please contact us.

Team Members


Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Diana Chang
Aaron Fernandes
Michelle Forman
Sahar Hasan
Tian He
Shreya Kedia
Henry Lau
Tina Loveland
Ben Monnin
Chris Murphy


Retina paper from SIGCSE 2009

Demo Videos
Instructor View (AVI, MPG)
Student View (AVI, MPG)
Real-time Recommendation (AVI, MPG)

Related tools

Microsoft JDBC driver

SQL tutorial
JDBC tutorial (focuses on Oracle but a good starting point)
Microsoft JDBC tutorial (warning: there are some errors in the doc)
JDBC tutorial (this one’s actually really good)
XML and DOM tutorial
Java socket tutorial


Kheiron was developed as a toolkit for performing runtime adaptations in software systems. Our original goal was to create a tool that could be used to dynamically retrofit self-healing capabilities onto existing/legacy systems transparently and with low overhead. Kheiron manipulates compiled C programs running in an unmanaged execution environment (ELF binaries on Linux x86) as well as programs running in managed execution environments, e.g., Microsoft’s Common Language Runtime and Sun Microsystems’ Java Virtual Machine. We currently use Kheiron to build fault-injection tools, which we use in our RAS-benchmarking efforts described below.


About Backstop

In this project, we sought to create tools to help novice Java programmers comprehend some of the complicated error messages provided by the Java Virtual Machine. Whereas there has been much work in creating easy-to-understand compiler messages, little work has been done in the area of runtime errors. When a Java program produces an uncaught exception, the result is a stacktrace, which can be difficult for a novice programmer to understand. Backstop has been designed to produce a simpler error message that attempts to explain the cause of the problem, and how to fix it.
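The idea of replacing a raw stack trace with a simpler explanation can be sketched as follows. Backstop itself targets Java programs; this Python sketch only illustrates the general technique, and the explanation texts, the EXPLANATIONS table, and the friendly_message helper are all hypothetical rather than taken from Backstop.

```python
import traceback

# Hypothetical novice-oriented explanations, keyed by exact exception type.
EXPLANATIONS = {
    ZeroDivisionError: (
        "The program tried to divide a number by zero. "
        "Check that the divisor is not zero before dividing."
    ),
    IndexError: (
        "The program used a list position that does not exist. "
        "Check that the index is between 0 and the list length minus 1."
    ),
}

def friendly_message(exc):
    """Return a short description of where and why an exception occurred."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        # The last frame is the one in which the exception was raised.
        where = f"line {frames[-1].lineno} in {frames[-1].name}()"
    else:
        where = "an unknown location"
    explanation = EXPLANATIONS.get(
        type(exc), "An unexpected error occurred: " + str(exc)
    )
    return f"Problem at {where}. {explanation}"

def average(values):
    return sum(values) / len(values)

try:
    average([])  # dividing by len([]) == 0 raises ZeroDivisionError
except Exception as exc:
    message = friendly_message(exc)
    print(message)
```

Exception types that are not in the table fall back to a generic message, so the sketch degrades gracefully instead of hiding the error.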

Our paper on Backstop has been published in the proceedings of SIGCSE 2008 and is available here.

Unfortunately, the Backstop project is no longer active, but if you are interested in working on it and reviving it, please contact us.

Team Members

Prof. Gail Kaiser, kaiser [at] cs.columbia.edu
Chris Murphy
Eunhee Kim


Backstop paper from SIGCSE 2008
Tech report, with an appendix

Related Work
Jim Etheredge’s CMeRun paper


Backstop is released under the GNU General Public License

Backstop v1.0.2 (3/27/08)



Crunch is a web proxy, usable with essentially all web browsers, that performs content extraction (or clutter reduction) from HTML web pages. Crunch includes a flexible plug-in API so that various heuristics can be integrated to act as filters, collectively, to remove non-content and perform content extraction.

This proxy has evolved from a program where individual settings had to be tweaked by hand by the end user, to an extraction system that is designed to adapt to the user’s workflow and needs, classifying web pages based on genre and utilizing this information to extract content in similar manners from similar sites. It reduces human involvement in applying heuristic settings for websites and instead tries to automate the job by detecting and utilizing the content genre of a given website.
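A pipeline of pluggable filters that remove non-content can be sketched as below. This is not Crunch's plug-in API: the filter signature (tag, attrs) -> bool, the two example filters, and the ContentExtractor class are assumptions made for illustration, built on Python's standard html.parser module. The sketch also assumes well-nested markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

def drop_scripts_and_styles(tag, attrs):
    """Hypothetical filter: scripts and stylesheets are never content."""
    return tag in {"script", "style"}

def drop_navigation(tag, attrs):
    """Hypothetical filter: drop <nav> and anything with class 'sidebar'."""
    return tag == "nav" or "sidebar" in (attrs.get("class") or "").split()

class ContentExtractor(HTMLParser):
    """Collects text, skipping any element that some filter rejects."""

    def __init__(self, filters):
        super().__init__()
        self.filters = filters
        self.skip_depth = 0   # > 0 while inside a rejected element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth += 1
        elif any(f(tag, dict(attrs)) for f in self.filters):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_content(html, filters):
    parser = ContentExtractor(filters)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

FILTERS = [drop_scripts_and_styles, drop_navigation]
page = (
    '<html><body><nav><a href="/">Home</a></nav>'
    '<div class="sidebar ad"><p>Buy now</p></div>'
    "<script>var x = 1;</script>"
    "<p>Main <b>story</b> text.</p></body></html>"
)
print(extract_content(page, FILTERS))
```

Because each filter is an independent function, a genre-specific configuration could simply select a different list of filters for a given class of site.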

One of the major goals of Crunch is to make web pages more accessible to people with disabilities. We believed that preprocessing web pages with Crunch would make otherwise inaccessible web pages more accessible.













Suhit Gupta, Gail Kaiser, “CRUNCH – Web-based Collaboration for Persons with Disabilities”, W3C Web Accessibility Initiative, Teleconference on Making Collaboration Technologies Accessible for Persons with Disabilities, Apr 2003.

Suhit Gupta, Gail Kaiser, David Neistadt, Peter Grimm, “DOM-based Content Extraction of HTML Documents”, WWW2003

Suhit Gupta, Gail E Kaiser, Peter Grimm, Michael F Chiang, Justin Starren, “Automating Content Extraction of HTML Documents”, World Wide Web Journal, January 2004

Michael F. Chiang, Roy G. Cole, Suhit Gupta, Gail E Kaiser, Justin Starren, “World Wide Web Accessibility by Visually Disabled Patients: Problems and Solutions”, Submitted to the Journal of Ophthalmology, January 2004

Suhit Gupta, Gail E Kaiser, Salvatore Stolfo, “Extracting Context To Improve Accuracy For HTML Content Extraction”, Poster at the World Wide Web Conference 2005

Suhit Gupta, Gail E Kaiser, Salvatore Stolfo, Hila Becker, “Genre Classification of Websites Using Search Engine Snippets for Content Extraction”, Submitted to SIGIR 2005

Suhit Gupta, Gail Kaiser, “Extracting content from accessible webpages”, Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A), May 2005


XUES (XML Universal Event Service) is PSL’s sophisticated event manipulation system.  XUES is supported under the DARPA DASADA project, and is part of the KX architecture.  XUES consists of two parts:

Event Packager (or EP), which serves as a data aggregator, flight recorder, and event transformer for PSL’s KX system.  Features include support for extensible event formats and conversion (currently supports Siena, Elvin, and XML); event recording and playback to/from SQL as well as memory; and support for time synchronization and simple event rewriting.

Event Distiller (or ED), which is a flexible event pattern-recognition/gauge architecture.  Features include multiple-event pattern recognition, including “success” and “failure” situations; timebound validation (e.g., events received in order within specified timebounds); and wildcards to support event patterns.  ED currently communicates either via Siena or direct method calls.
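Ordered, timebound pattern recognition with wildcards can be sketched as follows. This is not the Event Distiller's rule language or API: the event representation (a timestamp paired with a dictionary of fields), the "*" wildcard convention, and the recognize function are assumptions chosen for illustration.

```python
def matches(event, template):
    """True when every template field is present in the event and is either
    equal to the event's value or the "*" wildcard."""
    return all(
        key in event and (want == "*" or event[key] == want)
        for key, want in template.items()
    )

def recognize(events, pattern, timebound):
    """Return the first events matching `pattern` in order, all within
    `timebound` seconds of the first matched event, or None.

    `events` is a list of (timestamp, fields) pairs sorted by timestamp.
    """
    for start, (t0, first) in enumerate(events):
        if not matches(first, pattern[0]):
            continue
        found = [(t0, first)]
        for timestamp, event in events[start + 1:]:
            if len(found) == len(pattern) or timestamp - t0 > timebound:
                break
            if matches(event, pattern[len(found)]):
                found.append((timestamp, event))
        if len(found) == len(pattern):
            return found   # "success" situation
    return None            # "failure" situation

events = [
    (0.0, {"type": "request", "host": "a"}),
    (1.0, {"type": "heartbeat", "host": "b"}),
    (2.5, {"type": "reply", "host": "a"}),
    (10.0, {"type": "request", "host": "c"}),
    (30.0, {"type": "reply", "host": "c"}),
]
# A request followed by a reply, from any host.
pattern = [{"type": "request", "host": "*"}, {"type": "reply", "host": "*"}]

print(recognize(events, pattern, timebound=5.0))
print(recognize(events[3:], pattern, timebound=5.0))
```

Note that the wildcard here matches any value independently at each step; it does not require the request and the reply to come from the same host, which a fuller implementation would likely support through variable binding.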

Further information on XUES is available:



Decentralized Information Spaces for Composition and Unification of Services

The DISCUS project focuses on forming temporary alliances among existing legacy systems that may span organizational boundaries, to rapidly deal with a unique and temporary problem. This integrated suite of tools, termed a Summit, will help operational teams to achieve quick understanding and resolution of the problem at hand.

A summit is formed by composing services from different Service Spaces, where each service space is an autonomous collection of services allowing selective access to its services. The extent of access allowed by a service space to external entities is based on their credentials and the nature of the problem, thus creating a dynamic trust model. A Treaty, representing a contract, is then formed amongst the service spaces of these participating services.

Each service space enforces normative interaction between the ‘enlisted’ services of a summit by intercepting and verifying every operation with the relevant treaty. Once the services within a summit have accomplished their mission, the summit can be dissolved.
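The interception step described above can be sketched as a service space that checks every requested operation against a treaty before dispatching it. The Treaty and ServiceSpace classes, the TreatyViolation exception, and the example service names are all hypothetical; the actual DISCUS treaty representation is not described here.

```python
class TreatyViolation(Exception):
    """Raised when an operation is not permitted by the treaty."""

class Treaty:
    """A contract listing the permitted (caller, service, operation) triples."""

    def __init__(self, permitted):
        self.permitted = set(permitted)

    def allows(self, caller, service, operation):
        return (caller, service, operation) in self.permitted

class ServiceSpace:
    """An autonomous collection of services that verifies each call."""

    def __init__(self, name, services, treaty):
        self.name = name
        self.services = services   # service name -> {operation name: callable}
        self.treaty = treaty

    def invoke(self, caller, service, operation, *args):
        # Intercept the operation and verify it against the treaty first.
        if not self.treaty.allows(caller, service, operation):
            raise TreatyViolation(
                f"{caller} may not call {service}.{operation} in {self.name}"
            )
        return self.services[service][operation](*args)

# Hypothetical example: a hospital space exposes a bed-count lookup.
treaty = Treaty([("fire_dept", "beds", "available")])
hospital = ServiceSpace(
    "hospital",
    {"beds": {"available": lambda ward: 4, "reserve": lambda ward: True}},
    treaty,
)

print(hospital.invoke("fire_dept", "beds", "available", "ICU"))
```

Dissolving the summit would then amount to discarding the treaty, after which every external call is rejected.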

DASADA Demo Days (7/02) poster
DASADA Demo Days (7/02) flyer
Overview slides

Kinesthetics eXtreme

Our current mission is to develop a feedback/feedforward infrastructure for run-time monitoring and repair/reconfiguration of component-based distributed systems.

The DARPA Dynamic Assembly of Systems for Adaptability, Dependability, and Assurance (DASADA) Program involves research into software probes and corresponding measurement gauges.  The program is developing a standard for the structure of probe events that will be processed by gauges.  Columbia University’s Programming Systems Lab, OBJS, BBN, and other DASADA participants have developed an initial version of the proposed schemas.

An example message using the schema is located here.  The schemas are intended to function as SOAP blocks.  A SOAP message has a header section, a body section, and an optional faults section.  Each section can contain one or more blocks.  Further information on SOAP can be found at http://www.w3.org/2000/xp/.  Current usage is to put the context block in the header and one or more content blocks in the body.  There are currently two content blocks defined, one very low level, one very high level.  It is assumed that most users will want to define their own content blocks as well.

The low-level probe content block contains information about a particular function/method call, including name, parameters, value of “this,” return value, and exception information.  The high-level probe content block identifies an architectural mutation, involving components, connectors, and so forth.  A possible usage of this probe format is for the probe to generate low-level events, which are augmented by successive processing with higher-level information.
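A low-level probe of this kind can be sketched as a wrapper that records one event per function call. The dictionary field names below are illustrative only and do not reproduce the DASADA schema, which is XML-based; the probe decorator and the list used as an event sink are likewise assumptions.

```python
import functools

def probe(sink):
    """Decorator factory: append one event per call of the wrapped function
    to `sink`, recording name, parameters, return value, and exception."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            event = {
                "name": func.__qualname__,
                "parameters": {"args": args, "kwargs": kwargs},
                "return_value": None,
                "exception": None,
            }
            try:
                event["return_value"] = func(*args, **kwargs)
                return event["return_value"]
            except Exception as exc:
                event["exception"] = repr(exc)
                raise   # the probe observes the failure but does not hide it
            finally:
                sink.append(event)
        return wrapper
    return decorator

events = []

@probe(events)
def divide(a, b):
    return a / b

divide(6, 3)
try:
    divide(1, 0)
except ZeroDivisionError:
    pass

for event in events:
    print(event)
```

For a method, the first positional argument recorded in "parameters" is the receiving object, which plays the role of the value of “this”. A later processing stage could attach higher-level information to these events.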


MEET is Columbia’s Multiply Extensible Event Transport. It aims to provide an easy-to-use publish-subscribe infrastructure with the following features:

  • Scalability: 10^6 – 10^9 publishers and subscribers.
  • Survivability: arbitrary connection topology, with rapid switchover to alternate routes. Redundant distributed storage of routing information.
  • Extensibility:
    • new datatypes and operations can be defined for event filters. An example is the XML processing filter which allows XPath expressions to be used as predicates over XML fields. A subset of operations will be realtime compatible.
    • routing policies are extensible. Event publishing / multicast is an open problem, and specific algorithms may be optimal for particular situations.
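The extensible-filter idea can be sketched with a minimal in-process publish-subscribe broker in which a subscription's filter is an arbitrary predicate. This is not MEET's API, and it addresses neither scalability nor survivability: the Broker class and the xpath_filter helper are hypothetical. The XPath support comes from Python's xml.etree.ElementTree, which implements only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

class Broker:
    """A single-process broker: each subscription pairs a filter predicate
    with a callback that receives matching events."""

    def __init__(self):
        self.subscriptions = []

    def subscribe(self, predicate, callback):
        self.subscriptions.append((predicate, callback))

    def publish(self, event):
        """Deliver `event` to every matching subscriber; return the count."""
        delivered = 0
        for predicate, callback in self.subscriptions:
            if predicate(event):
                callback(event)
                delivered += 1
        return delivered

def xpath_filter(field, path):
    """Build a predicate that is true when the XML text stored in
    event[field] contains an element matching `path`."""
    def predicate(event):
        try:
            root = ET.fromstring(event[field])
        except (KeyError, ET.ParseError):
            return False   # missing or malformed XML never matches
        return root.find(path) is not None
    return predicate

broker = Broker()
received = []
broker.subscribe(xpath_filter("body", ".//alert[@level='high']"), received.append)

broker.publish({"body": "<msg><alert level='high'>disk full</alert></msg>"})
broker.publish({"body": "<msg><alert level='low'>ok</alert></msg>"})
print(len(received))
```

Because a filter is just a callable, new datatypes and operations can be added without changing the broker, which is the extensibility property the list above describes.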


Worklets micro-workflow mobile code adapts computation to component context

Workflakes integrates Cougaar (BBN) macro-workflow for community coordination

Process-aware system repair: KX “continual coordination”


A joint project with Profs. Gail Kaiser, John Kender, and Jason Nieh.

Flyer (6/01)