Home » Projects » Retired Projects (Page 2)

Category Archives: Retired Projects

Mutable Replay

Society is increasingly reliant on software, but deployed software contains security vulnerabilities and other bugs that can threaten privacy, property and even human lives. When a security vulnerability or critical error is discovered, a software patch is issued to attempt to fix the problem, but patches themselves can be incorrect, inadequate, and break necessarily functionality. This project investigates the full workflow for the developer to rapidly diagnose the root cause of the vulnerability or error, for the developer to test that a prospective patch indeed completely removes the defect, and for users to check the issued patch on their own configurations and workloads before adopting the patch.

This project explores the use of mutable replay to help reproduce, diagnose, and fix software bugs. A low-overhead recorder records the execution of software in case a failure or exploit occurs, allowing the developer to replay the recorded log to reproduce the problem. Mutable replay allows logs recorded with the buggy version to be replayed after the modest code changes typical of critical patches to show that patches work correctly to resolve detected problems. This project leverages semantic information readily available to the developer to conduct well-understood static and dynamic analyses to correctly transform the recorded log to enable mutable replay. The results of this research will benefit society and individuals by simplifying and hastening both generation and validation of patches, ultimately making software more reliable and secure.

Contact Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Faculty
Gail Kaiser

Graduate Students
Anthony Saeiva Narin

Former Graduate Students
Jonathan Bell
Kenny Harvey   

 

Dynamic Information Flow Analysis

We are investigating an approach to runtime information flow analysis for managed languages
that tracks metadata about data values through the execution of a program. We first considered
metadata that propagates labels representing the originating source of each data value, e.g.,
sensitive data from the address book or GPS of a mobile device that should only be accessed on a
need-to-know basis, or potentially suspect data input by end-users or external systems that
should be sanitized before including in database queries, collectively termed “taint tracking”.
We developed and made available open-source the first general purpose implementation of taint
tracking that operates with minimal performance overhead on commodity Java Virtual Machine
implementations (e.g., from Oracle and OpenJDK), by storing the derived metadata “next to” the
corresponding data values in memory, achieved via bytecode rewriting that does not require
access to source code or any changes to the underlying platform. Previous approaches required
changes to the source code, the language interpreter, the language runtime, the operating system
and/or the hardware, or added unacceptable overhead by storing the metadata separately in a
hashmap. Our system has also been applied to Android, where it required changes in 13 lines of
code, contrasted to the state of the art TaintDroid which added 32,000 lines of code. We are
currently investigating tracking the path conditions constructed during dynamic symbolic
execution of programs, which record the constraints on data values that have reached a given
point in execution (e.g., taking the true or false branch of a series of conditionals). We plan to
use the more sophisticated but slower symbolic execution version as part of several prospective
projects.

We expect to extend and use this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Faculty
Gail Kaiser

Former Graduate Students
Jonathan Bell

Links

Publications

Jonathan Bell and Gail Kaiser. Phosphor: Illuminating Dynamic Data Flow in the JVM. Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), October 2014,pp. 83-101. Artifact accepted as meeting reviewer expectations.

Jonathan Bell and Gail Kaiser. Dynamic Taint Tracking for Java with PhosphorInternational Symposium on Software Testing and Analysis (ISSTA), July 2015, pp. 409-413.

Software

Download Phosphor.

Download Knarr.

Sound Build Acceleration

Sound Build Acceleration: Our empirical studies found that the bulk of the clock time during the builds of the ~2000 largest and most popular Java open source software applications is spent running test cases, so we seek to speed up large builds by reducing testing time. This is an important problem because real-world industry builds often take many hours, so developers cannot be informed of any errors introduced by their changes while still in context – as needed for continuous integration (best practice). The consequent lack of attention to failed tests is one of the major reasons that software is deployed with so many security vulnerabilities and other severe bugs. Prior work reduces testing time by running only subsets of the test suite, chosen using various selection criteria. But this model inherently reduces failure detection, and may be unsound because remaining test cases may have dependencies on removed test cases (causing false positives and false negatives). We thought out of the box to substantially reduce measured testing time without removing any test cases at all, thus no reduction in failure detection. For example, we developed tools that use static and dynamic analyses to determine exactly which portion of the state written by previous test cases will be read by the next test case, and instrument the bytecode to just-in-time reinitialize only that dependent portion of the state, rather than restarting the JVM between separate test cases, a common industry practice. Some dependencies are unintentional, so our tools also inform developers so they can re-engineer the code to remove those dependencies. Other dependencies are necessary, because series of tests are needed to build up and check each step of complex usage scenarios; for these our tools bundle dependent test cases and distinguish independent sets of test cases to enable sound parallelization of large test suites.

We expect to use components of this tool as part of the Mutable Replay project, and are seeking new project students in tandem with that effort.

Contact Professor Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Faculty
Gail Kaiser

Former Graduate Students
Jonathan Bell

Links

Publications

Jonathan Bell, Gail Kaiser, Eric Melski and Mohan Dattatreya. Efficient Dependency Detection for Safe Java Test Acceleration. 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Aug-Sep 2015, pp. 770-781.

Jonathan Bell, Eric Melski, Gail Kaiser and Mohan Dattatreya. Accelerating Maven by Delaying Dependencies.3rd International Workshop on Release Engineering (RelEng), May 2015, p. 28.

Jonathan Bell, Eric Melski, Mohan Dattatreya and Gail Kaiser. Vroom: Faster Build Processes for Java.IEEE Software, 32(2):97-104, Mar/Apr 2015.

Jonathan Bell and Gail Kaiser. Unit Test Virtualization with VMVM. 36th International Conference on Software Engineering (ICSE), June 2014, pp. 550-561. (ACM SIGSOFT Distinguished Paper Award)

Jonathan Bell and Gail Kaiser. Unit Test Virtualization: Optimizing Testing Time. 2nd International Workshop on Release Engineering (RelEng), April 2014.

Jonathan Bell and Gail Kaiser. VMVM: Unit Test Virtualization for Java. ICSE 2014 Formal Demonstrations Track, Companion Proceedings of 36th International Conference on Software Engineering (ICSE), June 2014, pp. 576-579. Video at https://www.youtube.com/watch?v=sRpqF3rJERI.

Software

Download VmVm.

CS/SE Education

About CS/SE Education

We are exploring new techniques and approaches to improve the teaching of computer science and software engineering. Our recent projects and papers are listed below.

Contact: Swapneel Sheth (swapneel@cs.columbia.edu)

Team Members

Faculty

Prof. Gail Kaiser, kaiser [at] cs.columbia.edu

Phd Students

Swapneel Sheth, swapneel [at] cs.columbia.edu

Former PhD students

Jonathan Bell, jbell [at] cs.columbia.edu
Chris Murphy
, cmurphy [at] cs.columbia.edu

See the Software Project Management project listed on our project student advertisements page.

Links

Projects

HALO (Highly Addictive, sociaLly Optimized) Software Engineering

Retina

Backstop

Papers

Swapneel Sheth, Jonathan Bell, Gail Kaiser. A Competitive-Collaborative Approach for Introducing Software Engineering in a CS2 Class. 26th Conference on Software Engineering Education and Training (CSEE&T), San Francisco CA, pages 41-50, May 2013

Jonathan Bell, Swapneel Sheth, Gail Kaiser. Secret Ninja Testing with HALO Software Engineering. 4th International Workshop on Social Software Engineering Workshop (SSE), Szeged, Hungary, pages 43-47, September 2011

Christian Murphy, Gail Kaiser, Kristin Loveland, Sahar Hasan. Retina: Helping Students and Instructors Based on Observed Programming Activities. 40th ACM SIGCSE Technical Symposium on Computer Science Education, Chattanooga TN, pages 178-182, March 2009

Christian Murphy, Dan Phung, and Gail Kaiser. A Distance Learning Approach to Teaching eXtreme Programming. 13th Annual ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE), Madrid, Spain, pages 199-203, June 2008

C. Murphy, E. Kim, G. Kaiser, A. Cannon. Backstop: A Tool for Debugging Runtime Errors. 39th ACM SIGCSE Technical Symposium on Computer Science Education, Portland OR, pages 173-177, March 2008

Tech Reports

Kunal Swaroop Mishra, Gail Kaiser. Effectiveness of Teaching Metamorphic Testing.Technical Report CUCS-020-12, Dept. of Computer Science, Columbia University, November 2012

Fine-Grained Data Management Abstractions

We participated in developing novel technology that leverages the storage abstractions of
modern operating systems (e.g., the relational databases and object-relational mappings of
Android) to automatically detect fragments strewn across memory, files and databases that is part
of the same logical application object, such as an email and its attachments, without requiring
source code or any cooperation on the part of application developers. This substrate enabled the development of our prototype tools to check that application-level deletions in fact actually delete all the data fragments related to, say, a document or a photo; to hide (and later unhide) sensitive data, e.g., to protect business data at international border crossings; and to detect when an application collects more data than required by its functionality. In our case study, our system worked correctly on 42 out of 50 real-world applications, and lead to publication of “best practices” rules of thumb required for the approach to work on future applications — e.g., fully declare database schemas, use the database to index file storage, use standard storage libraries, which are admittedly obvious to anyone with the software engineering training that some “app” developers sadly lack.

Contact Professor Roxana Geambasu (roxana@cs.columbia.edu) for further information.

Team Members

Faculty
Roxana Geambasu
Gail Kaiser

Graduate Students
Riley Spahn

Former Graduate Students
Jonathan Bell

Links

Publications

Riley Spahn,  Jonathan Bell, Michael Z. Lee, Sravan Bhamidipati, Roxana Geambasu and Gail Kaiser. Pebbles: Fine-Grained Data Management Abstractions for Modern Operating Systems. 11th USENIX Symposium on Operating Systems Design and Implementation, October 2014, pp. 113-129.

Code Similarity

Dynamic Code Similarity: This is a multi-disciplinary project joint with Profs. Simha Sethumadhavan and Tony Jebara. “Code clones” are statically similar code fragments that usually arise via copy/paste or independently writing lookalike code; best practice removes clones (refactoring) or tracks them (e.g., to ensure bugs fixed in one clone are also fixed in others). This part of the project instead studies dynamically similar code for two different similarity models. One model is functional similarity, finding code fragments that exhibit similar input/output behavior during execution. Our other dynamic similarity model is the novel notion of behavioral similarity, which we call “code relatives”. Two or more code fragments are deemed code relatives if their executions are similar. We model this as finding similarities among the dynamic data dependency graphs representing instruction-level execution traces. We used machine learning techniques to devise a (relatively) fast inexact subgraph isomorphism algorithm to cluster these execution-level similarities. Our experiments show that both of our tools find most of the same “similar” code as the best static code clone detectors but also find many others they can’t, because the code looks very different even though functionally and/or behaviorally similar; however, dynamic detection will not necessarily find all static code clones because lookalike code involving polymorphism need not exhibit the same function/behavior. Our behavioral and functional similarity detectors do not always find the same similarities, because two or more code fragments may compute the same function using very different algorithms. Thus these kinds of techniques complement each other. Beyond the conventional applications of static code clone detection, dynamic similarity detection also addresses malware detection, program understanding, re-engineering legacy software to use modern APIs, and informing design of hardware accelerators and compiler optimizations.

Static Code Similarity: We also investigate of static similarity detection to augment our similarity detection toolkit. This work is joint with Prof. Baishakhi Ray of the University of Virginia and Prof. Jonathan Bell of George Mason University. Unlike most other static code clone research, we look for similarities at the instruction level rather than in the source code, so our techniques can work even on obfuscated executables where no source code is available and thus conventional static detectors cannot be applied. This situation arises for both malware and misappropriated intellectual property. We exploit the increasingly popular notion of “big code”, i.e., training from open-source repositories, using features that combine instruction-level call graph analysis and topic modeling (an NLP-based machine learning technique). We believe we can effectively deobfuscate most suspect code by finding similarities within a corpus consisting of known code and its obfuscated counterparts. Our approach handles control flow transformations and introduction of extraneous methods, not just method names.

Contact Gail Kaiser (kaiser@cs.columbia.edu)

Team Members

Faculty
Gail Kaiser

Former Graduate Students
Fang-Hsiang (“Mike”) Su
Jonathan Bell
Kenny Harvey   
Apoorv Patwardhan

Links

Publications

Fang-Hsiang Su, Jonathan Bell, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser and Tony Jebara. Code Relatives: Detecting Similarly Behaving Software. 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), November 2016. Artifact accepted as platinum.

Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan. Identifying Functionally Similar Code in Complex Codebases. 24th IEEE International Conference on Program Comprehension (ICPC), May 2016, pp. 1-10. (ACM SIGSOFT Distinguished Paper Award)

Fang-Hsiang Su, Jonathan Bell, and Gail Kaiser. Challenges in Behavioral Code Clone Detection (Position Paper). 10th International Workshop on Software Clones (IWSC), affiliated with IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), March 2016, volume 3, pp. 21-22. (People’s Choice Award for Best Position Paper)

Software

Download DyCLink from github.

Download HitoshiIO from github.

Download Code Similarity Experiments toolkit from github.

An Open Software Framework for the Emulation and Verification of Drosophila Brain Models on Multiple GPUs

We are working with Prof. Aurel Lazar’s Bionet Lab (http://www.bionet.ee.columbia.edu/) to design, implement and experimentally evaluate an open software framework called the Neurokernel that will enable the isolated and integrated emulation of fly brain model neural ciits and their connectivity patterns (e.g., sensory and locomotion systems)  and other parts of the fly’s nervous system on clusters of GPUs, and support the in vivo functional identification of neural circuits.  (Note this is NOT the same meaning of “in vivo” as PSL’s In Vivo Testing project.)

The Neurokernel will:

  1. Enable computational/systems neuroscientists to exploit new connectome data by directing emulation efforts at interoperable local processing units (LPUs), functional subdivisions of the brain that serve as its computational substrate;
  2. Capitalize on the representation of stimuli in the time domain to enable the development of novel asynchronous algorithms for processing spikes with neural circuits;
  3. Serve as an extended machine that will provide abstractions and interfaces for scalably leveraging a powerful commodity parallel computing hardware platform to study a tractable neural system;
  4. Serve as a resource allocator that will enable researchers to transparently take advantage of future improvements in this hardware platform;
  5. Enable testing of models, both by easing the detection and localization of programming errors and by operationally verifying the models’ designs against time-encoded signals to/from live fly brains in real-time;
  6. Accelerate the research community’s progress in developing new brain circuit model by facilitating the sharing and refinement of novel and/or improved models of LPUs and their constituent circuits by different  groups.

To ease its use by the neuroscience community and enable synergy with existing computational tools and packages, we are developing our software framework in Python, a high-level language that has enjoyed great popularity amongst computational neuroscientists.

As we enhance Neurokernel to model new regions of the fly brain, there may be a negative effect on previous models for other regions.  As the fly brain model(s) will be developed in iterative software development cycles, it will be imperative to ensure that each iteration re-verifies the platform and its individual LPUs against the actual fly brain neuropils.  We would like these tests on the Python code to be conducted automatically, without requiring the use of our fly interface equipment — which is manually intensive to operate.  We are constructing a tool to simulate the fly brain interface for software testing purposes that will capture the stimuli provided to the fly along with its responses.  From these sets of inputs and outputs, the tool will automatically generate test cases that recreate the same experiment without the need for repeated interfacing with the fly. This tool will also be used to automatically generate regression tests for the Neurokernel software that depend on other external factors.

Additional information is available on the Bionet website.

Contact Professor Aurel Lazar (aurel@ee.columbia.edu) for further information.

Team Members

Faculty
Aurel Lazar
Gail Kaiser

Former PSL Graduate Students
Nikhil Sarda

Metamorphic Testing

Metamorphic testing was originally developed, by others, as an approach to deriving new test
cases from an existing test suite, seeking to find additional bugs not found by the original tests.
Given a known execution function(input) produces output, the metamorphic properties of a
function (or of an entire application) enable automatic derivation of a new input’ from input such
that the expected output’ can be predicted from output. If the actual output’’ is different from
output’, then there is a flaw in the code or its documentation. We expanded metamorphic testing
in several ways, initially to apply to “non-testable programs”, where there is no test oracle; that
is, metamorphic testing can detect bugs even when we do not know whether output is correct for
input (so conventional testing techniques may not be useful). This problem arises for the
machine learning, data mining, search, simulation and optimization applications prevalent in “big
data” analysis. For example, if a machine learning program generates clusters from a set of
examples, one would expect it to produce the same clusters when the order of the input examples
is permuted; however, we have found anomalies in several widely used machine learning
libraries (e.g., Weka) where the result is different from expected when the set of input examples
is modified in some simple way. We are investigating how to extend the notion of metamorphic
properties to before and after state, beyond just input/output parameters, to find bugs that affect
the internal state but are not evident from input/output. Most recently we developed a tool for
automatically discovering candidate metamorphic properties from execution profiling that
performs better than student subjects; the state of the art is for a human domain expert to
manually define the properties, a tedious, error-prone and expensive process.

Team Members

Faculty
Gail Kaiser

Graduate Students
Fang-Hsiang (“Mike”) Su

Former Graduate Students
Chris Murphy
Jonathan Bell

Links

Publications

Fang-Hsiang Su, Jonathan Bell, Christian Murphy and Gail Kaiser. Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing. 10th IEEE/ACM International Workshop on Automation of Software Test (AST), May 2015, pp. 55-59.

Jonathan Bell, Christian Murphy and Gail Kaiser. Metamorphic Runtime Checking of Applications Without Test Oracles. Crosstalk the Journal of Defense Software Engineering, 28(2):9-13, Mar/Apr 2015.

Christian Murphy, M. S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup Lee, Oleg Sokolsky, Lori Clarke, Leon Osterweil. On Effective Testing of Health Care Simulation Software. 3rd International Workshop on Software Engineering in Health Care (SEHC), May 2011, pp. 40-47.

Xiaoyuan Xie, Joshua W. K. Ho, Christian Murphy, Gail Kaiser, Baowen Xu and Tsong Yueh Chen.  Testing and Validating Machine Learning Classifiers by Metamorphic Testing.  Journal of Systems and Software (JSS), Elsevier, 84(4):544-558, April 2011.

Christian Murphy, Kuang Shen and Gail Kaiser. Automatic System Testing of Programs without Test Oracles. International Symposium on Software Testing and Analysis (ISSTA), July 2009, pp. 189-200.

Christian Murphy, Kuang Shen and Gail Kaiser. Using JML Runtime Assertion Checking to Perform Metamorphic Testing in Applications without Test Oracles. 2nd IEEE International Conference on Software Testing, Verification and Validation (ICST), April 2009, pp. 436-445.

Christian Murphy, Gail Kaiser, Lifeng Hu and Leon Wu. Properties of Machine Learning Applications for Use in Metamorphic Testing. 20th International Conference on Software Engineering and Knowledge Engineering (SEKE), July 2008, pp. 867-872.

Christian Murphy, Gail Kaiser and Marta Arias. Parameterizing Random Test Data According to Equivalence Classes. 2nd ACM International Workshop on Random Testing (RT), November 2007, pp.38-41.

Christian Murphy, Gail Kaiser and Marta Arias. An Approach to Software Testing of Machine Learning Applications. 19th International Conference on Software Engineering and Knowledge Engineering (SEKE), July 2007, pp. 167-172.

Software

Download <a href="http://” target=”_blank”>Kabu.

In Vivo Testing

Software products released into the field typically contain residual defects that either were not detected or could not have been detected during pre-deployment testing. For many large, complex software systems, it is infeasible in terms of time and cost to reliably test all configuration options before release using unit test virtualization, test suite minimization, or any other known approach. For example, Microsoft Internet Explorer has over 19 trillion possible combinations of configuration settings. Even given infinite time and resources to test an application and all its configurations, other software on which a software product depends or with which it interacts (e.g., sensor networks, libraries, virtual machines, etc.) are often updated after the product’s release; it is impossible to test with these dependencies prior to the application’s release, because they did not exist yet. Further, as multi-processor and multi-core systems become more prevalent, multi-threaded applications that had only been tested on single- or dual-processor/core machines are more likely to reveal concurrency bugs.

We are investigating a testing methodology that we call “in vivo” testing, in which tests are continuously executed in the deployment environment. This requires a new type of test case, called in vivo tests, which are designed to run from within the executing application in the states achieved during normal end-user operation rather than in a re-initialized or artificial pre-test state. These tests focus on aspects of the program that should hold true regardless of what state the system is in, but differ from conventional assertion checking, since assertions are prohibited from introducing side-effects: in vivo tests may indeed and typically do have side-effects on the application’s in-memory state, external files, I/O, etc. but these are all “hidden” from users by cloning the executing application to run the test cases in the same kind of sandbox often aimed to address security concerns. The in vivo approach can be used for detecting concurrency, security or robustness issues, as well as conventional flaws that may not have appeared in a testing lab (the “in vitro” environment). Our most recent research concerns how to reduce the overhead of such deployment-time testing as well as automatic generation of some of the in vivo test cases from traditional pre-existing unit tests.

In Fall 2007, we developed a prototype framework called Invite, which is described in our tech report and was presented as a poster at ISSTA 2008 (a variant of this paper was presented at ICST 2009, and is available here). This implementation uses an AspectJ component to instrument selected classes in a Java application, such that each method call in those classes has some chance (configurable on a per-method basis) of executing the method’s corresponding unit test. When a test is run, Invite forks off a new process in which to run the test, and the results are logged.

We also developed a distributed version of Invite, which seeks to amortize the testing load across a community of applications; a paper was published in the student track of ICST 2008. This version currently uses only one global value for the probability of running a test, instead of one per method, however. That value is set by a central server, depending on the size of the “application community”.

In Spring 2008, we looked at various mechanisms for reducing the performance impact of Invite, e.g. by assigning tests to different cores/processors on multi-core/multi-processor machines, or by limiting the number of concurrent tests that may be run. We also looked at ways of balancing testing load across members of a community so that instances under light load pick up more of the testing. Lastly, we created a modified JDK that allows Invite to create copies of files so that in vivo tests do not alter the “real” file system.

In Fall 2008, we ported the Invite framework to C and evaluated more efficient mechanisms for injecting the instrumentation and executing the tests. We also investigated fault localization techniques, which collect data from failed program executions and attempt to discover what caused the failure.

Recently we have investigated ways to make the technique more efficient by only running tests in application states it hasn’t seen before. This cuts down on the number of redundant states that are tested, thus reducing the performance overhead. This work has potential application to domains like model checking and dynamic analysis and was presented in a workshop paper at AST 2010.

Currently we are looking at ways to apply the In Vivo approach to the domain of security testing. Specifically, we devised an approach called Configuration Fuzzing in which the In Vivo tests make slight changes to the application configuration and then check “security invariants” to see if there are any vulnerabilities that are configuration-related. This work was presented at the 2010 Workshop on Secure Software Engineering.

In 2012-2013, we are investigating techniques to efficiently isolate the state of the tests, so as to avoid the effect of the tests on external systems.

Open research questions include:

  • Can the overhead be reduced by offloading test processes to other machines? This is especially important when the application is running on a single-core machine.
  • What sorts of defects are most likely to be detected with such an approach? How can we objectively measure the approach’s effectiveness at detecting defects?
  • How can the tests be “sandboxed” so that they do not affect external entities like databases? We currently assure that there are no changes to the in-process memory or to the file system, but what about external systems?

This is an older project, where we recently revived the main technique for our more recent work on dynamic code similarity.

Contact Mike Su (mikefhsu@su.columbia.edu) for further information about the recent effort.

Team Members

Faculty
Gail Kaiser

Graduate Students
Fang-Hsiang (“Mike”) Su

Former Graduate Students
Chris Murphy
Jonathan Bell
Matt Chu
Waseem Ilahi
Moses Vaughan

Former Undergraduate Students
Ian Vo

Links

Publications
Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan. Identifying Functionally Similar Code in Complex Codebases. 24th IEEE International Conference on Program Comprehension (ICPC), May 2016, pp. 1-10.

Christian Murphy, Moses Vaughan, Waseem Ilahi and Gail Kaiser. Automatic Detection of Previously-Unseen Application States for Deployment Environment Testing and Analysis. 5th International Workshop on the Automation of Software Test, May 2010, pp. 16-23.
Christian Murphy, Gail Kaiser, Ian Vo and Matt Chu. Quality Assurance of Software Applications Using the In Vivo Testing Approach. 2nd IEEE International Conference on Software Testing, Verification and Validation (ICST), April 2009, pp. 111-120.
Matt Chu, Christian Murphy and Gail Kaiser. Distributed In Vivo Testing of Software Applications. 1st IEEE International Conference on Software Testing, Verification, and Validation, April 2008, pp. 509-512.

Software
Invite

Societal Computing

Societal Computing research is concerned with the impact of computational tradeoffs on societal issues and focuses on aspects of computer science that address significant issues and concerns facing the society as a whole such as Privacy, Climate Change, Green Computing, Sustainability, and Cultural Differences. In particular, Societal Computing research will focus on the research challenges that arise due to the tradeoffs among these areas.

As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regards to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing,
another major area of societal concern.

Societal Computing focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole.

We studied how software developers vs. end-users perceive data privacy requirements (e.g.,
Facebook), and which concrete measures would mitigate privacy concerns. We conducted a
survey with closed and open questions and collected over 400 valid responses. We found that
end-users often imagine that imposing privacy laws and policies is sufficient, whereas
developers clearly prefer technical measures; it is not terribly surprising that developers familiar
with how software works do not think merely passing a law will be effective. We also found that
both users and developers from Europe and Asia/Pacific are much more concerned about the
possibility of privacy breaches that those from North America.

Team Members

Faculty
Gail Kaiser

Former Graduate Students
Swapneel Sheth

Links

Publications

Swapneel Sheth, Gail Kaiser and Walid Maalej. Us and Them — A Study of Privacy Requirements Across North America, Asia, and Europe. 36th International Conference on Software Engineering (ICSE), pp. 859-870, June 2014.

Swapneel Sheth and Gail Kaiser. The Tradeoffs of Societal Computing. Onward!: ACM Symposium on New Ideas in Programming and Reflections on Software, October 2011, pp. 149-156.