Home » Publications

Category Archives: Publications

Binary Quilting to Generate Patched Executables without Compilation

Anthony Saieva and Gail Kaiser. Binary Quilting to Generate Patched Executables without Compilation. ACM Workshop on Forming an Ecosystem Around Software Transformation (FEAST), Virtual, November 2020, pp. 3-8. https://doi.org/10.1145/3411502.3418424

When applying patches, or dealing with legacy software, users are often reluctant to change the production executables for fear of unwanted side effects. This results in many active systems running vulnerable or buggy code even though the problems have already been identified and resolved by developers. Furthermore when dealing with old or proprietary software, users can’t view or compile source code so any attempts to change the application after distribution requires binary level manipulation. We present a new technique we call binary quilting that allows users to apply the designated minimum patch that preserves core semantics without fear of unwanted side effects introduced either by the build process or by additional code changes. Unlike hot patching, binary quilting is a one-time procedure that creates an entirely new reusable binary. Our case studies show the efficacy of this technique on real software in real patching scenarios.

@inproceedings{10.1145/3411502.3418424,
author = {Saieva, Anthony and Kaiser, Gail},
title = {{Binary Quilting to Generate Patched Executables without Compilation}},
year = {2020},
month = {November},
url = {https://doi.org/10.1145/3411502.3418424},
doi = {10.1145/3411502.3418424},
booktitle = {ACM Workshop on Forming an Ecosystem Around Software Transformation (FEAST)},
pages = {3–-8},
location = {Virtual},
}

Ad hoc Test Generation Through Binary Rewriting

Anthony Saieva, Shirish Singh and Gail Kaiser. Ad hoc Test Generation Through Binary Rewriting. 20th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), Virtual, September 2020, pp. 115-126. https://doi.org/10.1109/SCAM51674.2020.00018.

Software available at https://github.com/Programming-Systems-Lab/ATTUNE.

When a security vulnerability or other critical bug is not detected by the developers’ test suite, and is discovered post-deployment, developers must quickly devise a new test that reproduces the buggy behavior. Then the developers need to test whether their candidate patch indeed fixes the bug, without breaking other functionality, while racing to deploy before attackers pounce on exposed user installations. This can be challenging when factors in a specific user environment triggered the bug. If enabled, however, record-replay technology faithfully replays the execution in the developer environment as if the program were executing in that user environment under the same conditions as the bug manifested. This includes intermediate program states dependent on system calls, memory layout, etc. as well as any externally-visible behavior. Many modern record-replay tools integrate interactive debuggers, to help locate the root cause, but don’t help the developers test whether their patch indeed eliminates the bug under those same conditions. In particular, modern record-replay tools that reproduce intermediate program state cannot replay recordings made with one version of a program using a different version of the program where the differences affect program state. This work builds on record-replay and binary rewriting to automatically generate and run targeted tests for candidate patches significantly faster and more efficiently than traditional test suite generation techniques like symbolic execution. These tests reflect the arbitrary (ad hoc) user and system circumstances that uncovered the bug, enabling developers to check whether a patch indeed fixes that bug. The tests essentially replay recordings made with one version of a program using a different version of the program, even when the the differences impact program state, by manipulating both the binary executable and the recorded log to result in an execution consistent with what would have happened had the the patched version executed in the user environment under the same conditions where the bug manifested with the original version. Our approach also enables users to make new recordings of their own workloads with the original version of the program, and automatically generate and run the corresponding ad hoc tests on the patched version, to validate that the patch does not break functionality they rely on.

@INPROCEEDINGS{9252025,
author={Anthony {Saieva} and Shirish {Singh} and Gail {Kaiser}},
booktitle={IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)},
title={{Ad hoc Test Generation Through Binary Rewriting}},
month = {September},
year={2020},
location = {Virtual},
volume={},
number={},
pages={115–126},
url = {https://doi.org/10.1109/SCAM51674.2020.00018},
}

Replay without Recording of Production Bugs for Service Oriented Applications

Nipun Arora, Jonathan Bell, Franjo Ivančić, Gail Kaiser and Baishakhi Ray. Replay without Recording of Production Bugs for Service Oriented Applications. 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE), Montpellier, France, September 2018, pp. 452-463. https://doi.org/10.1145/3238147.3238186.

Software available at https://github.com/Programming-Systems-Lab/Parikshan. The software is not maintained.

Short time-to-localize and time-to-fix for production bugs is extremely important for any 24×7 service-oriented application (SOA). Debugging buggy behavior in deployed applications is hard, as it requires careful reproduction of a similar environment and workload. Prior approaches for automatically reproducing production failures do not scale to large SOA systems. Our key insight is that for many failures in SOA systems (e.g., many semantic and performance bugs), a failure can automatically be reproduced solely by relaying network packets to replicas of suspect services, an insight that we validated through a manual study of 16 real bugs across five different systems. This paper presents Parikshan, an application monitoring framework that leverages user-space virtualization and network proxy technologies to provide a sandbox “debug” environment. In this “debug” environment, developers are free to attach debuggers and analysis tools without impacting performance or correctness of the production environment. In comparison to existing monitoring solutions that can slow down production applications, Parikshan allows application monitoring at significantly lower overhead.

@inproceedings{Arora:2018:RWR:3238147.3238186,
 author = {Arora, Nipun and Bell, Jonathan and Ivan\v{c}i\'{c}, Franjo and Kaiser, Gail and Ray, Baishakhi},
 title = {Replay Without Recording of Production Bugs for Service Oriented Applications},
 booktitle = {Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering},
 series = {ASE 2018},
 year = {2018},
 isbn = {978-1-4503-5937-5},
 location = {Montpellier, France},
 pages = {452--463},
 numpages = {12},
 url = {http://doi.acm.org/10.1145/3238147.3238186},
 doi = {10.1145/3238147.3238186},
 acmid = {3238186},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Fault reproduction, live debugging},
} 

Obfuscation Resilient Search through Executable Classification

Android applications are usually obfuscated before release,
making it difficult to analyze them for malware presence or
intellectual property violations. Obfuscators might hide the
true intent of code by renaming variables and/or modifying
program structures. It is challenging to search for executables
relevant to an obfuscated application for developers to analyze
efficiently. Prior approaches toward obfuscation resilient
search have relied on certain structural parts of apps remaining
as landmarks, un-touched by obfuscation. For instance,
some prior approaches have assumed that the structural relationships
between identifiers are not broken by obfuscators;
others have assumed that control flow graphs maintain their
structures. Both approaches can be easily defeated by a motivated
obfuscator. We present a new approach, MACNETO,
to search for programs relevant to obfuscated executables
leveraging deep learning and principal features on instructions.
MACNETO makes few assumptions about the kinds of
modifications that an obfuscator might perform. We show
that it has high search precision for executables obfuscated
by a state-of-the-art obfuscator that changes control flow. Further,
we also demonstrate the potential of MACNETO to help
developers understand executables, where MACNETO infers
keywords (which are from relevant un-obfuscated programs)
for obfuscated executables.

link

@inproceedings{Su:2018:ORS:3211346.3211352,
 author = {Su, Fang-Hsiang and Bell, Jonathan and Kaiser, Gail and Ray, Baishakhi},
 title = {{Obfuscation Resilient Search Through Executable Classification}},
 booktitle = {{Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL)}},
 series = {MAPL 2018},
 year = {2018},
 isbn = {978-1-4503-5834-7},
 location = {Philadelphia, PA, USA},
 pages = {20--30},
 numpages = {11},
 url = {http://doi.acm.org/10.1145/3211346.3211352},
 doi = {10.1145/3211346.3211352},
 acmid = {3211352},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {bytecode analysis, bytecode search, deep learning, executable search, obfuscation resilience},
} 

Code Relatives: Detecting Similarly Behaving Software


@inproceedings{Su:2016:CRD:2950290.2950321,
author = {Su, Fang-Hsiang and Bell, Jonathan and Harvey, Kenneth and Sethumadhavan, Simha and Kaiser, Gail and Jebara, Tony},
title = “{Code Relatives: Detecting Similarly Behaving Software}”,
booktitle = “{24th ACM SIGSOFT International Symposium on Foundations of Software Engineering}”,
series = {FSE 2016},
year = {2016},
isbn = {978-1-4503-4218-6},
location = {Seattle, WA, USA},
pages = {702–714},
numpages = {13},
url = {http://doi.acm.org/10.1145/2950290.2950321},
doi = {10.1145/2950290.2950321},
acmid = {2950321},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Code relatives, code clones, link analysis, runtime behavior, subgraph matching},
note = “Artifact accepted as platinum.”
}

Identifying Functionally Similar Code in Complex Codebases

@inproceedings{hitoshiio,
author = “Fang-Hsiang Su and Jonathan Bell and Gail Kaiser and Simha Sethumadhavan”,
title = “(Identifying Functionally Similar Code in Complex Codebases}”,
booktitle = “{24th IEEE International Conference on Program Comprehension (ICPC)}”,
month = “May”,
year = “2016”,
pages = “1–10”,
url = “http://dx.doi.org/10.1109/ICPC.2016.7503720”,
note = “ACM SIGSOFT Distinguished Paper Award”
}

Challenges in Behavioral Code Clone Detection

@inproceedings{CodeRelatives.position,
author = “Fang-Hsiang Su and Jonathan Bell and Gail Kaiser”,
title = “{Challenges in Behavioral Code Clone Detection (Position Paper)}”,
booktitle = “{10th International Workshop on Software Clones (IWSC), affiliated with IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER)}”,
month = “March”,
year = “2016”,
volume = “3”,
pages = “21–22”,
url = “http://dx.doi.org/10.1109/SANER.2016.75”,
note = “People’s Choice Award for Best Position Paper.”
}

Efficient Dependency Detection for Safe Java Test Acceleration

@inproceedings{Bell:2015:EDD:2786805.2786823,
author = {Bell, Jonathan and Kaiser, Gail and Melski, Eric and Dattatreya, Mohan},
title = “{Efficient Dependency Detection for Safe Java Test Acceleration}”,
booktitle = “{2015 10th Joint Meeting on Foundations of Software Engineering}”,
series = {ESEC/FSE 2015},
year = {2015},
isbn = {978-1-4503-3675-8},
location = {Bergamo, Italy},
pages = {770–781},
numpages = {12},
url = {http://doi.acm.org/10.1145/2786805.2786823},
doi = {10.1145/2786805.2786823},
acmid = {2786823},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Test dependence, detection algorithms, empirical studies},
}

Dynamic Taint Tracking for Java with Phosphor

@inproceedings{Bell:2015:DTT:2771783.2784768,
author = {Bell, Jonathan and Kaiser, Gail},
title = {Dynamic Taint Tracking for Java with Phosphor (Demo)},
booktitle = {Proceedings of the 2015 International Symposium on Software Testing and Analysis},
series = {ISSTA 2015},
year = {2015},
isbn = {978-1-4503-3620-8},
location = {Baltimore, MD, USA},
pages = {409–413},
numpages = {5},
url = {http://doi.acm.org/10.1145/2771783.2784768},
doi = {10.1145/2771783.2784768},
acmid = {2784768},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Dataflow Analysis, Taint Tracking},
}

Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing

@inproceedings{Su:2015:DIL:2819261.2819279,
author = {Su, Fang-Hsiang and Bell, Jonathan and Murphy, Christian and Kaiser, Gail},
title = {Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing},
booktitle = {Proceedings of the 10th International Workshop on Automation of Software Test},
series = {AST ’15},
year = {2015},
month = {May},
location = {Florence, Italy},
pages = {55–59},
numpages = {5},
url = {http://dl.acm.org/citation.cfm?id=2819261.2819279},
acmid = {2819279},
publisher = {IEEE Press},
address = {Piscataway, NJ, USA},
}