Android applications are usually obfuscated before release,
making it difficult to analyze them for malware presence or
intellectual property violations. Obfuscators might hide the
true intent of code by renaming variables and/or modifying
program structures. It is challenging to search for executables
relevant to an obfuscated application for developers to analyze
efficiently. Prior approaches toward obfuscation-resilient
search have relied on certain structural parts of apps remaining
as landmarks, untouched by obfuscation. For instance,
some prior approaches have assumed that the structural relationships
between identifiers are not broken by obfuscators;
others have assumed that control flow graphs maintain their
structures. Both approaches can be easily defeated by a motivated
obfuscator. We present a new approach, MACNETO,
to search for programs relevant to obfuscated executables
by leveraging deep learning and principal features on instructions.
MACNETO makes few assumptions about the kinds of
modifications that an obfuscator might perform. We show
that it has high search precision for executables obfuscated
by a state-of-the-art obfuscator that changes control flow. Further,
we demonstrate the potential of MACNETO to help
developers understand executables: MACNETO infers
keywords (drawn from relevant unobfuscated programs)
for obfuscated executables.

@inproceedings{Su:2018:ORS:3211346.3211352,
  author = {Su, Fang-Hsiang and Bell, Jonathan and Kaiser, Gail and Ray, Baishakhi},
  title = {{Obfuscation Resilient Search Through Executable Classification}},
  booktitle = {{Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL)}},
  series = {MAPL 2018},
  year = {2018},
  isbn = {978-1-4503-5834-7},
  location = {Philadelphia, PA, USA},
  pages = {20--30},
  numpages = {11},
  url = {http://doi.acm.org/10.1145/3211346.3211352},
  doi = {10.1145/3211346.3211352},
  acmid = {3211352},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {bytecode analysis, bytecode search, deep learning, executable search, obfuscation resilience},
}