Code Deobfuscation: Intertwining Dynamic, Static and Symbolic Approaches

Abstract

Over the years, obfuscation has taken a significant place in the software protection field. The term generally covers any means of slowing down the analysis of a program, whether by an analyst or by an automated algorithm. As such, it has gained a certain popularity in the video-game industry. Unfortunately, it has also gained popularity in the malware underground ecosystem, leading to the need for deobfuscation techniques. The only property that obfuscation should preserve is the semantics of the program, i.e. its behavior. Hence, in the broadest sense, deobfuscation is the means of making the behavior of the malware more intelligible, given that recovering the original program is impossible in the general case. The first step toward understanding a binary program is to disassemble it in order to obtain a good representation of its Control-Flow Graph (CFG). As a consequence, obfuscation techniques also aim at fooling existing disassembly tools and techniques. Standard obfuscations usually target either static analysis (CFG flattening, self-modification, etc.) or dynamic analysis (anti-debugging tricks, VM detection or runtime monitoring). Thus, taking advantage of different approaches may become essential to handle obfuscated code. While static analysis covers the whole program but is quickly fooled by obfuscations such as self-modification, dynamic analysis yields a real execution trace of the program but is limited to one or a few execution paths. In between, dynamic symbolic execution (DSE), also known as concolic execution, helps cover new paths in the program using symbolic values and automatic solvers. This technique has already been fruitfully applied for various purposes such as test generation [1], vulnerability discovery [2] and more recently deobfuscation [3]. Unfortunately, its main problem is that it hardly scales to large obfuscated code.
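As a rough illustration of how DSE discovers new paths, the sketch below runs a toy program on a concrete input, records the branch decisions taken, negates one decision at a time, and asks a "solver" for an input that follows the new path. Everything in it is an assumption made for illustration: the toy program, the names toy_program, solve and explore, and the exhaustive search over 8-bit inputs that stands in for a real SMT solver. It is not how BINSEC is implemented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Predicate = Callable[[int], bool]
# One recorded branch: (label, predicate over the input, outcome taken)
Branch = Tuple[str, Predicate, bool]
Path = Tuple[Tuple[str, bool], ...]


def p1(v: int) -> bool:
    """Hypothetical first branch condition of the toy program."""
    return v > 100


def p2(v: int) -> bool:
    """Hypothetical second branch condition, hard to hit by chance."""
    return (v ^ 0x5A) == 0x3C


def toy_program(x: int) -> List[Branch]:
    """Run the toy program concretely and record each branch decision."""
    trace: List[Branch] = []
    taken1 = p1(x)
    trace.append(("b1", p1, taken1))
    if taken1:
        # The second branch is only reached when the first one is taken.
        trace.append(("b2", p2, p2(x)))
    return trace


def solve(constraints: List[Tuple[Predicate, bool]]) -> Optional[int]:
    """Stand-in for an SMT solver: exhaustive search over 8-bit inputs."""
    for candidate in range(256):
        if all(pred(candidate) == want for pred, want in constraints):
            return candidate
    return None  # no 8-bit input satisfies the constraints


def explore(seed: int) -> Dict[Path, int]:
    """Map each discovered path to one concrete input that triggers it."""
    worklist = [seed]
    found: Dict[Path, int] = {}
    while worklist:
        x = worklist.pop()
        trace = toy_program(x)
        path: Path = tuple((label, taken) for label, _, taken in trace)
        if path in found:
            continue
        found[path] = x
        # Keep the prefix of the path, negate one decision, and solve.
        for i, (_, pred, taken) in enumerate(trace):
            prefix = [(p, t) for _, p, t in trace[:i]]
            new_input = solve(prefix + [(pred, not taken)])
            if new_input is not None:
                worklist.append(new_input)
    return found


if __name__ == "__main__":
    for path, witness in explore(seed=0).items():
        print(path, "reached with input", witness)
```

A single concrete run from the seed only exercises one path; negating recorded decisions is what reaches the others. When solve returns None, the negated branch is infeasible for the given prefix, which is the kind of answer a real solver provides at much higher cost on large obfuscated code.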
We show in this talk how to successfully combine several disassembly techniques (namely dynamic analysis, several state-of-the-art variants of symbolic execution, and static analysis) in order to recover a more precise CFG of the obfuscated code under analysis. In particular, dynamic analysis brings robustness to tricky obfuscations such as self-modification, variants of symbolic execution can answer both the feasibility and infeasibility queries arising during the deobfuscation process, and standard static analysis can be guided in a safe way to extend the disassembly. These analyses are implemented in the open-source framework BINSEC. They are articulated around three components:
* BINSEC/SE: the core dynamic symbolic execution engine
* Pinsec: a Pin-based dynamic instrumenter
* IDASec: an IDA plugin that lifts analysis data into IDA, making it directly usable by the reverse engineer
This talk will explain in detail the method and how it is implemented in BINSEC. Practical examples will focus on the detection of opaque predicates and call stack tampering, with case studies based on Tigress, o-llvm and several commercial packers. The end goal is to empower reverse engineers by giving them semantic information about the program, such as the obfuscations it contains, so that they hold all the cards for a better and deeper understanding of the binary being analyzed.
[1] {BHUSA2014} Contemporary Automatic Program Analysis. Julian Cohen
[2] {RECON14, HITB14, Shakacon2014} Fuzzing and Patch Analysis SAGEly Advice. Richard Johnson
[3] {CCS15} Symbolic Execution of Obfuscated Code. Babak Yadegari, Saumya Debray
