The best way to do it - is to take existing compiler and modify it. Flexible and good structured LLVM already started to be de facto standard for developing multi-platform compiler with new optimization features. So in that moment we wanted to use LLVM byte code optimization modules for code replication.
Our thoughts stayed thoughts, and I was pleasantly surprised when guys from Switzerland University presented o-llvm. Guys made a great job and currently they are supporting Instruction Substitution, Bogus control flow and Control Flow Flattering. Choice was not casual, and in this blog entry I'll discuss about simplification problem complexity and will show o-LLVM how we can use LLVM compiler for resolving some o- LLVM transformations.
Let's start with review articles related to functional equivalency.
First of all I want to point out [1], where S. Chow, V. Zakharov provided steps for Control flow flattering transformations and base on LBTM proved that minimization and redundancy resolving is PSPACE-hard. So, for example solving this problem can be equivalent mathematically to always complete Nintendo game (from here).
If we would talk only about resolving opaque predicates, or reformulate - determining execution branch statically, then it's a NP-hard problem. In [2] Chenxi Wang and others showed it base on 3-SAT problem and suggested 2 possible approximation methods: Brute-force search method (executing all blocks) and Alias-detection approximation methods (opaque predicates matching). Quarkslab team successful simplified transformation using miasm framework base on pattern matching for predicates and symbolic execution. Referenced attempt also complained that "there are always particular cases depending on the target binary" which, of course will make impossible automatic simplification system.
Briefly let's look thought complexity of other obfuscation techniques mentioned in obfuscation taxonomy. The easiest dead code elimination (DCE), as you might know, takes O(n^2), liveness analysis O(n^3)(DCE + O(|n|)), SAT predicates that could be solved with PPSZ O(2^{0.386n}), code duplication elimination is EXPTIME, restructured array can be assessed like SSA + sort = n^2*log(n).
I want to mention also Podlovchenko and his articles about using Algebraic models for program equivalency. Depending on the model belongs to transformation, complexity varies from logarithmic to NP.
I think now idea is clear. For a most transformation techniques simplification is possible. For a moment there are no universal methods of simplification, and only some experiments with it were made. In my blog entry I want to show how LLVM could help us with simplification.
Our steps are the following:
- compile code to LLVM-IR with o-LLVM obfuscation enabled(use omit-llvm flag)
- optimize obfuscated LLVM-IR with released LLVM version.
- generate graph for both IR
For experiments we will use code from Quarkslab experiments.
unsigned int target_function(unsigned int n) { unsigned int mod = n % 4; unsigned int result = 0; if (mod == 0) result = (n | 0xBAAAD0BF) * (2 ^ n); else if (mod == 1) result = (n & 0xBAAAD0BF) * (3 + n); else if (mod == 2) result = (n ^ 0xBAAAD0BF) * (4 | n); else result = (n + 0xBAAAD0BF) * (5 & n); return result; }
Here is LLVM-IR listing of function compiled with O0.
1. Let's start from Instruction Substitution, as from very basic technology.
As you can see, LLVM correctly resolved Substitution techniques for 3 branches. I assume that ideally, last branch should also be optimized. Running LLVM optimization to see the exact behavior I'll show in later posts.
2. The next transformation to analyze is Bogus CFG
LLVM removed 3 blocks, and optimized few of them. The third and last (TF) block from bottom looks weird and FALSE connecting to them. I think with a one more backward iteration LLVM could simplify those redundant FALSE brunches.
3. The last transformation I want to test is Flattering CFG
State machine created on flattering transformation was kept by LLVM. The only optimization was made is combining code from different states. Optimized 21 - 28 entries are the same, and I'm wondering why LLVM still kept them. If those blocks will be removed, code will be pretty the same with optimized Instruction Substitution.
LLVM has a good architecture and allows developer/researcher or malware writer to modify code in any direction he wants. For the people who want to get deep into o-LLVM I highly recommend to read Getting started with LLVM first.
Experiments showed us, that LLVM already can simplify some easy transformation and with modification for sure will be able to resolve complex stuff. But it's already cat and mouse game. The good feedback after my tests is that's already possible to create replication engine with LLVM - O-LLVM and mc-sema as a bin to LLVM translation engine.
In next blog posts I wanted to do some more experiments with [Obfuscation-] LLVM and mc-sema.
Read more...