PicoCTF – Obfuscate

Hi guys 🙂

In this post I will try to explain how I solved a challenge called obfuscate. It’s a crack-me challenge which, as its name suggests, uses a few obfuscation tricks to make static analysis harder. The binary is a small, stripped, 32-bit ELF executable.

Let’s take a look at it:

pico4180@shell:/home/obfuscate$ ./obfuscate
Password: this_is_my_password
Incorrect!pico4180@shell:/home/obfuscate$

It doesn’t take any arguments; instead, it reads the password from standard input.

1) Finding the main function

Since the binary is stripped, all symbols have been discarded from the object file. Thus, setting a breakpoint on main within gdb does not work:

(gdb) b main
Function "main" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n

However, readelf or gdb tell us that the entry point of the object file is at 0x80484cc. Looking at this address in IDA, we can see the code that sets up the stack before calling __libc_start_main.

IDA recognizes the call to this function and identifies its first parameter as the address of the main function, 0x8048420. We can start the reverse-engineering from this address 🙂
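If neither readelf nor gdb is at hand, the entry point can also be read straight from the ELF header, where it is the e_entry field of Elf32_Ehdr (right after the 16-byte e_ident, e_type, e_machine and e_version). A minimal sketch, assuming a 32-bit ELF file; the function name is made up for this example:

```python
import struct
import sys


def elf32_entry_point(header: bytes) -> int:
    """Return e_entry from a 32-bit ELF header (Elf32_Ehdr)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1:  # EI_CLASS: 1 = ELFCLASS32
        raise ValueError("not a 32-bit ELF file")
    # EI_DATA: 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_type (H), e_machine (H), e_version (I), e_entry (I) follow e_ident
    _e_type, _e_machine, _e_version, e_entry = struct.unpack_from(
        endian + "HHII", header, 16)
    return e_entry


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(hex(elf32_entry_point(f.read(52))))  # Elf32_Ehdr is 52 bytes
```

Run against the obfuscate binary, it should print the same 0x80484cc that readelf reports.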

2) Reversing the main function

Right after the beginning of the main function, we can see the first obfuscation. It is an overlapping instruction, also known as a “jump in the middle”. It is located in the text segment at 0x804843a:


(gdb) disas /r 0x804843a,0x8048440
Dump of assembler code from 0x804843a to 0x8048440:
0x0804843a: eb ff jmp 0x804843b
0x0804843c: c0 48 c7 44 ror BYTE PTR [eax-0x39],0x44
(gdb)

A short jump instruction (2 bytes long) is relative to the address of the next instruction. It is composed of an opcode (0xeb) and an operand that defines how far the instruction pointer moves (from -128 to +127). Here, the operand is 0xff, which means the instruction pointer moves backward by 1 byte (0xff is -1 as a signed byte). Since the instruction following the jump is at 0x0804843c, this jumps to 0x0804843b, which is in the middle of the jmp instruction itself. That means the operand byte will be decoded as the opcode of a new instruction… A standard compiler, without obfuscation goals, should never produce instructions like that.
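The target arithmetic described above is easy to check with a few lines of Python. This is only an illustrative sketch (short_jmp_target is a made-up name): it sign-extends the rel8 operand and adds it to the address of the following instruction, which is the jump address plus 2.

```python
def short_jmp_target(insn_addr: int, disp8: int) -> int:
    """Target of a 2-byte short jump (EB rel8) located at insn_addr."""
    # rel8 is a signed byte, relative to the *next* instruction (insn_addr + 2)
    rel = disp8 - 0x100 if disp8 >= 0x80 else disp8
    return insn_addr + 2 + rel


print(hex(short_jmp_target(0x804843A, 0xFF)))  # 0x804843b
```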

Both gdb and IDA get lost because they try to disassemble bytes that are never executed as such (the ror instruction). The bytes actually executed after the jump, ff c0 and 48, decode to inc eax and dec eax, which leave eax unchanged. IDA allows on-the-fly byte patching, which makes it easy to remove this overlapping instruction: replace the jmp opcode (0xeb) with a nop opcode (0x90) and force the byte 0xc7 (.text:0804843E) to be displayed as code.
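The same patch can be sketched outside IDA on raw bytes. This is a hypothetical helper, not something from the challenge: it blindly rewrites every eb ff pair, so on a real binary it should be restricted to the known location, since those two bytes could also appear inside data or immediates.

```python
JUMP_IN_THE_MIDDLE = b"\xeb\xff"  # jmp $-1: lands on its own operand byte


def patch_jump_in_the_middle(code: bytes) -> bytes:
    """Replace the EB opcode of every 'EB FF' pair with a NOP (0x90).

    The FF byte is kept: it is the first byte of the instruction that
    really executes (FF C0 = inc eax), so only the jmp opcode must go.
    """
    return code.replace(JUMP_IN_THE_MIDDLE, b"\x90\xff")


# Bytes from the gdb dump at 0x804843a
print(patch_jump_in_the_middle(bytes.fromhex("ebffc048c744")).hex())  # 90ffc048c744
```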

This lets IDA run its disassembly algorithm again, and we can see a little more clearly what is going on.

printf() prompts the user for a password, and getline() reads it from standard input. Here is the getline() prototype:

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

  • The password (lineptr) will be dynamically allocated by the function and its address will be stored at [esp+0x1c]
  • The size of the allocated buffer (n) will be stored at [esp+0x18]
  • The input stream will be stdin (a global FILE pointer declared in stdio.h)

3) Understanding the jump table mechanism

The next interesting part is the use of a jump table to check the input password. The ordinal value of each character minus 10 is used as an index into the jump table. Each entry of this table contains the address of a set of instructions that tests whether the current character is at the right offset. This jump table leads to a rather complicated flow graph.

The indirect jump is performed at 0x8048603. Let’s put a breakpoint there and try the string “abcd” as the password.

(gdb) b *0x08048603
Breakpoint 1 at 0x8048603
(gdb) r
Starting program: /home/user/obfuscate
Password: abcd

0x08048603 in ?? ()
=> 0x08048603: ff 24 8d 90 8b 04 08 jmp DWORD PTR [ecx*4+0x8048b90]
(gdb) x $ecx*4+0x8048b90
0x8048cec: 0x08048780

The register edx holds the current offset within the input string, and ecx holds the ordinal value minus 10 of the character at offset edx. Since the first character is ‘a’, ecx equals 87 (ord('a') - 10). This instruction will therefore jump to the address stored at 87*4+0x8048b90 (0x8048cec), which is 0x08048780:

(gdb) ni
0x08048780 in ?? ()
=> 0x08048780: 83 fa 23 cmp edx,0x23
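The slot arithmetic used throughout this post (ordinal value minus 10, times 4, plus the table base) can be wrapped in a small Python 3 helper. A sketch only: slot_address is a name made up here, and the base address is taken from the jmp at 0x8048603. The values printed match the table slots this post uses for ‘a’, ‘\n’, ‘b’, ‘f’ and ‘0’.

```python
JUMP_TABLE = 0x8048B90  # base, from "jmp DWORD PTR [ecx*4+0x8048b90]"


def slot_address(ch: str) -> int:
    """Address of the jump-table entry consulted for character ch."""
    index = ord(ch) - 10           # value loaded into ecx
    return JUMP_TABLE + 4 * index  # 4-byte entries in a 32-bit binary


for ch in "a\nbf0":
    print(repr(ch), hex(slot_address(ch)))
# 'a' 0x8048cec
# '\n' 0x8048b90
# 'b' 0x8048cf0
# 'f' 0x8048d00
# '0' 0x8048c28
```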

4) Disassembling instructions related to one character

Well, let’s disassemble the instructions related to the ‘a’ character.

First, it checks whether edx equals 0x23; if not, it jumps to 0x80485de. Then it performs a second check that fails if the byte at [esp+0x2d] is 0. If both checks pass, edx is incremented (mov edx,0x24). This means that ‘a’ would be the 36th character of the password (offset 0x23 = 35) if it were that long…

Since we are verifying the first character of the string, edx equals 0, so the first check fails and execution continues at 0x80485de. Let’s take a look at the instructions there.

This piece of code sets eax to 0 and then goes to loc_0x8048580, which returns to the main function; main prints “Incorrect!” if eax equals 0 and “Correct!” otherwise. Since eax is set to 0 at 0x80485de, the string “Incorrect!” will be printed once this procedure returns to the main function.

So ‘a’ is certainly not the first character of the password: otherwise, rather than leaving, we would have hit the breakpoint again before the next jump into the table (to check the 2nd character of the string). Still, this experiment shows that:

  • A jump to 0x80485de means the password is incorrect
  • A jump to 0x80485e0 means the password is likely to be correct if eax is not 0
  • If the current character is correct, edx is incremented

5) Guessing the input password length

Up to this point, we haven’t seen any constraint on the length of the input string, since getline() takes care of (re-)allocating a buffer for the caller. Nor have we seen any instructions that strip the end-of-line character, which getline() keeps in the buffer. Therefore, ‘\n’ must be the last non-null character of the string. Let’s locate and disassemble the code associated with it:

python -c "print(hex((ord('\n') - 10) * 4 + 0x8048b90))"
0x8048b90

Here, edx is compared with 0xd, which means the input string length must be 14. Another interesting detail is that after performing the 2 checks (as with the ‘a’ character), it jumps to 0x80485e0 this time rather than 0x80485ab. This address is just one instruction past 0x80485de, the address reached when a check failed in the ‘a’ handler. The skipped instruction is simply xor eax,eax, which would make the final check at 0x8048489 (test eax,eax) fail. This confirms what we suspected: a jump to 0x80485e0 is a good sign!

6) Listing the jump-table content to find the input password

Since we know the password length, we can now focus on the password content. Let’s look at the instructions related to the character ‘b’ since we tried ‘a’ earlier:

python -c "print(hex((ord('b') - 10) * 4 + 0x8048b90))"
0x8048cf0

Perfect: the first instruction tells us that ‘b’ should be at offset 11 (i.e. the 12th character). Notably, the instructions handling ‘b’ are almost identical to those handling ‘a’. So rather than inspecting the handler of each character manually, it is faster to write a small gdb script that iterates over the jump table and checks whether the first instruction of each entry compares edx with a value no greater than 0xd:
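The post does not reproduce parse_jump_table.py itself. What follows is a hypothetical reconstruction, not the author’s script. It assumes gdb’s Python API (Inferior.read_memory and Architecture.disassemble), Intel disassembly syntax, a live process (for instance stopped at the breakpoint on 0x8048603), and lowercase letters plus digits as the candidate characters; its output may therefore differ slightly from the listing that follows.

```python
import re
import string

try:
    import gdb  # only available when the script is sourced from gdb
except ImportError:
    gdb = None

TABLE_BASE = 0x8048B90  # from "jmp DWORD PTR [ecx*4+0x8048b90]"
MAX_OFFSET = 0xD        # highest offset worth reporting

CMP_EDX = re.compile(r"^cmp\s+edx,\s*(0x[0-9a-f]+|\d+)$", re.IGNORECASE)


def table_slot(ch):
    """Address of the jump-table entry used for character ch."""
    return TABLE_BASE + 4 * (ord(ch) - 10)


def parse_edx_cmp(asm):
    """Return the immediate of a 'cmp edx,IMM' instruction, or None."""
    m = CMP_EDX.match(asm.strip())
    return int(m.group(1), 0) if m else None


def dump_table(candidates=string.ascii_lowercase + string.digits):
    gdb.execute("set disassembly-flavor intel")
    inferior = gdb.selected_inferior()
    arch = gdb.selected_frame().architecture()
    for ch in candidates:
        # Each entry is a little-endian 32-bit code address
        raw = bytes(inferior.read_memory(table_slot(ch), 4))
        target = int.from_bytes(raw, "little")
        asm = " ".join(arch.disassemble(target, count=1)[0]["asm"].split())
        offset = parse_edx_cmp(asm)
        if offset is None:
            print("%s Weird expected a cmp edx instruction but got 0x%x: %s"
                  % (ch, target, asm))
        elif offset <= MAX_OFFSET:
            print("%s is at offset %d (%s)" % (ch, offset, asm))


if gdb is not None:
    dump_table()
```

Keeping the pattern matching in a plain function (parse_edx_cmp) means that part can be tested without gdb.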


(gdb) source /path_to/parse_jump_table.py
b is at offset 11 (cmp edx,0xb)
d is at offset 3 (cmp edx,0x3)
e is at offset 7 (cmp edx,0x7)
f Weird expected a cmp edx instruction but got 0x8048820: cmp BYTE PTR [esp+0x32],0x0
g is at offset 12 (cmp edx,0xc)
h is at offset 13 (cmp edx,0xd)
i is at offset 9 (cmp edx,0x9)
j is at offset 10 (cmp edx,0xa)
k is at offset 12 (cmp edx,0xc)
o Weird expected a cmp edx instruction but got 0x8048928: cmp BYTE PTR [esp+0x3a],0x0
v is at offset 2 (cmp edx,0x2)
w is at offset 6 (cmp edx,0x6)
0 Weird expected a cmp edx instruction but got 0x8048628: test edx,edx
6 is at offset 12 (cmp edx,0xc)
7 is at offset 5 (cmp edx,0x5)
8 Weird expected a cmp edx instruction but got 0x8048730: xor eax,eax
9 is at offset 1 (cmp edx,0x1)

Well, from this script we know that the password is likely to be ?9vd?7we?ijbkh.
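Turning the script output into a candidate password is mechanical. The sketch below copies the (character, offset) pairs from the listing above, drops the “Weird” entries, and uses the 14-character length from section 5. Note that the listing actually reports three characters at offset 12 (‘g’, ‘k’ and ‘6’); the post settles on ‘k’, while this sketch keeps all three visible instead of choosing.

```python
# (char, offset) pairs from the parse_jump_table.py output above
FOUND = [
    ("b", 11), ("d", 3), ("e", 7), ("g", 12), ("h", 13), ("i", 9),
    ("j", 10), ("k", 12), ("v", 2), ("w", 6), ("6", 12), ("7", 5), ("9", 1),
]
LENGTH = 14  # password length used in the post


def password_pattern(found, length):
    """'?' for unknown offsets, [xyz] when several characters claim one."""
    candidates = [[] for _ in range(length)]
    for ch, offset in found:
        candidates[offset].append(ch)
    parts = []
    for chars in candidates:
        if not chars:
            parts.append("?")
        elif len(chars) == 1:
            parts.append(chars[0])
        else:
            parts.append("[" + "".join(chars) + "]")
    return "".join(parts)


print(password_pattern(FOUND, LENGTH))  # ?9vd?7we?ijb[gk6]h
```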

This is almost there, but our assumption that each set of instructions related to a character starts with cmp edx,0x?? is wrong: we got 4 parsing errors. Since we still have 3 characters to discover, these errors are worth inspecting. Let’s take a look at the instructions related to ‘f’ (0x8048820), where the first error occurred:

python -c "print(hex((ord('f') - 10) * 4 + 0x8048b90))"
0x8048d00

Here is another tricky case… The character ‘f’ is valid at offset 8 (cmp edx,0x8 followed by an increment and a jump to 0x80485ab), but also at offset 4. Indeed, if the comparison of edx against 8 fails, a jump is taken to 0x8048a9f, where edx is checked against 4, followed by a jump to 0x8048834, where edx is incremented before another jump to 0x80485ab.

Perfect. The only remaining character is at offset 0 and appears to be ‘0’. This is no surprise, since our gdb script did not find the usual check against edx as the first instruction of the jump-table entry for ‘0’. Rather than cmp edx,0x0, the first instruction is test edx,edx. For this purpose the two are equivalent: in both cases, ZF (the zero flag) is set if and only if edx equals 0:

python -c "print(hex((ord('0') - 10) * 4 + 0x8048b90))"
0x8048c28
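Putting the recovered offsets together, a candidate can be checked offline before feeding it to the binary. This is a simplified model, not a re-implementation of the checker: it only captures the cmp edx,N (or test edx,edx) part of each handler plus the 14-character length, and ignores the secondary byte checks such as the one on [esp+0x2d], so it may accept strings the real binary would reject (for example with ‘g’ or ‘6’ at offset 12, which the post never tries).

```python
# Offsets recovered in this post: the script output, plus 'f' (offsets 4
# and 8) and '0' (offset 0, checked with "test edx,edx").
ALLOWED = {
    "0": {0}, "9": {1}, "v": {2}, "d": {3}, "f": {4, 8}, "7": {5},
    "w": {6}, "e": {7}, "i": {9}, "j": {10}, "b": {11},
    "g": {12}, "k": {12}, "6": {12}, "h": {13},
}
LENGTH = 14


def passes_offset_checks(password):
    """Model only the edx comparisons; edx is the index of the character."""
    if len(password) != LENGTH:
        return False
    for edx, ch in enumerate(password):
        if edx not in ALLOWED.get(ch, set()):
            return False  # the real handler would jump to 0x80485de
    return True


print(passes_offset_checks("09vdf7wefijbkh"))  # True
print(passes_offset_checks("abcd"))            # False
```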

It looks like we found the 3 missing characters. Let’s try 09vdf7wefijbkh as the password:

pico4180@shell:/home/obfuscate$ ./obfuscate
Password: 09vdf7wefijbkh
Correct!pico4180@shell:/home/obfuscate$

Et voilà ! Hopefully you liked this write-up 🙂

If you want to play with this binary, it can be downloaded here.

remi

Security Engineer / Malware Analyst, interested in reverse engineering, vulnerability exploitation, OS architecture & software development.
