SSTIC 2016 – Huge (Stage 2)

Hello 🙂

In this post I will explain how I solved ‘Huge’, from the SSTIC challenge 2016. SSTIC is a French conference related to security in computing, which takes place every year at Rennes. Around 2 months before the conference, a security challenge is published online. This year, the challenge was designed with 3 levels, where each one had its own set of 3 to 4 exercises. ‘Huge’ was a reverse engineering challenge and was part of the level 2.

1 – Extraction of the ELF 64 bit executable

A small zip archive is given as starting point and can be easily extracted with unzip:

user@debian:~$ file Zip archive data, at least v2.0 to extract
user@debian:~$ unzip
inflating: huge.tar

The zip file drops ‘huge.tar’, a sparse tar archive. This archive contains ‘Huge’, a single sparse file which is an ELF 64 bit executable. As suggested by its name, the binary is quite big since its logical size is greater than 128 TB. A sparse file is a file that attempts to use file system space more efficiently when the file itself is mostly empty. The idea is that when a file is composed of a large number of consecutive zero bytes (typically 4096, a block size), rather than wasting a physical block filled only with zero bytes, the block index within the inode structure of the file is set no null and thus no block is used. Coredump are typical examples of sparse files. tar can be used with the option –sparse to create an archive which handles sparse files efficiently:

user@debian:~$ tar tvf huge.tar
-rwxr-xr-x 0/0 128574140715008 1970-01-01 01:00 Huge

Well, I naively tried to extract ‘Huge’ from this archive but the tar command failed and the file was only partially extracted (11168083824640 bytes rather than 128574140715008 bytes):

user@debian:~$ tar vxf huge.tar
tar: Huge: Cannot seek to 25297854992384: Invalid argument
tar: Exiting with failure status due to previous errors
user@debian:~$ ls -l Huge
-rwxr-xr-x 1 user user 11168083824640 Jan 1 1970 Huge
user@debian:~$ file Huge
Huge: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, corrupted section header size

2 – Ext4 file system limitations

In order to reproduce the ‘Cannot seek to’ error, I first tried to use the dd command to create big sparse files but the error message was not really helpful (File too large). Then, I decided to write a small C program which can create a regular file whose name is specified as the 1st argument, then seek to the offset specified as the 2nd argument and finally writes the string “hello world”.

With this program, I could successfully create a sparse file of 10 TiB (2^40 * 10 bytes), but it failed when I tried to create one of 20 TiB:

user@debian:~$ gcc test.c -W -o test_seek
user@debian:~$ ./test_seek "10TiB.big" 10995116277760
user@debian:~$ ls -lh 10TiB.big
-rw------- 1 user user 10T Apr 23 00:37 10TiB.big
user@debian:~$ tail -c 12 10TiB.big && echo
hello world
user@debian:~$ ./test_seek "20TiB.big" 21990232555520
lseek: Invalid argument

This allowed me to point out that the Invalid argument error was due to the ext4 filesystem which can not handle files bigger then 16TiB. To bypass this limitation, a simple solution was to mount a ZFS partition. Indeed, ZFS is designed to be so large that limitations size should never be encountered in practice since it can handle files up to 16 EiB (2^20 bigger than 16TiB). This solution, while beeing elegant, was bothering me for few reasons:

  • How does the ELF loader handle loading an executable whose size is > 128TB?
  • How could a static analysis engine load a file that big ?
  • How long will it take to execute if the instruction pointer is set to a huge zeroed memory area ? (NOP sled like)

For these reasons, I decided to write a program used to build a small ELF executable, by reading the sparse tar archive.

3 – ‘Huge’ file format analysis

In order to rebuild efficiently as smaller ELF file, I needed to understand how ‘Huge’ was designed. Even tough it was partially extracted when the tar command failed, the ELF header as well as the program headers were still complete:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x51466a42e705
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000001000 0x00002b0000000000 0x00002b0000000000
                 0x00001ef000000000 0x00001ef000000000  R E    1000
  LOAD           0x00002afffffe1000 0x000049f000000000 0x000049f000000000
                 0x0000161000000000 0x0000161000000000  R E    1000
  LOAD           0x000049effffe1000 0x0000000000020000 0x0000000000020000
                 0x00002afffffe0000 0x00002afffffe0000  R E    1000

From the readelf utility output I learned how the ELF application is layered, and how its 3 segments should be mapped in memory:

/---------------------------------------\ 0x00
| ELF Header (64 bytes)                 |
|---------------------------------------| 0x40
| Program Header Entry 1 (56 bytes)     |
|---------------------------------------| 0x78
| Program Header Entry 2 (56 bytes)     |
|---------------------------------------| 0xb0
| Program Header Entry 3 (56 bytes)     |
|---------------------------------------| 0xe8
| PADDING  (3928 bytes)                 |
|---------------------------------------| 0x1000
| Segment 1 (~ 30,9 TiB)                |
|   Virtual address: 0x00002b0000000000 |
|   Memory Size:     0x00001ef000000000 |
|   File Size:       0x00001ef000000000 |
|---------------------------------------| 0x00001ef000001000
| PADDING (~ 12 TiB)                    |
|---------------------------------------| 0x00002afffffe1000
| Segment 2 (~ 22 TiB)                  |
|   Virtual address: 0x000049f000000000 |
|   Memory Size:     0x0000161000000000 |
|   File Size:       0x0000161000000000 |
--------------------------------------  | 0x0000410ffffe1000
| PADDING (~ 8.9 TiB)                   | 
|---------------------------------------| 0x000049effffe1000
| Segment 3 (~ 43 TiB)                  |
|   Virtual address: 0x0000000000020000 |
|   Memory Size:     0x00002afffffe0000 |
|   File Size:       0x00002afffffe0000 |
\---------------------------------------/ 0x000074effffc1000

4 – Using xsparse to parse the archive

In order to recreate a smaller ELF binary from the archive, I wrote a tool based on xsparse. xsparse is an auxiliary program that expands condensed sparse files back to their original state. It can be useful if a third-party tar, does not support sparse extensions. Since the tar archive ‘Huge.tar’ is built using the Pax format, I used the pax utility to extract the ‘Huge’ file in its GNUSparseFile format:

user@debian:~$ pax -v -f huge.tar
-rwxr-xr-x 1 root root 191 en_GB.UTF-8 PaxHeader/Huge
-rwxr-xr-x 1 root root 107008 en_GB.UTF-8 GNUSparseFile/Huge
pax: ustar vol 1, 2 files, 109568 bytes read, 0 bytes written.
user@debian:~$ pax -r -f huge.tar

For expending the a sparse file, the xsparse utility first reads the sparse map, stored at the start of the file data block itself, preceding the actual file data. The sparse map contains a series of numbers of arbitrary length, delimited by newlines. The first line contains the number of entries (25 for this sparse archive). Then, each entry is composed of 2 values: the offset within the original file and its size including all null bytes. From the beginning of the sparse map, it is known that the first entry (ELF header) is stored in the original file at offset 0 and is 4096 bytes long, the second entry (first part of the first segment) is stored at offset 1627791495168 and is 4096 bytes long, …

user@debian:~$ head -n 8 GNUSparseFile/Huge

To write my tool, I forked the C source file xsparse.c provided on the GNU website. After compilation, I tried to extract ‘GNUSparseFile/Huge’ to see how this tools handle seeking to large offset:

user@debian:~$ gcc xsparse.c -o xsparse
user@debian:~$ ./xsparse GNUSparseFile/Huge
user@debian:~$ ls -l Huge
-rwxr-xr-x 1 user user 11168083910656 May 7 12:45 Huge

Such as tar, xsparse failed to extract the file entirely. It didn’t print any error message since the return value of seek() isn’t verified.

5 – Rewriting a smaller ELF

Though ‘Huge’ couldn’t be extracted with xsparse, the sparse map as well as the sparse members were correctly analyzed. This is just what is needed to write the smaller ELF. I wrote a simple expand_sparse() function used to create the ‘NotHuge’ binary, by writing each sparse part as a program entry. As there are 25 sparse members, the program builds a smaller ELF containing the ELF header slightly modified (1st sparse member), 24 program header entries and 24 segments.

5.1 – Raw file address to virtual address translation

In order to compute the segment virtual address of a sparse member, the original segment file address is subtracted from the sparse member file address. This results in an offset to which is the added the virtual address of the corresponding segment in the original ‘Huge’ ELF:

5.2 – Linking smaller segments together

As explained previously, a really smaller virtual address space is mapped for the generated ELF. Therefore, a trick must be used to links smaller segments together which were originally part of the same segment in the original ‘Huge’ file. The idea is to replace the last null instructions from a segment with an absolute unconditional jump to the next segment. For instance to create a link between the end of the 1st segment (0x00002c7affef1000) and the start of the 2nd one (0x00002c7affef6000), the following instructions (highlighted) will replace the 12 last null bytes from the end of the 1st segment:

By overriding byte-code on the fly, without disassembly, there is no guarantee for the instruction pointer to be aligned on the movabs opcode since this depends of previous instructions alignment. The above example works since the last null instruction virtual address is already aligned on 2 bytes. If it were not the case, one byte would be needed to replace the opcode of the add BYTE PTR [rax],al instruction by a nop instruction in order to fix the alignment of the movabs rdx, addr instruction.

Here is the code used to force the absolute unconditional jump:

A drawbacks with this methods is that the ‘NotHuge’ ELF only maps virtual addresses related to non-null blocks of data. Thus, if an instruction explicitly try to access a virtual address related to a null-block of data (i.e at least 4096 consecutive null bytes aligned on a 4096 bytes boundary), a segmentation fault would be raised since this address would not be mapped.

5.3 – Execution of ‘NotHuge’

The main parts of the ‘ELFRewritter’ program are now explained. After compilation and execution, the ‘NotHuge’ binary seems to be extracted successfully. It also seems to be well handled by the ELF loader:

user@debian:~$ gcc -W ELFRewritter.c -o ELFRewritter
user@debian:~$ ./ELFRewritter GNUSparseFile/Huge
user@debian:~$ ls -l NotHuge
-rwxr-xr-x 1 user user 106496 May 8 21:43 NotHuge
user@debian:~$ ./NotHuge
Please enter the password: test
user@debian:~$ echo $?

6 – Reverse engineering

The reverse engineering process can now be started since it is possible to debug and execute the ‘NotHuge’ binary.

6.1 – Entry point

At the entry point (0x51466a42e705), a 4096 bytes buffer is allocated on the stack and is used (partially) to store the key typed by the user:

The address of the stack based buffer containg the input password is then duplicated into registers rax, rdi, rsi and sub_10f338cf000c() is called:

6.2 – sub_10f338cf000c

Few null instructions (add BYTE PTR [rax],al) are executed and the patched unconditional jumps is quickly reached. This sets the instruction pointer to 0x10f338cf7000 where several null instructions are executed until reaching 0x10f338cf7548 where first instructions operating on the input key are executed. This sub is used to transform the input key from its hexadecimal form to the binary form (unhexlify). It can handles key typed either in lower or higher hexadecimal case (e.g: “ab” or “AB” will be result in 0xAB). From the ASCII table it is known that:

  • ‘0’ is at offset 0x30
  • ‘A’ at offset 0x41
  • ‘a’ at offset 0x61

Thus, to perform the unhexlify operation, it subtracts 0x30 to the character and if the result is between 0 and 9, it means the character is within range [0-9]. If not, it continues and subtract to the character 0x11 (difference between ‘A’ and ‘0’) and if the result is between 0 and 5 it means the character is within range [A-F]. If not, it continues and subtract to the character 0x20 (difference between ‘a’ and ‘A’) and if the result is between 0 and 5 it means the character is within range [a-f]. When the character is within range [A-Fa-f], 10 is added to the result to complete the conversion. This operation is executed twice per byte of the key in order to get the 4 higher and 4 lower bits:

6.3 – sub_43abdb4a000c

After returning from sub_10f338cf000c(), an unconditional jump is performed to 0x43abdb4a000c:

Few null bytes instructions are executed until reaching 0x43abdb4a0aee where the 1st byte of the key is compared against 0x29. If they are equal, an absolute jump is performed to 0x4a170682000c:

6.4 – sub_4a170682000c

Few null bytes instructions are executed and the patched unconditional jumps is reached leading the instruction pointer to 0x4a170682e000 where null bytes instructions are also executed until 0x4a170682ede0. The 3rd and 4th byte of the key are then compared against the value 0xd17e, as a word. Therefore the 3rd byte must be equal to 0x7e and the 4th byte to 0xd1. If the comparison succeeds, the instruction pointer is set to 0x6f4b0e0000c

Up to this point, 3 bytes of the key are known: 29??7ED1????????????????????????.

6.5 – sub_6f4b0e0000c

Few null bytes instructions are executed and the patched unconditional jump is reached leading the instruction pointer to 0x6f4b0e0f000 where null bytes instructions are also executed until 0x6f4b0e0f370. The 12th byte of the key is compared against the value 0x8c. If the check is true, a jump is performed to 0x49e7e541000c.

Up to this point, 4 bytes of the key are known: 29??7ED1??????????????8C????????.

6.6 – sub_49e7e541000c

Few null bytes instructions are executed and the patched unconditional jump is reached leading the instruction pointer to 0x49e7e541b000 where null bytes instructions are also executed until 0x49e7e541be19. Then, the value of rax is saved onto the stack and the address rsp+0x10b is loaded into rax. A null byte is moved at this address and the 2nd byte of the key is moved into rbx. Then, the address 0x49e7e541be38 is loaded into rcx. This address corresponds to the first null-byte instruction following the jmp rcx. Finally a jump is performed to rcx + rbx * 2. Therefore, the 2nd byte of the key, stored into rbx, defines the number of null-byte instructions skipped and has consequently an impact on the byte referenced by the address stored in rax:

At the address 0x49e7e541c038 (i.e 512 bytes after the jmp rcx instruction), the byte stored at the address referenced by rax is compared against 0x65. This gap of 512 bytes corresponds to 256 add BYTE PTR [rax],al instructions. By setting a breakpoint before the jmp rcx instruction, the value of al can be displayed and appears to be 3:

(gdb) b *0x49e7e541be36
(gdb) c
Breakpoint 1, 0x000049e7e541be36 in ?? ()
(gdb) p /x $al
$1 = 0x3

Each time that a add BYTE PTR [rax],al instruction is executed, the byte stored at the address referenced by rax is incremented by 3. Therefore, a simple python script can be used to compute the number of iterations needed to get the value 0x65:

The execution of the script gives the following output: After 119 iterations byte = 0x65. Since 119 iterations are needed, 137 iterations can be skipped (256-119). The 2nd byte of the key is therefore hex(137) which is 0x89. If set correctly, a jump is performed to the absolute address 0x352845ab000c.
Up to this point, 5 bytes of the key are known: 29897ED1??????????????8C????????.

6.7 – sub_352845ab000c

Few null bytes instructions are executed and the patched unconditional jump is reached leading the instruction pointer to 0x352845ab3000 where null bytes instructions are also executed until 0x352845ab3bce. The address of the instruction xor ebx,DWORD PTR [rsp+0xc] is loaded into rbx. Then, ebx which contains the 4 lower bytes of this address (0x45ab3bd5) is xored with key[12:16] and the result is expected to be 0xa9b00f5c. Thanks to the associativity property of the xor operation, the double word can be deduced doing the reverse operation: 0x45ab3bd5 ^ 0xa9b00f5c = 0xec1b3489. If the last 4 bytes of the key are good, a jump is performed to 0x59cb440c000c.

Up to this point, 9 bytes of the key are known: 29897ED1??????????????8C89341BEC.

6.8 – sub_59cb440c000cc

Few null bytes instructions are executed and the patched unconditional jump is reached leading the instruction pointer to 0x59cb440c1000 where null bytes instructions are also executed until 0x59cb440c4524. Then, 8 bytes following the jmp rcx instruction are loaded into rcx. This sets rcx to 0x000059cb440c4556. The double word key[8:12] is loaded into eax and xored against rbx which holds the value 0x59cbc8cc0b83. If the result of this operation equals rcx, a jump is performed to 0x2a7ee24a000c. One more time, the associativity property of the xor operation can be applied to deduce the key[8:12]: 0x59cbc8cc0b83 ^ 0x59cb440c4556 = 0x8cc04ed5.

Up to this point, 12 bytes of the key are known: 29897ED1????????D54EC08C89341BEC.

6.9 – sub_2a7ee24a000c

Few null bytes instructions are executed and the patched unconditional jump is reached leading the instruction pointer to 0x2a7ee24aa1000 where null bytes instructions are also executed until 0x2a7ee24aae24. Then, instructions related to floating-point numbers are executed.

6.9.1 Introduction to the x87 FPU

Since I had never played with assembly instructions related to the x87 Floating-Point Unit (FPU) before this challenge, I started reading the Intel documentation programming with the x87 FPU. The x87 instruction set includes instructions for basic floating point operations such as addition, subtraction and comparison, but also for more complex numerical operations, such as the computation of the cosine function, for example. The x87 FPU data registers consist of eight 80-bit registers. Values are stored in these registers in the double extended-precision floating-point format.

6.9.2 x87 instructions analysis

The address 0x2a7ee24aae74, first instruction following jmp rbx, is loaded into rsi and 10 bytes referenced by this address are copied into a stack buffer. Then, four bytes of the key (key[4:8]) are used to xor four bytes of this buffer.
After this operation, a double extended-precision floating-point number is created from the 10 bytes buffer with the instruction fld TBYTE PTR [rsp+0x18], which pushes the floating-point operand to the top of the x87 FPU data-register stack. This number is then duplicated thanks to the instruction fld st(0).

The cosine trigonometric function is called with the fcos instruction which pops it sources operands from the stack, and pushes its result back on the stack. The results of the cosines function is then compared with its own sources operands. Indeed, the fcompp instruction compares ST(0) with ST(1) which are respectively the results of the cosines function and its source operand, duplicated previously. The fcompp instruction sets the condition code flags (C0 through C3 from the FPU status register) according to the result of the comparison. Then, the fstsw ax instruction stores the FPU status word into ax, and this register is masked to ignore the precision mask bit within the FPU status word. Finally, the cmp ax,0x4000 instruction checks if only the condition code flags C3 is set, which would means that the equation x = cos(x) is true for the given floating-point number:

6.9.3 Solving x = cos(x)

By setting a breakpoint before the xor operation, the 10 bytes buffer can be printed:

(gdb) x /10b 0x2a7ee24aae74
0x2a7ee24aae74: 0xcb 0x6d 0x71 0x1e 0x38 0x78 0xb8 0x24
0x2a7ee24aae7c: 0xfe 0x3f

Once loaded into a x87 FPU register (after beeing xored) the floating-point number has hexadecimal value 0x3ffe????????1e716dcb and is encoded within the extended (80-bit) double floating point format. One bit is used as the sign of the significand, 15 bits for the exponent field and 64 bits for the significand. The exponent field is biased by 16383, meaning that 16383 has to be subtracted from the value in the exponent field to compute the actual power of 2:
The decimal value can be retrieved from a double floating point using operation: $latex (-1)^{sign} * (integer part+fraction) * 2^{exponent-16383} $.

Since the 16 first bits are known, the bit sign and the exponent can be found. The bit sign 0 (0x3ffe >> 15) means the number is positive. The exponent 16382 (0x3ffe & 0x7fff) means that 62 on 63 bit of the fractional part are used to store the decimal part. By looking for an approximative solution to x = cos(x), I found a link to the on-line encyclopedia of integer sequences website, which explains that the solution is known as the Dottie number (~ 0.739). Since x = cos(x) converges to the dottie number, no matter what number you starts with, you will get the same answer after a high number of iterations. Therefore, in order to find the solution encoded as a double floating point number, I wrote the following C program used to repeat x = cos(x) until x == cos(x):

By executing this program with x initialized to 0.739, only 92 iterations are needed to compute the right value:

user@debian:~$ gcc cos.c -W -std=c99 -lm -o ComputeCosX
user@debian:~$ ./ComputeCosX
After 92 iterations found x = cos(x) for value 0.739085
Hexadecimal value is 0x3ffebd34aeec1e716dcb

The exact representation of this floating point number can be computed as a fraction with the following python script:

The execution of this script gives the following output:
‘Result as a fraction: 13633714301103599051/18446744073709551616’.

Thanks to bc which supports arbitrary precision numbers, I could get a precision up to 10^-64 decimals from the resulted fraction:

user@debian:~$ bc -q

Well, the parenthesis related to floating point precision is over. Since the value 0x3ffebd34aeec1e716dcb gives the 4 bytes expected after xoring key[4:8] with 0x24b87838, key[4:8] can be deduced doing the reverse operation: 0x24b87838 ^ 0xbd34aeec = 0x998cd6d4.

Now, all the bytes of the input key are known: 29897ED1D4D68C99D54EC08C89341BEC.

user@debian:~$ ./NotHuge
Please enter the password: 29897ED1D4D68C99D54EC08C89341BEC
The key is: E574B514667F6AB2D83047BB871A54F5

Good ! One last stub is executed to derivate the key from the input password and the key is printed on stdout !

7 – Conclusion

I really enjoyed doing this challenge since it allowed me to discover limits related to the ext4 filesystem, to learn how sparse archives are designed, to dig a little bit into the ELF format specification, and finally, to discover instructions related to the FPU and the encoding of floating point numbers. Nervertheless, this challenge didn’t involved anti-debugging tricks nor cryptography and was therefore accessible to reverse-engineering beginners, like me. Thanks to the designer of this challenge !



Security Engineer / Malware Analyst, interested in reverse engineering, vulnerability exploitation, OS architecture & software developpement.

Leave a Reply

Your email address will not be published. Required fields are marked *