Greetings! I’d like to make a quick post about the fundamentals of code analysis, as an expansion of the reverse engineering topic at the end of my first blog. I believe that if you want to be a true hacker or forensic investigator, understanding how software works at the code level is essential. You need to be able to break down exploit code or analyse malware code in order to truly understand it.
A great place for you guys to start is right back at the basics with our good friend “Hello World”. For this example I will be using a Linux (x64) operating system, which you guys can all get your hands on for free, and I will be coding in C. The 64-bit registers may be confusing at first because the names change: instead of the 32-bit extended registers such as [E]AX and the instruction pointer [E]IP, you will see their 64-bit counterparts, [R]AX and [R]IP.
The code:
minh-mint prog # cat hello.c
#include <stdio.h>

int main()
{
    int i;
    for(i=0; i < 10; i++) // Loop 10 times.
    {
        puts("Hello, world!\n"); // put "Hello World" to the output.
    }
    return 0; // Tell OS the program exited without errors.
}
This C program begins execution at the function main(). Comments are introduced by // and are ignored by the compiler. The program simply prints “Hello, world!” to the screen 10 times.
The next step is to compile the code with a compiler such as gcc:
minh-mint prog # gcc hello.c
minh-mint prog # ls
a.out hello.c
minh-mint prog # ./a.out
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Generally, if you have the source code, finding out what a program does is a trivial matter of reading through the code. But if you only have the compiled binary, what can you do to find out what the program is actually doing?
One answer is a tool called objdump.
objdump is part of the binutils package. Depending on your choice of Linux distro, it may come pre-installed or you may need to install it manually. Using objdump you can analyse the binary and see the instructions it hands to the CPU, in this case on an x64 architecture:
minh-mint prog # objdump -M intel -D a.out | grep -A20 main.:
00000000004004f4 <main>:
4004f4: 55 push rbp
4004f5: 48 89 e5 mov rbp,rsp
4004f8: 48 83 ec 10 sub rsp,0x10
4004fc: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
400503: eb 0e jmp 400513
400505: bf 0c 06 40 00 mov edi,0x40060c
40050a: e8 e1 fe ff ff call 4003f0
40050f: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1
400513: 83 7d fc 09 cmp DWORD PTR [rbp-0x4],0x9
400517: 7e ec jle 400505
400519: b8 00 00 00 00 mov eax,0x0
40051e: c9 leave
40051f: c3 ret
0000000000400520 :
400520: f3 c3 repz ret
400522: eb 0c jmp 400530
400524: 90 nop
400525: 90 nop
400526: 90 nop
To keep the output to a sensible number of lines, you can grep it to limit the amount of data on screen. For this example, 20 lines after the match is plenty, as you can see the NOPs (no-operation instructions) that follow the end of main.
The hex numbers on the left, starting with 0x4004f4, are memory addresses. Memory can be seen as a row of bytes, each with its own address, and a byte is accessed by going to its address. The CPU reads from these addresses to retrieve the machine language instructions that make up the compiled program.
The second column contains the machine language instructions, which the x64 processor reads as binary values, e.g. 01001110110111. objdump displays the binary as hex to make it more human readable.
The final column on the right contains the assembly version of the machine language instructions, as hex is still not really human readable. Assembly language is just a representation of the binary or hex values; however, the assembly mnemonic jmp is easier to remember than the hex value “eb” or the binary value “11101011” that encodes the short jump in the dump above.
To really understand what the processor is doing, you need to analyse the register values that are used to execute the program. This can be done using a debugger, which allows you to break down and step through exactly what the program is doing, as it is doing it.
GDB is a great tool in Linux that will show you the state of the registers:
minh-mint prog # gdb -q ./a.out
Reading symbols from /home/minh/Desktop/prog/a.out...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x4004f8
(gdb) run
Starting program: /home/minh/Desktop/prog/a.out
Breakpoint 1, 0x00000000004004f8 in main ()
(gdb) info registers
rax 0x7ffff7dd8ec8 140737351880392
rbx 0x0 0
rcx 0x0 0
rdx 0x7fffffffebb8 140737488350136
rsi 0x7fffffffeba8 140737488350120
rdi 0x1 1
rbp 0x7fffffffeac0 0x7fffffffeac0
rsp 0x7fffffffeac0 0x7fffffffeac0
r8 0x7ffff7dd7300 140737351873280
r9 0x7ffff7deb5c0 140737351955904
r10 0x7fffffffe910 140737488349456
r11 0x7ffff7a77c90 140737348336784
r12 0x400410 4195344
r13 0x7fffffffeba0 140737488350112
r14 0x0 0
r15 0x0 0
rip 0x4004f8 0x4004f8
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) quit
A breakpoint was set at the main() function, allowing us to see the register values before the code is executed.
The first four registers, rax, rbx, rcx and rdx, are general purpose registers, historically named the accumulator, base, counter and data registers respectively. They can be used for a whole host of things, but they mainly act as temporary variables for the CPU while it executes machine code.
The next four registers, rsi, rdi, rbp and rsp, are also general purpose but are better known as pointers and indexes. They stand for source index, destination index, base pointer and stack pointer, and they store addresses that point to locations in memory.
The rip register is the instruction pointer; it holds the address of the instruction the CPU will execute next. It is an important register to concentrate on while debugging, as it is essentially the control pointer. The example above shows rip currently pointing at memory address 0x4004f8.
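The eflags register contains several bit flags that record, among other things, the results of comparisons. The remaining registers (cs, ss, ds, es, fs and gs) are segment registers, used for memory segmentation.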
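Now we can start disassembling our code. Note that gdb can only show the source listing and line numbers below when the binary has been built with debugging symbols (for example with gcc -g):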
minh-mint prog # gdb -q ./a.out
Reading symbols from /home/minh/Desktop/prog/a.out...done.
(gdb) list
1
2
3 int main()
4 {
5 int i;
6 for(i=0; i < 10; i++) // Loop 10 times.
7 {
8 puts("Hello, world!\n"); // put "Hello World" to the output.
9 }
10 return 0; // Tell OS the program exited without errors.
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004004f4 : push rbp
0x00000000004004f5 : mov rbp,rsp
0x00000000004004f8 : sub rsp,0x10
0x00000000004004fc : mov DWORD PTR [rbp-0x4],0x0
0x0000000000400503 : jmp 0x400513
0x0000000000400505 : mov edi,0x40060c
0x000000000040050a : call 0x4003f0
0x000000000040050f : add DWORD PTR [rbp-0x4],0x1
0x0000000000400513 : cmp DWORD PTR [rbp-0x4],0x9
0x0000000000400517 : jle 0x400505
0x0000000000400519 : mov eax,0x0
0x000000000040051e : leave
0x000000000040051f : ret
End of assembler dump.
(gdb) break main
Breakpoint 1 at 0x4004fc: file hello.c, line 6.
(gdb) run
Starting program: /home/minh/Desktop/prog/a.out
Breakpoint 1, main () at hello.c:6
6 for(i=0; i < 10; i++) // Loop 10 times.
(gdb) info register rip
rip 0x4004fc 0x4004fc
As you can see our hello world code starts at:
0x00000000004004fc : mov DWORD PTR [rbp-0x4],0x0
This assembly instruction moves 0 into the four bytes of memory located at the address in rbp minus 4, which sets i=0:
(gdb) x/4xb $rbp - 4
0x7fffffffeabc: 0x00 0x00 0x00 0x00
Now we can step through and execute the current instruction (i=0) and advance rip to the next instruction:
(gdb) nexti
0x0000000000400503 6 for(i=0; i < 10; i++) // Loop 10 times.
rip has now advanced to the next instruction, which begins the “for loop”. We will take a detailed look at that in the debugger:
(gdb) x/10i $rip
=> 0x400503 : jmp 0x400513
0x400505 : mov edi,0x40060c
0x40050a : call 0x4003f0
0x40050f : add DWORD PTR [rbp-0x4],0x1
0x400513 : cmp DWORD PTR [rbp-0x4],0x9
0x400517 : jle 0x400505
0x400519 : mov eax,0x0
0x40051e : leave
0x40051f : ret
0x400520: repz ret
The first instruction, jmp, skips straight to the cmp at 0x400513, which compares the variable i with 9:
(gdb) x/i $rip
=> 0x400513: cmp DWORD PTR [rbp-0x4],0x9
The next instruction is jle, which will jump if the compared value is less than or equal to 9.
This uses the previous comparison’s results stored in the eflags register, and if i is less than or equal to 9, rip will jump to the 0x400505 memory location:
(gdb) x/i $rip
=> 0x400517: jle 0x400505
In this instance i is 0, which is less than 9, so rip will point to 0x400505 and the program will go on to print “hello world” to the screen. Using the debugger, we can take a closer look at the instructions at rip.
Two new instructions allow us to print hello world to the screen. The first moves the value 0x40060c, which is a memory address, into edi, the register that carries the first function argument on 64-bit Linux:
(gdb) nexti
8 puts("Hello, world!\n"); // put "Hello World" to the output.
(gdb) x/2i $rip
=> 0x400505 : mov edi,0x40060c
0x40050a: call 0x4003f0
Let’s take a look at what is stored at memory address 0x40060c:
(gdb) x/6cb 0x40060c
0x40060c: 72 'H' 101 'e' 108 'l' 108 'l' 111 'o' 44 ','
That looks like the string we want to print, but it’s not very easy to read. We can use the s (string) format to display the value as a string:
(gdb) x/s 0x40060c
0x40060c: "Hello, world!\n"
To complete the printing, we need to step to the second instruction, which calls the print function:
(gdb) nexti
0x000000000040050a 8 puts("Hello, world!\n"); // put "Hello World" to the output.
(gdb) x/i $rip
=> 0x40050a: call 0x4003f0
Stepping over the call prints the hello world string to the screen:
(gdb) nexti
Hello, world!
6 for(i=0; i < 10; i++) // Loop 10 times.
Now we move back to the “for loop”.
We need to look at rip to see what it is doing next. As you can see, the next instruction completes one pass of the “for loop” by adding the value 1 to the variable i, taking it from i=0 to i=1.
The second and third instructions should be very familiar, as you have seen them earlier: cmp compares i to the value 9, and jle jumps back into the loop body if the comparison allows it:
(gdb) x/2i $rip
=> 0x40050f : add DWORD PTR [rbp-0x4],0x1
0x400513 : cmp DWORD PTR [rbp-0x4],0x9
0x400517: jle 0x400505
So if i is less than or equal to 9, the above process will repeat. Let’s look at what happens when i=10, which ends the “for loop”:
(gdb) nexti
10 return 0; // Tell OS the program exited without errors.
(gdb) x/i $rip
=> 0x400519: mov eax,0x0
The code will run through and exit without error by zeroing out eax, which holds the return value of main(), and returning:
(gdb) nexti
0x00007ffff7a77d90 in __libc_start_main () from /lib/libc.so.6
(gdb) x/i $rip
=> 0x7ffff7a77d90: call 0x7ffff7a92410
So, using what you’ve just learned, we can take a look at the fully disassembled C code to see what it has been compiled into as machine instructions:
minh-mint prog # gdb -q ./a.out
Reading symbols from /home/minh/Desktop/prog/a.out...done.
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004004f4 : push rbp
0x00000000004004f5 : mov rbp,rsp
0x00000000004004f8 : sub rsp,0x10
0x00000000004004fc : mov DWORD PTR [rbp-0x4],0x0
0x0000000000400503 : jmp 0x400513
0x0000000000400505 : mov edi,0x40060c
0x000000000040050a : call 0x4003f0
0x000000000040050f : add DWORD PTR [rbp-0x4],0x1
0x0000000000400513 : cmp DWORD PTR [rbp-0x4],0x9
0x0000000000400517 : jle 0x400505
0x0000000000400519 : mov eax,0x0
0x000000000040051e : leave
0x000000000040051f : ret
End of assembler dump.
(gdb) list
1
2
3 int main()
4 {
5 int i;
6 for(i=0; i < 10; i++) // Loop 10 times.
7 {
8 puts("Hello, world!\n"); // put "Hello World" to the output.
9 }
10 return 0; // Tell OS the program exited without errors.
11 }
I hope you guys have enjoyed this short blog post, which should hopefully provide some additional information to aid in the transition from the Gray Hat Hacking book to the Shellcoder’s Handbook, furthering your progress towards becoming a successful reverse engineer.