Breaking CMU's Bomblab with Angr for Fun and Profit - Part 2

Welcome to Part 2 of this series on cracking CMU’s Bomblab using Angr! If you are new, I would recommend starting with part 1 here.

Phase 2

First we create a function stub for phase 2, which can be appended to our phase 1 exploit:


1
2
3
 def phase_2(argv):
    path_to_binary = argv[1]
    project = angr.Project(path_to_binary)

Let’s see how Phase 2 is called:


1
2
3
4
5
6
    0x0000000000400e4e <+174>:	call   0x40149e <read_line>
   0x0000000000400e53 <+179>:	mov    rdi,rax
   0x0000000000400e56 <+182>:	call   0x400efc <phase_2>
   0x0000000000400e5b <+187>:	call   0x4015c4 <phase_defused>
   0x0000000000400e60 <+192>:	mov    edi,0x4022ed
   0x0000000000400e65 <+197>:	call   0x400b10 <puts@plt>

So similarly to Phase 1, it gets its input from read_line, which is then passed to phase_2.

Let’s disassemble Phase 2 and see what we’ve got:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
 gef➤  disas phase_2
Dump of assembler code for function phase_2:
   0x0000000000400efc <+0>:	push   rbp
   0x0000000000400efd <+1>:	push   rbx
   0x0000000000400efe <+2>:	sub    rsp,0x28
   0x0000000000400f02 <+6>:	mov    rsi,rsp
   0x0000000000400f05 <+9>:	call   0x40145c <read_six_numbers>
   0x0000000000400f0a <+14>:	cmp    DWORD PTR [rsp],0x1
   0x0000000000400f0e <+18>:	je     0x400f30 <phase_2+52>
   0x0000000000400f10 <+20>:	call   0x40143a <explode_bomb>
   0x0000000000400f15 <+25>:	jmp    0x400f30 <phase_2+52>
   0x0000000000400f17 <+27>:	mov    eax,DWORD PTR [rbx-0x4]
   0x0000000000400f1a <+30>:	add    eax,eax
   0x0000000000400f1c <+32>:	cmp    DWORD PTR [rbx],eax
   0x0000000000400f1e <+34>:	je     0x400f25 <phase_2+41>
   0x0000000000400f20 <+36>:	call   0x40143a <explode_bomb>
   0x0000000000400f25 <+41>:	add    rbx,0x4
   0x0000000000400f29 <+45>:	cmp    rbx,rbp
   0x0000000000400f2c <+48>:	jne    0x400f17 <phase_2+27>
   0x0000000000400f2e <+50>:	jmp    0x400f3c <phase_2+64>
   0x0000000000400f30 <+52>:	lea    rbx,[rsp+0x4]
   0x0000000000400f35 <+57>:	lea    rbp,[rsp+0x18]
   0x0000000000400f3a <+62>:	jmp    0x400f17 <phase_2+27>
   0x0000000000400f3c <+64>:	add    rsp,0x28
   0x0000000000400f40 <+68>:	pop    rbx
   0x0000000000400f41 <+69>:	pop    rbp
   0x0000000000400f42 <+70>:	ret    
End of assembler dump.

We see that read_six_numbers is passed with the result of read_line in rdi, and a pointer to the stack in rsi. We can infer that it most likely tries to extract 6 integers from the buffer passed by read_line into the buffer pointed by rsi, which would look something like int[6]. We know that it must be an 32 bit int and not a 64 bit long long, because otherwise the buffer would require at least 0x30 bytes, but we see that the stack is only decremented by 0x28.

A good place to start our program execution will be after returning from read_six_numbers. The reason why we want to avoid read_six_numbers is again due to state explosion - if you disassemble the function, you see lots of branches and jumps. Not good!


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
     # Tell Angr to start executing from the instruction after read_six_numbers
    start_addr = 0x00400f0a
    initial_state = project.factory.blank_state(addr=start_addr)
{% endhighlight %}

### Symbolic Values on the Stack
Since we do not know the stack address at runtime, we will need to push our symbolic arguments onto the stack. (I lied a bit - we can actually control it, but manipulating it this way is more prone to errors). So what we want is for our stack to look something like this right after we return from `read_six_numbers`:

{% highlight c %}
{% raw %}
+-------+-------+ <- rsp + 0x18
| num_6 | num_5 |
+-------+-------+ <- rsp + 0x10
| num_4 | num_3 |
+-------+-------+ <- rsp + 0x8
| num_2 | num_1 |
+-------+-------+ <- rsp

We can set this up by pushing symbolic values on the stack. Since this is a 64 bit binary and we are dealing with 32 bit ints, each time we push we will actually be pushing 2 ints:


1
2
3
4
5
6
7
     num_12 = claripy.BVS('num_12', 64)
    num_34 = claripy.BVS('num_34', 64)
    num_56 = claripy.BVS('num_56', 64)

    initial_state.stack_push(num_56)
    initial_state.stack_push(num_34)
    initial_state.stack_push(num_12)

In more complicated functions, we may actually have to set up the stack nicely (i.e ensuring rbp is at a reasonable value, potentially adding other padding onto the stack). However, in this case, we can easily see that there are no memory references using rbp as an offset, and there seems to be no other local variables being used on the stack. We will also set our termination condition to be within this stack frame, so we do not need to worry about setting things like return addresses on the stack correctly.

Let’s initialize our simulation manager, and also define our find and avoid conditions:


1
2
3
4
5
6
7
     # Create a simulation manager initialized with the starting state
    simulation = project.factory.simgr(initial_state)

    success_addr = 0x00400f42 # right before ret
    explode_addr = 0x0040143a # explode_bomb

    simulation.explore(find=success_addr, avoid=explode_addr)

We set our success address to be right before phase_2 returns, since we did not set the return address as mentioned previously. Similarly as before, we avoid the explode_bomb function as well.

Some Final Housekeeping

Finally, let’s deal with getting our solution:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
     # Check that we have found a solution
    if simulation.found:
        solution_state = simulation.found[0]

        num_12_sol = solution_state.se.eval(num_12, cast_to=int)
        num_34_sol = solution_state.se.eval(num_34, cast_to=int)
        num_56_sol = solution_state.se.eval(num_56, cast_to=int)

        def unpack_ints(n):
            lower_32_mask = (1 << 32) - 1
            return (n & lower_32_mask, (n >> 32) & lower_32_mask)

        num_1_sol, num_2_sol = unpack_ints(num_12_sol)
        num_3_sol, num_4_sol = unpack_ints(num_34_sol)
        num_5_sol, num_6_sol = unpack_ints(num_56_sol)

        print(f"{num_1_sol} {num_2_sol} {num_3_sol} {num_4_sol} {num_5_sol} {num_6_sol}")
    else:
        raise Exception('Could not find the solution')

Here, we get a 64 bit int in num_12_sol and so on. We then unpack it to retrieve the individual values. Based on the ordering on the stack, the first number would be in the lower 32 bits, and the second number would be in the higher 32 bits.

Full Solution Script

Here is the full script for phase 2:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
 import angr
import claripy
import sys

def phase_1(argv):
    # omitted
    pass

def phase_2(argv):
    path_to_binary = argv[1]
    project = angr.Project(path_to_binary)

    # Tell Angr where to start executing 
    start_addr = 0x00400f0a
    initial_state = project.factory.blank_state(addr=start_addr)

    num_12 = claripy.BVS('num_12', 64)
    num_34 = claripy.BVS('num_34', 64)
    num_56 = claripy.BVS('num_56', 64)

    initial_state.stack_push(num_56)
    initial_state.stack_push(num_34)
    initial_state.stack_push(num_12)
    
    # Create a simulation manager initialized with the starting state
    simulation = project.factory.simgr(initial_state)

    success_addr = 0x00400f42 # right before ret
    explode_addr = 0x0040143a # explode_bomb

    simulation.explore(find=success_addr, avoid=explode_addr)

    # Check that we have found a solution
    if simulation.found:
        solution_state = simulation.found[0]

        num_12_sol = solution_state.se.eval(num_12, cast_to=int)
        num_34_sol = solution_state.se.eval(num_34, cast_to=int)
        num_56_sol = solution_state.se.eval(num_56, cast_to=int)

        def unpack_ints(n):
            lower_32_mask = (1 << 32) - 1
            return (n & lower_32_mask, (n >> 32) & lower_32_mask)

        num_1_sol, num_2_sol = unpack_ints(num_12_sol)
        num_3_sol, num_4_sol = unpack_ints(num_34_sol)
        num_5_sol, num_6_sol = unpack_ints(num_56_sol)

        print(f"{num_1_sol} {num_2_sol} {num_3_sol} {num_4_sol} {num_5_sol} {num_6_sol}")
    else:
        raise Exception('Could not find the solution')

if __name__ == '__main__':
    phase_1(sys.argv)
    phase_2(sys.argv)

Let’s run it!


1
2
3
4
5
6
7
8
9
10
 $ python solve.py bomb
WARNING | 2020-07-30 23:45:46,677 | angr.state_plugins.symbolic_memory | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.
WARNING | 2020-07-30 23:45:46,677 | angr.state_plugins.symbolic_memory | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2020-07-30 23:45:46,678 | angr.state_plugins.symbolic_memory | 1) setting a value to the initial state
WARNING | 2020-07-30 23:45:46,678 | angr.state_plugins.symbolic_memory | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2020-07-30 23:45:46,678 | angr.state_plugins.symbolic_memory | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY_REGISTERS}, to suppress these messages.
WARNING | 2020-07-30 23:45:46,678 | angr.state_plugins.symbolic_memory | Filling memory at 0x7ffffffffff0008 with 8 unconstrained bytes referenced from 0x400f40 (phase_2+0x44 in bomb (0x400f40))
WARNING | 2020-07-30 23:45:46,679 | angr.state_plugins.symbolic_memory | Filling memory at 0x7ffffffffff0010 with 8 unconstrained bytes referenced from 0x400f41 (phase_2+0x45 in bomb (0x400f41))
CRITICAL | 2020-07-30 23:45:46,682 | angr.sim_state | The name state.se is deprecated; please use state.solver.
1 2 4 8 16 32

Now try it on the bomb itself:


1
2
3
4
5
6
7
 $ ./bomb
Welcome to my fiendish little bomb. You have 6 phases with
which to blow yourself up. Have a nice day!
Border relations with Canada have never been better.
Phase 1 defused. How about the next one?
1 2 4 8 16 32
That's number 2.  Keep going!

Awesome! We got the solution without even having to deal with the headache of figuring what Phase 2 is actually doing. While Phase 1 was relatively trivial and Angr solving it was probably not really impressive, this is something that should be making you excited now! :)

Thanks for reading, and I hope you’ve enjoyed the journey so far. You can go straight to the next part here.