$$ \newcommand{\bone}{\mathbf{1}} \newcommand{\bbeta}{\mathbf{\beta}} \newcommand{\bdelta}{\mathbf{\delta}} \newcommand{\bepsilon}{\mathbf{\epsilon}} \newcommand{\blambda}{\mathbf{\lambda}} \newcommand{\bomega}{\mathbf{\omega}} \newcommand{\bpi}{\mathbf{\pi}} \newcommand{\bphi}{\mathbf{\phi}} \newcommand{\bvphi}{\mathbf{\varphi}} \newcommand{\bpsi}{\mathbf{\psi}} \newcommand{\bsigma}{\mathbf{\sigma}} \newcommand{\btheta}{\mathbf{\theta}} \newcommand{\btau}{\mathbf{\tau}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\boldf}{\mathbf{f}} \newcommand{\bg}{\mathbf{g}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bi}{\mathbf{i}} \newcommand{\bj}{\mathbf{j}} \newcommand{\bk}{\mathbf{k}} \newcommand{\bell}{\mathbf{\ell}} \newcommand{\bm}{\mathbf{m}} \newcommand{\bn}{\mathbf{n}} \newcommand{\bo}{\mathbf{o}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bs}{\mathbf{s}} \newcommand{\bt}{\mathbf{t}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bA}{\mathbf{A}} \newcommand{\bB}{\mathbf{B}} \newcommand{\bC}{\mathbf{C}} \newcommand{\bD}{\mathbf{D}} \newcommand{\bE}{\mathbf{E}} \newcommand{\bF}{\mathbf{F}} \newcommand{\bG}{\mathbf{G}} \newcommand{\bH}{\mathbf{H}} \newcommand{\bI}{\mathbf{I}} \newcommand{\bJ}{\mathbf{J}} \newcommand{\bK}{\mathbf{K}} \newcommand{\bL}{\mathbf{L}} \newcommand{\bM}{\mathbf{M}} \newcommand{\bN}{\mathbf{N}} \newcommand{\bP}{\mathbf{P}} \newcommand{\bQ}{\mathbf{Q}} \newcommand{\bR}{\mathbf{R}} \newcommand{\bS}{\mathbf{S}} \newcommand{\bT}{\mathbf{T}} \newcommand{\bU}{\mathbf{U}} \newcommand{\bV}{\mathbf{V}} \newcommand{\bW}{\mathbf{W}} \newcommand{\bX}{\mathbf{X}} \newcommand{\bY}{\mathbf{Y}} \newcommand{\bZ}{\mathbf{Z}} \newcommand{\bsa}{\boldsymbol{a}} \newcommand{\bsb}{\boldsymbol{b}} \newcommand{\bsc}{\boldsymbol{c}} \newcommand{\bsd}{\boldsymbol{d}} \newcommand{\bse}{\boldsymbol{e}} \newcommand{\bsoldf}{\boldsymbol{f}} \newcommand{\bsg}{\boldsymbol{g}} \newcommand{\bsh}{\boldsymbol{h}} \newcommand{\bsi}{\boldsymbol{i}} \newcommand{\bsj}{\boldsymbol{j}} \newcommand{\bsk}{\boldsymbol{k}} \newcommand{\bsell}{\boldsymbol{\ell}} \newcommand{\bsm}{\boldsymbol{m}} \newcommand{\bsn}{\boldsymbol{n}} \newcommand{\bso}{\boldsymbol{o}} \newcommand{\bsp}{\boldsymbol{p}} \newcommand{\bsq}{\boldsymbol{q}} \newcommand{\bsr}{\boldsymbol{r}} \newcommand{\bss}{\boldsymbol{s}} \newcommand{\bst}{\boldsymbol{t}} \newcommand{\bsu}{\boldsymbol{u}} \newcommand{\bsv}{\boldsymbol{v}} \newcommand{\bsw}{\boldsymbol{w}} \newcommand{\bsx}{\boldsymbol{x}} \newcommand{\bsy}{\boldsymbol{y}} \newcommand{\bsz}{\boldsymbol{z}} \newcommand{\bsA}{\boldsymbol{A}} \newcommand{\bsB}{\boldsymbol{B}} \newcommand{\bsC}{\boldsymbol{C}} \newcommand{\bsD}{\boldsymbol{D}} \newcommand{\bsE}{\boldsymbol{E}} \newcommand{\bsF}{\boldsymbol{F}} \newcommand{\bsG}{\boldsymbol{G}} \newcommand{\bsH}{\boldsymbol{H}} \newcommand{\bsI}{\boldsymbol{I}} \newcommand{\bsJ}{\boldsymbol{J}} \newcommand{\bsK}{\boldsymbol{K}} \newcommand{\bsL}{\boldsymbol{L}} \newcommand{\bsM}{\boldsymbol{M}} \newcommand{\bsN}{\boldsymbol{N}} \newcommand{\bsP}{\boldsymbol{P}} \newcommand{\bsQ}{\boldsymbol{Q}} \newcommand{\bsR}{\boldsymbol{R}} \newcommand{\bsS}{\boldsymbol{S}} \newcommand{\bsT}{\boldsymbol{T}} \newcommand{\bsU}{\boldsymbol{U}} \newcommand{\bsV}{\boldsymbol{V}} \newcommand{\bsW}{\boldsymbol{W}} \newcommand{\bsX}{\boldsymbol{X}} \newcommand{\bsY}{\boldsymbol{Y}} \newcommand{\bsZ}{\boldsymbol{Z}} \newcommand{\calA}{\mathcal{A}} \newcommand{\calB}{\mathcal{B}} \newcommand{\calC}{\mathcal{C}} \newcommand{\calD}{\mathcal{D}} \newcommand{\calE}{\mathcal{E}} \newcommand{\calF}{\mathcal{F}} \newcommand{\calG}{\mathcal{G}} \newcommand{\calH}{\mathcal{H}} \newcommand{\calI}{\mathcal{I}} \newcommand{\calJ}{\mathcal{J}} \newcommand{\calK}{\mathcal{K}} \newcommand{\calL}{\mathcal{L}} \newcommand{\calM}{\mathcal{M}} \newcommand{\calN}{\mathcal{N}} \newcommand{\calO}{\mathcal{O}} \newcommand{\calP}{\mathcal{P}} \newcommand{\calQ}{\mathcal{Q}} \newcommand{\calR}{\mathcal{R}} \newcommand{\calS}{\mathcal{S}} \newcommand{\calT}{\mathcal{T}} \newcommand{\calU}{\mathcal{U}} \newcommand{\calV}{\mathcal{V}} \newcommand{\calW}{\mathcal{W}} \newcommand{\calX}{\mathcal{X}} \newcommand{\calY}{\mathcal{Y}} \newcommand{\calZ}{\mathcal{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\F}{\mathbb{F}} \newcommand{\Q}{\mathbb{Q}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \newcommand{\nnz}[1]{\mbox{nnz}(#1)} \newcommand{\dotprod}[2]{\langle #1, #2 \rangle} \newcommand{\ignore}[1]{} \let\Pr\relax \DeclareMathOperator*{\Pr}{\mathbf{Pr}} \newcommand{\E}{\mathbb{E}} \DeclareMathOperator*{\Ex}{\mathbf{E}} \DeclareMathOperator*{\Var}{\mathbf{Var}} \DeclareMathOperator*{\Cov}{\mathbf{Cov}} \DeclareMathOperator*{\stddev}{\mathbf{stddev}} \DeclareMathOperator*{\avg}{avg} \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\size}{size} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\dist}{dist} \DeclareMathOperator{\vol}{vol} \DeclareMathOperator{\spn}{span} \DeclareMathOperator{\supp}{supp} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\Tr}{Tr} \DeclareMathOperator{\codim}{codim} \DeclareMathOperator{\diag}{diag} \newcommand{\PTIME}{\mathsf{P}} \newcommand{\LOGSPACE}{\mathsf{L}} \newcommand{\ZPP}{\mathsf{ZPP}} \newcommand{\RP}{\mathsf{RP}} \newcommand{\BPP}{\mathsf{BPP}} \newcommand{\P}{\mathsf{P}} \newcommand{\NP}{\mathsf{NP}} \newcommand{\TC}{\mathsf{TC}} \newcommand{\AC}{\mathsf{AC}} \newcommand{\SC}{\mathsf{SC}} \newcommand{\SZK}{\mathsf{SZK}} \newcommand{\AM}{\mathsf{AM}} \newcommand{\IP}{\mathsf{IP}} \newcommand{\PSPACE}{\mathsf{PSPACE}} \newcommand{\EXP}{\mathsf{EXP}} \newcommand{\MIP}{\mathsf{MIP}} \newcommand{\NEXP}{\mathsf{NEXP}} \newcommand{\BQP}{\mathsf{BQP}} \newcommand{\distP}{\mathsf{dist\textbf{P}}} \newcommand{\distNP}{\mathsf{dist\textbf{NP}}} \newcommand{\eps}{\epsilon} \newcommand{\lam}{\lambda} \newcommand{\dleta}{\delta} \newcommand{\simga}{\sigma} \newcommand{\vphi}{\varphi} \newcommand{\la}{\langle} \newcommand{\ra}{\rangle} \newcommand{\wt}[1]{\widetilde{#1}} \newcommand{\wh}[1]{\widehat{#1}} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\ul}[1]{\underline{#1}} \newcommand{\ot}{\otimes} \newcommand{\zo}{\{0,1\}} \newcommand{\co}{:} %\newcommand{\co}{\colon} \newcommand{\bdry}{\partial} \newcommand{\grad}{\nabla} \newcommand{\transp}{^\intercal} \newcommand{\inv}{^{-1}} \newcommand{\symmdiff}{\triangle} \newcommand{\symdiff}{\symmdiff} \newcommand{\half}{\tfrac{1}{2}} \newcommand{\mathbbm}{\Bbb} \newcommand{\bbone}{\mathbbm 1} \newcommand{\Id}{\bbone} \newcommand{\SAT}{\mathsf{SAT}} \newcommand{\bcalG}{\boldsymbol{\calG}} \newcommand{\calbG}{\bcalG} \newcommand{\bcalX}{\boldsymbol{\calX}} \newcommand{\calbX}{\bcalX} \newcommand{\bcalY}{\boldsymbol{\calY}} \newcommand{\calbY}{\bcalY} \newcommand{\bcalZ}{\boldsymbol{\calZ}} \newcommand{\calbZ}{\bcalZ} $$

Breaking CMU's Bomblab with Angr for Fun and Profit - Part 1

post.cover
Arapaho Glacier, Roosevelt National Forest, Colorado, USA

I have recently been learning about Angr, a binary analysis framework developed by UC Santa Barbara and Arizona State University. It caught my eye because of its versatility and utility in reverse engineering binaries whose disassembly and decompilation are hard to understand manually. Oftentimes, it is simply due to the fact that it was compiled from newer or relatively less popular languages like Rust or Haskell, where the state of currently publicly available decompilers leaves much to be desired. Angr’s ability to perform symbolic execution therefore allows us to blackbox certain functionality within the program (or even the entire program) by attempting to find the right input for a desired output.

But what is symbolic execution anyway?

Do you recall when you had to first begin manipulating symbols in math class during elementary school? Yes, algebra! Symbolic execution can be thought of as manipulating symbols in order to derive certain constraints. These constraints can then be solved by a Satisfiability Modulo Theories (SMT) solver like Z3.

Here is a simple example:

1
2
3
4
5
6
int x;
scanf("%d", &x);
int y = x + 5;
if (y == 20) {
  print_flag();
}

Suppose we want Angr to figure out how how to reach line 5 (print_flag) of the program during its execution. From a very high level, you can tell Angr that the instruction corresponding to what happens at line 4 is an address that you want it to find, and Angr will be able to work backwards and deduce symbolically that we need to constrain y = 20, which implies that we have to then constrain x=15, and so the user input from stdin must correspond to 15.

Of course, this is a very contrived example, and in practice the constraints are usually in ranges (i.e x > 0), and you can end up with a lot of potential inputs for a desired output (maybe even infinitely many). When situations like this happens, you can ask Angr to return an arbitrary valid input, or return n such inputs, and many other options which you can refer to here.

Bomblab

Now that we have a basic understanding of what Angr is and what symbolic execution is about, let’s put our newfound skills to the test! I decided to try it out with Carnegie Mellon’s Bomb Lab (you can download it here). It is the second lab for the class 15-213 Introduction to Computer Systems in CMU which I took last year, and which is a required class for all computer science majors. Back then when I took the class, I printed out the disassembly from objdump onto paper and traced all of the function calls and loops manually. I also did some basic dynamic analysis with gdb to debug my inputs and confirm that my intuition for what was happening was correct. It was a slow but fun and rewarding process.

Let’s see how easily we can solve Bomblab with Angr together! I have structured this walkthrough into 7 separate posts, one post for each phase of the bomb (including the secret phase). I will also approach it without relying on any prior knowledge of the phases. Without further ado, let’s get started!

Phase 1

It’s always good to first check the attributes of the binary. The main thing I am concerned about is PIE (position independent code), because that would potentially make referencing addresses more difficult (Angr does have support for PIE though):

1
2
3
4
5
6
7
$ checksec bomb
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)
    FORTIFY:  Enabled

Awesome, we see that there is no PIE, and that this is also a 64 bit binary.

Let’s create a new Angr project skeleton:

1
2
3
4
5
6
7
8
9
10
11
import angr
import claripy
import sys

def phase_1(argv):
    # Create an Angr project.
    path_to_binary = argv[1] # :string
    project = angr.Project(path_to_binary)

if __name__ == '__main__':
    phase_1(sys.argv)

Now let’s look at the disassembly in gdb (truncated to the interesting parts for brevity):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
   0x0000000000400e19 <+121>:	call   0x4013a2 <initialize_bomb>
   0x0000000000400e1e <+126>:	mov    edi,0x402338
   0x0000000000400e23 <+131>:	call   0x400b10 <puts@plt>
   0x0000000000400e28 <+136>:	mov    edi,0x402378
   0x0000000000400e2d <+141>:	call   0x400b10 <puts@plt>
   0x0000000000400e32 <+146>:	call   0x40149e <read_line>
   0x0000000000400e37 <+151>:	mov    rdi,rax
   0x0000000000400e3a <+154>:	call   0x400ee0 <phase_1>
   0x0000000000400e3f <+159>:	call   0x4015c4 <phase_defused>
   0x0000000000400e44 <+164>:	mov    edi,0x4023a8
   0x0000000000400e49 <+169>:	call   0x400b10 <puts@plt>
   0x0000000000400e4e <+174>:	call   0x40149e <read_line>
   0x0000000000400e53 <+179>:	mov    rdi,rax
   0x0000000000400e56 <+182>:	call   0x400efc <phase_2>
   0x0000000000400e5b <+187>:	call   0x4015c4 <phase_defused>
   0x0000000000400e60 <+192>:	mov    edi,0x4022ed
   0x0000000000400e65 <+197>:	call   0x400b10 <puts@plt>
   0x0000000000400e6a <+202>:	call   0x40149e <read_line>
   0x0000000000400e6f <+207>:	mov    rdi,rax
   0x0000000000400e72 <+210>:	call   0x400f43 <phase_3>
   0x0000000000400e77 <+215>:	call   0x4015c4 <phase_defused>
   0x0000000000400e7c <+220>:	mov    edi,0x40230b
   0x0000000000400e81 <+225>:	call   0x400b10 <puts@plt>
   0x0000000000400e86 <+230>:	call   0x40149e <read_line>
   0x0000000000400e8b <+235>:	mov    rdi,rax
   0x0000000000400e8e <+238>:	call   0x40100c <phase_4>
   0x0000000000400e93 <+243>:	call   0x4015c4 <phase_defused>
   0x0000000000400e98 <+248>:	mov    edi,0x4023d8
   0x0000000000400e9d <+253>:	call   0x400b10 <puts@plt>
   0x0000000000400ea2 <+258>:	call   0x40149e <read_line>
   0x0000000000400ea7 <+263>:	mov    rdi,rax
   0x0000000000400eaa <+266>:	call   0x401062 <phase_5>
   0x0000000000400eaf <+271>:	call   0x4015c4 <phase_defused>
   0x0000000000400eb4 <+276>:	mov    edi,0x40231a
   0x0000000000400eb9 <+281>:	call   0x400b10 <puts@plt>
   0x0000000000400ebe <+286>:	call   0x40149e <read_line>
   0x0000000000400ec3 <+291>:	mov    rdi,rax
   0x0000000000400ec6 <+294>:	call   0x4010f4 <phase_6>
   0x0000000000400ecb <+299>:	call   0x4015c4 <phase_defused>

Here we see that there are 6 phases in the main function, which are logically isolated from one another. We also see that it calls a read_line function, which is not a standard glibc function. Let’s look at phase_1:

1
2
3
4
5
6
7
8
9
10
gef  disas phase_1
Dump of assembler code for function phase_1:
   0x0000000000400ee0 <+0>:	sub    rsp,0x8
   0x0000000000400ee4 <+4>:	mov    esi,0x402400
   0x0000000000400ee9 <+9>:	call   0x401338 <strings_not_equal>
   0x0000000000400eee <+14>:	test   eax,eax
   0x0000000000400ef0 <+16>:	je     0x400ef7 <phase_1+23>
   0x0000000000400ef2 <+18>:	call   0x40143a <explode_bomb>
   0x0000000000400ef7 <+23>:	add    rsp,0x8
   0x0000000000400efb <+27>:	ret 

What it does is really simple - compare the user input to the string at 0x402400, and we easily solve it without Angr. But let’s try doing it with Angr anyway, because it will yield several valuable learning points.

The first thing to take note of is that user input is being read with a custom read_line function. Let’s take a look at that:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
   0x000000000040149e <+0>:	sub    rsp,0x8
   0x00000000004014a2 <+4>:	mov    eax,0x0
   0x00000000004014a7 <+9>:	call   0x4013f9 <skip>
   0x00000000004014ac <+14>:	test   rax,rax
   0x00000000004014af <+17>:	jne    0x40151f <read_line+129>
   0x00000000004014b1 <+19>:	mov    rax,QWORD PTR [rip+0x202290]        # 0x603748 <stdin@@GLIBC_2.2.5>
   0x00000000004014b8 <+26>:	cmp    QWORD PTR [rip+0x2022a9],rax        # 0x603768 <infile>
   0x00000000004014bf <+33>:	jne    0x4014d5 <read_line+55>
   0x00000000004014c1 <+35>:	mov    edi,0x4025d5
   0x00000000004014c6 <+40>:	call   0x400b10 <puts@plt>
   0x00000000004014cb <+45>:	mov    edi,0x8
   0x00000000004014d0 <+50>:	call   0x400c20 <exit@plt>
   0x00000000004014d5 <+55>:	mov    edi,0x4025f3
   0x00000000004014da <+60>:	call   0x400ae0 <getenv@plt>
   0x00000000004014df <+65>:	test   rax,rax
   0x00000000004014e2 <+68>:	je     0x4014ee <read_line+80>
   0x00000000004014e4 <+70>:	mov    edi,0x0
   0x00000000004014e9 <+75>:	call   0x400c20 <exit@plt>
   0x00000000004014ee <+80>:	mov    rax,QWORD PTR [rip+0x202253]        # 0x603748 <stdin@@GLIBC_2.2.5>
   0x00000000004014f5 <+87>:	mov    QWORD PTR [rip+0x20226c],rax        # 0x603768 <infile>
   0x00000000004014fc <+94>:	mov    eax,0x0
   0x0000000000401501 <+99>:	call   0x4013f9 <skip>
   0x0000000000401506 <+104>:	test   rax,rax
   0x0000000000401509 <+107>:	jne    0x40151f <read_line+129>
   0x000000000040150b <+109>:	mov    edi,0x4025d5
   0x0000000000401510 <+114>:	call   0x400b10 <puts@plt>
   0x0000000000401515 <+119>:	mov    edi,0x0
   0x000000000040151a <+124>:	call   0x400c20 <exit@plt>
   0x000000000040151f <+129>:	mov    edx,DWORD PTR [rip+0x20223b]        # 0x603760 <num_input_strings>
   0x0000000000401525 <+135>:	movsxd rax,edx
   0x0000000000401528 <+138>:	lea    rsi,[rax+rax*4]
   0x000000000040152c <+142>:	shl    rsi,0x4
   0x0000000000401530 <+146>:	add    rsi,0x603780
   0x0000000000401537 <+153>:	mov    rdi,rsi
   0x000000000040153a <+156>:	mov    eax,0x0
   0x000000000040153f <+161>:	mov    rcx,0xffffffffffffffff
   0x0000000000401546 <+168>:	repnz scas al,BYTE PTR es:[rdi]
   0x0000000000401548 <+170>:	not    rcx
   0x000000000040154b <+173>:	sub    rcx,0x1
   0x000000000040154f <+177>:	cmp    ecx,0x4e
   0x0000000000401552 <+180>:	jle    0x40159a <read_line+252>
   0x0000000000401554 <+182>:	mov    edi,0x4025fe
   0x0000000000401559 <+187>:	call   0x400b10 <puts@plt>
   0x000000000040155e <+192>:	mov    eax,DWORD PTR [rip+0x2021fc]        # 0x603760 <num_input_strings>
   0x0000000000401564 <+198>:	lea    edx,[rax+0x1]
   0x0000000000401567 <+201>:	mov    DWORD PTR [rip+0x2021f3],edx        # 0x603760 <num_input_strings>
   0x000000000040156d <+207>:	cdqe   
   0x000000000040156f <+209>:	imul   rax,rax,0x50
   0x0000000000401573 <+213>:	movabs rdi,0x636e7572742a2a2a
   0x000000000040157d <+223>:	mov    QWORD PTR [rax+0x603780],rdi
   0x0000000000401584 <+230>:	movabs rdi,0x2a2a2a64657461
   0x000000000040158e <+240>:	mov    QWORD PTR [rax+0x603788],rdi
   0x0000000000401595 <+247>:	call   0x40143a <explode_bomb>
   0x000000000040159a <+252>:	sub    ecx,0x1
   0x000000000040159d <+255>:	movsxd rcx,ecx
   0x00000000004015a0 <+258>:	movsxd rax,edx
   0x00000000004015a3 <+261>:	lea    rax,[rax+rax*4]
   0x00000000004015a7 <+265>:	shl    rax,0x4
   0x00000000004015ab <+269>:	mov    BYTE PTR [rcx+rax*1+0x603780],0x0
   0x00000000004015b3 <+277>:	add    edx,0x1
   0x00000000004015b6 <+280>:	mov    DWORD PTR [rip+0x2021a4],edx        # 0x603760 <num_input_strings>
   0x00000000004015bc <+286>:	mov    rax,rsi
   0x00000000004015bf <+289>:	add    rsp,0x8
   0x00000000004015c3 <+293>:	ret    

We don’t see any functions which reads in input directly being called, like scanf, read, or fgets. Digging further though, we see that the input is actually being read in the skip function, which calls fgets:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
gef  disas 0x4013f9
Dump of assembler code for function skip:
   0x00000000004013f9 <+0>:	push   rbx
   0x00000000004013fa <+1>:	movsxd rax,DWORD PTR [rip+0x20235f]        # 0x603760 <num_input_strings>
   0x0000000000401401 <+8>:	lea    rdi,[rax+rax*4]
   0x0000000000401405 <+12>:	shl    rdi,0x4
   0x0000000000401409 <+16>:	add    rdi,0x603780
   0x0000000000401410 <+23>:	mov    rdx,QWORD PTR [rip+0x202351]        # 0x603768 <infile>
   0x0000000000401417 <+30>:	mov    esi,0x50
   0x000000000040141c <+35>:	call   0x400b80 <fgets@plt>
   0x0000000000401421 <+40>:	mov    rbx,rax
   0x0000000000401424 <+43>:	test   rax,rax
   0x0000000000401427 <+46>:	je     0x401435 <skip+60>
   0x0000000000401429 <+48>:	mov    rdi,rax
   0x000000000040142c <+51>:	call   0x4013bc <blank_line>
   0x0000000000401431 <+56>:	test   eax,eax
   0x0000000000401433 <+58>:	jne    0x4013fa <skip+1>
   0x0000000000401435 <+60>:	mov    rax,rbx
   0x0000000000401438 <+63>:	pop    rbx
   0x0000000000401439 <+64>:	ret    
End of assembler dump.

The point here is that we don’t want to reverse precisely what read_line does, and just rely on our intuition that it basically reads in a line. We also see that there are a couple of checks in the loops, which can potentially lead to state explosion.

State Explosion

State explosion is an extremely important concept in symbolic execution, and it is the primary reason why achieving general symbolic execution is hard. With every branch, our number of states double. This exponential growth in the number of states can quickly render our search infeasible, which is why it is important for us to limit the number of potential paths as much as possible. Doing this automatically is currently an area of active research. There is a great paper from CMU titled Enhancing Symbolic Execution with Veritesting, which introduces the idea of veritesting in order to mitigate state explosion. This is achieved by combining both static and dynamic symbolic execution to produce heuristics to avoid states which are likely to lead to failure, and by combining branches. Angr does support veritesting in its simulation manager, but this is out of scope for this series. Do read the paper if you are interested for more!

In order to prevent that, we will inject our own symbolic memory for the input instead, and bypass the read_line function entirely.

If we look at the disassembly for main, the place where we call phase_1 is a good point to start program execution. We can then simply pass the symbolic input in the rdi register!

1
2
3
4
5
6
7
8
    # Tell Angr where to start executing 
    start_addr = 0x00400e3a
    initial_state = project.factory.blank_state(addr=start_addr)

    fake_addr = 0x40000000
    phase_1_input = claripy.BVS('phase_1_input', 100 * 8)
    initial_state.memory.store(fake_addr, phase_1_input)
    initial_state.regs.rdi = fake_addr

claripy.BVS creates a symbolic value referred to by phase_1_input, with a size of 100 bytes or 100 * 8 bits. 100 bytes should be big enough, if Angr is unable to find a solution, we can always add more. BVS stands for bit vector symbolic. It has a cousin, BVV, which stands for bit vector value, that holds concrete values. Our string input value is stored at fake_addr. fake_addr was chosen arbitrarily, it can simply be any area of memory not being used by any of the other sections (text, heap, stack, libs) during execution. So with this, we have now a reference to our symbolic bit vector in rdi!

Now, if we look back at the disassembly for phase_1, we see that it primarily calls strings_not_equal. We have a hunch about what it does, let’s look at it in assembly:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
gef  disas strings_not_equal
Dump of assembler code for function strings_not_equal:
   0x0000000000401338 <+0>:	push   r12
   0x000000000040133a <+2>:	push   rbp
   0x000000000040133b <+3>:	push   rbx
   0x000000000040133c <+4>:	mov    rbx,rdi
   0x000000000040133f <+7>:	mov    rbp,rsi
   0x0000000000401342 <+10>:	call   0x40131b <string_length>
   0x0000000000401347 <+15>:	mov    r12d,eax
   0x000000000040134a <+18>:	mov    rdi,rbp
   0x000000000040134d <+21>:	call   0x40131b <string_length>
   0x0000000000401352 <+26>:	mov    edx,0x1
   0x0000000000401357 <+31>:	cmp    r12d,eax
   0x000000000040135a <+34>:	jne    0x40139b <strings_not_equal+99>
   0x000000000040135c <+36>:	movzx  eax,BYTE PTR [rbx]
   0x000000000040135f <+39>:	test   al,al
   0x0000000000401361 <+41>:	je     0x401388 <strings_not_equal+80>
   0x0000000000401363 <+43>:	cmp    al,BYTE PTR [rbp+0x0]
   0x0000000000401366 <+46>:	je     0x401372 <strings_not_equal+58>
   0x0000000000401368 <+48>:	jmp    0x40138f <strings_not_equal+87>
   0x000000000040136a <+50>:	cmp    al,BYTE PTR [rbp+0x0]
   0x000000000040136d <+53>:	nop    DWORD PTR [rax]
   0x0000000000401370 <+56>:	jne    0x401396 <strings_not_equal+94>
   0x0000000000401372 <+58>:	add    rbx,0x1
   0x0000000000401376 <+62>:	add    rbp,0x1
   0x000000000040137a <+66>:	movzx  eax,BYTE PTR [rbx]
   0x000000000040137d <+69>:	test   al,al
   0x000000000040137f <+71>:	jne    0x40136a <strings_not_equal+50>
   0x0000000000401381 <+73>:	mov    edx,0x0
   0x0000000000401386 <+78>:	jmp    0x40139b <strings_not_equal+99>
   0x0000000000401388 <+80>:	mov    edx,0x0
   0x000000000040138d <+85>:	jmp    0x40139b <strings_not_equal+99>
   0x000000000040138f <+87>:	mov    edx,0x1
   0x0000000000401394 <+92>:	jmp    0x40139b <strings_not_equal+99>
   0x0000000000401396 <+94>:	mov    edx,0x1
   0x000000000040139b <+99>:	mov    eax,edx
   0x000000000040139d <+101>:	pop    rbx
   0x000000000040139e <+102>:	pop    rbp
   0x000000000040139f <+103>:	pop    r12
   0x00000000004013a1 <+105>:	ret    
End of assembler dump.

Again, we see lots of comparisons and jumps, which is what we don’t like since it can easily lead to state explosion. We can solve this by using a hook, which is a way of overwriting instructions to call our own instructions instead. Angr provides an easy way to do this via what is known as a SimProcedure, which allows you to replace a function with one that you write in Python:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
    class ReplacementStringsNotEqual(angr.SimProcedure):
        def run(self, string_1_address, string_2_address):
            # Load 100 bytes from string_1_address to string_1
            string_1 = self.state.memory.load(
                string_1_address,
                100
            )

            # Load 100 bytes from string_2_address to string_2
            string_2 = self.state.memory.load(
                string_2_address,
                100
            )

            return claripy.If(
                string_1 == string_2,
                claripy.BVV(0, 32),
                claripy.BVV(1, 32)
            )

    strings_not_equal_symbol = 'strings_not_equal'
    project.hook_symbol(strings_not_equal_symbol, ReplacementStringsNotEqual())

In this code snippet, we replace all function calls to 'strings_not_equal' to our own function defined in ReplacementStringsNotEqual. We can use project.hook_symbol because the binary includes debugging symbols, which are debugging information that was not stripped by the compiler after compilation. While it makes the binary larger, it is useful as it makes debugging with a debugger easier. However, if you encounter stripped binaries, then you would need to hook by the address of the function instead.

Recall that strings_not_equal takes in two string pointers as input, and therefore we need to load it into memory first. state.memory.load(addr, size) allows us to load size number of bytes from address to a variable. We can then compare the two input strings, and return 0 or 1 accordingly. We then return a concrete 32 bit integer value of either 0 or 1, depending on the results of the comparison. Note that we cannot directly return a Python integer, because Python integers do not have sizes.

We are now done with the basic set-up. Add the following lines to create a simulation manager, which will help to explore our states later on:

1
2
    # Create a simulation manager initialized with the starting state
    simulation = project.factory.simgr(initial_state)

Find and Avoid

Now we are faced with the question of how to tell Angr what we want it to do.

1
2
3
4
5
6
7
8
9
    # Defines when we have hit a successful state 
    def is_successful(state):
        # Dump whatever has been printed out by the binary so far into a string.
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        return b"Phase 1 defused" in stdout_output

    explode_addr = 0x0040143a # explode_bomb

    simulation.explore(find=is_successful, avoid=explode_addr)

The above introduces a few new concepts. Let’s start from the end - simulation.explore tells Angr to explore states starting from our initial state, and to terminate whenever it has found something that matches the find condition, and also to stop exploring states that have fulfilled the avoid condition. Our find=is_successful condition is what happens when we clear the stage. We see in the disassembly of main that after phase_defused is called after phase_1, we call puts with 0x4023a8, which corresponds to:

1
2
gef  x/s 0x4023a8
0x4023a8:	"Phase 1 defused. How about the next one?"

So what is_successful does is that if Angr ever sees “Phase 1 defused” from stdout of the program, it will know that it has succeeded.

On the flipside, explode_addr is set to the address of explode_bomb, which we can see in the disassembly of Phase 1:

1
2
3
4
5
6
7
8
9
10
11
gef  disas phase_1
Dump of assembler code for function phase_1:
   0x0000000000400ee0 <+0>:	sub    rsp,0x8
   0x0000000000400ee4 <+4>:	mov    esi,0x402400
   0x0000000000400ee9 <+9>:	call   0x401338 <strings_not_equal>
   0x0000000000400eee <+14>:	test   eax,eax
   0x0000000000400ef0 <+16>:	je     0x400ef7 <phase_1+23>
   0x0000000000400ef2 <+18>:	call   0x40143a <explode_bomb>
   0x0000000000400ef7 <+23>:	add    rsp,0x8
   0x0000000000400efb <+27>:	ret    
End of assembler dump.

This tells Angr to avoid calling explode_bomb and truncate such states from its search.

Finally, we check if we are able to find a solution:

1
2
3
4
5
6
7
8
9
    # Check that we have found a solution
    if simulation.found:
        solution_state = simulation.found[0]

        # Case symbolic value to bytes
        solution = solution_state.se.eval(phase_1_input, cast_to=bytes)
        print(solution)
    else:
        raise Exception('Could not find the solution')

Here, if a solution was found, we convert our symbolic phase_1_input value into bytes and print it, and if not, an exception is raised.

Full Solution Script

The full solution script for Phase 1 is below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
import angr
import claripy
import sys

def phase_1(argv):
    # Create an Angr project.
    path_to_binary = argv[1] # :string
    project = angr.Project(path_to_binary)

    # Tell Angr where to start executing 
    start_addr = 0x00400e3a
    initial_state = project.factory.blank_state(addr=start_addr)

    fake_addr = 0x40000000
    phase_1_input = claripy.BVS('phase_1_input', 100 * 8)
    initial_state.memory.store(fake_addr, phase_1_input)
    initial_state.regs.rdi = fake_addr

    class ReplacementStringsNotEqual(angr.SimProcedure):
        def run(self, string_1_address, string_2_address):
            # Load 100 bytes from string_1_address to string_1
            string_1 = self.state.memory.load(
                string_1_address,
                100
            )

            # Load 100 bytes from string_2_address to string_2
            string_2 = self.state.memory.load(
                string_2_address,
                100
            )

            return claripy.If(
                string_1 == string_2,
                claripy.BVV(0, 32),
                claripy.BVV(1, 32)
            )

    strings_not_equal_symbol = 'strings_not_equal'
    project.hook_symbol(strings_not_equal_symbol, ReplacementStringsNotEqual())

    # Create a simulation manager initialized with the starting state
    simulation = project.factory.simgr(initial_state)

    # Defines when we have hit a successful state 
    def is_successful(state):
        # Dump whatever has been printed out by the binary so far into a string.
        stdout_output = state.posix.dumps(sys.stdout.fileno())

        return b"Phase 1 defused" in stdout_output # :boolean

    explode_addr = 0x0040143a # explode_bomb

    simulation.explore(find=is_successful, avoid=explode_addr)

    # Check that we have found a solution
    if simulation.found:
        solution_state = simulation.found[0]

        # Case symbolic value to bytes
        solution = solution_state.se.eval(phase_1_input, cast_to=bytes)
        print(solution)
    else:
        raise Exception('Could not find the solution')

if __name__ == '__main__':
    phase_1(sys.argv)

And if we run it, we get the following output:

1
2
3
4
5
6
7
8
9
$ python solve.py bomb
WARNING | 2020-07-30 21:27:56,670 | angr.state_plugins.symbolic_memory | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.
WARNING | 2020-07-30 21:27:56,670 | angr.state_plugins.symbolic_memory | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2020-07-30 21:27:56,670 | angr.state_plugins.symbolic_memory | 1) setting a value to the initial state
WARNING | 2020-07-30 21:27:56,671 | angr.state_plugins.symbolic_memory | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2020-07-30 21:27:56,671 | angr.state_plugins.symbolic_memory | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY_REGISTERS}, to suppress these messages.
WARNING | 2020-07-30 21:27:56,671 | angr.state_plugins.symbolic_memory | Filling register 20 with 4 unconstrained bytes referenced from 0x400eee (phase_1+0xe in bomb (0x400eee))
CRITICAL | 2020-07-30 21:27:56,752 | angr.sim_state | The name state.se is deprecated; please use state.solver.
b"Border relations with Canada have never been better.\x00\x00\x00\x00Wow! You've defused the secret stage!\x00flyers"

Because we specified 100 bytes for our symbolic input, we got 100 bytes back. Since strings are null terminated, our correct input for phase 1 is therefore “Border relations with Canada have never been better.”. We verify that it works when run with the actual binary:

1
2
3
4
5
$ ./bomb
Welcome to my fiendish little bomb. You have 6 phases with
which to blow yourself up. Have a nice day!
Border relations with Canada have never been better.
Phase 1 defused. How about the next one?

Awesome! I hope you have enjoyed the first part of this series and that it was helpful to you :). You can continue on to the second part here.




    Related Posts:

  • Bounding Mixing Times of Markov Chains via the Spectral Gap
  • Notes on 'The Llama 3 Herd of Models'
  • Playing Sound Voltex at Home: Setting Up Unnamed SDVX Clone with the Yuancon SDVX Controller
  • Creating Trackback Requests for Static Sites
  • A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers: A Guided Walkthrough