Stack Overflow: The Case of a Small Stack

December 29, 2024

Years ago, an intern asked me to debug a mysterious crash in code that seemed so innocent. While I no longer recall what the code was about, we stripped the program down to a single line in main, yet it still crashed.

Source:

int main() {
    char buf[1024*1024*1024];
}

Result:

# ./prog-arm64 

Process 630803 (prog-arm64) terminated SIGSEGV code=1 fltno=11 ip=00000025333267f0 mapaddr=00000000000007f0 ref=000000443dd5dc50
Memory fault (core dumped) 

This bewildered all of the interns as it made absolutely no sense. Through our investigation, there were two things we noticed:

The program worked on our local machines but not on our target virtual machine
We were allocating an extremely large buffer in the stack which was unusual

It turns out the intern wanted to allocate a 1 MiB buffer for some networking or driver related ticket, not the 1 GiB (1024 x 1024 x 1024 bytes) that the code above requests. If I recall correctly, our target only had 512 MB of RAM, so that alone could have explained the mysterious crash. But even a 1 MiB buffer on the stack was too large for our target:

int main() {
	char buf[1024*1024];
}

Result:

# ./prog-arm64 

Process 696339 (prog-arm64) terminated SIGSEGV code=1 fltno=11 ip=0000004de7e7a7ec mapaddr=00000000000007ec ref=000000383b19fbe0
Memory fault (core dumped) 

One thing I purposely omitted was that our target was running QNX, a realtime operating system. If we were to take a look at the documentation:

A process’s main thread starts with an automatically allocated 512 KB stack – QNX SDP 8.0 - Stack Allocation

This shocked all of us, since 1 MiB is not a large buffer in 2021, when our own personal systems at home had plenty of memory.

Note 1: The target used in the example was an aarch64le. The same crash can be reproduced on amd64 (x86_64), but only after adding something such as a print statement.

Note 2: QNX 8.0 was released to the general public in late 2023 or early 2024 so the actual target at the time when the question was asked was running either QNX 7.0 or QNX 7.1 (I do not recall which version)

Investigating why AMD64 (x86_64) seems unaffected

Note: Everything below is nothing shocking nor interesting. I just felt like keeping it there.

The behavior on AMD64 (x86_64), as noted, requires more fiddling to trigger a crash, which came as a surprise to me. From my understanding of the documentation, the stack size should still be 512 KB. Suspecting there could be some optimization going on, I fiddled with the compiler settings and added some code to see if I could trigger the crash. It turns out that if I make a call to printf, the program does indeed crash as desired.

Source Code:

#include <stdio.h>
int main() {
  char buf[1024*1024];
  printf("Hello World\n");
}

Result:

# ./prog-amd64  

Process 2977812 (prog-amd64) terminated SIGSEGV code=1 fltno=11 ip=0000002c51b107f6 mapaddr=00000000000007f6 ref=0000003f4ece4b58
Memory fault (core dumped) 

To test my hypothesis that there was optimization under the hood, I generated the assembly (i.e. pass -S to qcc):

main:
.LFB0:
        .file 1 "prog.c"
        .loc 1 2 12
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $1048592, %rsp

Much to my disappointment, my hypothesis was incorrect. We can see that the stack pointer does indeed move by at least 1 MiB (1024 x 1024 = 1048576, and 1048592 = 1048576 + 16). Since this listing was incomplete (we still needed to run the assembler and linker to produce an executable), I then ran the program under the debugger in the hope of saving my hypothesis (spoiler: my initial hypothesis is false).

(gdb) disassemble
Dump of assembler code for function main:
   0x0000000008048791 <+0>:     push   %rbp
   0x0000000008048792 <+1>:     mov    %rsp,%rbp
   0x0000000008048795 <+4>:     sub    $0x100010,%rsp
   0x000000000804879c <+11>:    mov    0x182d(%rip),%rax        # 0x8049fd0
   0x00000000080487a3 <+18>:    mov    (%rax),%rcx
   0x00000000080487a6 <+21>:    mov    %rcx,-0x8(%rbp)
   0x00000000080487aa <+25>:    xor    %ecx,%ecx
   0x00000000080487ac <+27>:    mov    $0x0,%eax
   0x00000000080487b1 <+32>:    mov    %eax,%edx
   0x00000000080487b3 <+34>:    mov    0x1816(%rip),%rax        # 0x8049fd0
   0x00000000080487ba <+41>:    mov    -0x8(%rbp),%rsi
   0x00000000080487be <+45>:    sub    (%rax),%rsi
   0x00000000080487c1 <+48>:    je     0x80487c8 <main+55>
   0x00000000080487c3 <+50>:    call   0x8048620 <__stack_chk_fail@plt>
=> 0x00000000080487c8 <+55>:    mov    %edx,%eax

As we can see from the disassembly above, the stack pointer does move by at least 1 MiB (0x100010 = 1048592), so the optimization theory is definitely ruled out. Stepping through the program in the debugger using stepi, I noticed the following:

   0x00000000080487be <+45>:    sub    (%rax),%rsi
   0x00000000080487c1 <+48>:    je     0x80487c8 <main+55>
   0x00000000080487c3 <+50>:    call   0x8048620 <__stack_chk_fail@plt>
=> 0x00000000080487c8 <+55>:    mov    %edx,%eax

The instruction pointer skipped the call to __stack_chk_fail@plt, which belongs to the stack guard that is added to mitigate stack buffer overflows (whether intentional or not). Essentially, a stack guard inserts a small value known as the canary between the stack variables and the return address. If an overflow were to overwrite the return address, the canary would be overwritten as well. Whether the canary has been overwritten can be checked in either of two ways:

canary - original_canary != 0
canary ^ original_canary != 0

If either of the two evaluates to true, the program jumps to the fail function, which terminates the program. In our program, it would seem that we did not overwrite the canary, which is reached through the rip-relative address 0x8049fd0 annotated in the disassembly. I will now attempt to walk through with you what exactly is going on with my limited knowledge of assembly (I’m going to use the fact that I am a Mathematics student to excuse my lack of assembly knowledge :D):

For simplicity, I am going to modify the assembly above to use friendlier notation when referring to addresses and write some pseudocode in C syntax (I’ll be omitting some details so it’s not a one-to-one replication). In the instructions from <+11> to <+21>, we store the canary value 8 bytes below the base pointer:

<+11>:    mov    0x182d(%rip),%rax        # 0x8049fd0
<+18>:    mov    (%rax),%rcx
<+21>:    mov    %rcx,-0x8(%rbp)

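rax = *(0x8049fd0)  // load the address of the original canary from the slot at 0x8049fd0
rcx = *rax          // read the canary value itself
*(rbp-8) = rcx      // keep a copy 8 bytes below the base pointer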

This saved copy is then compared with the original canary value, whose address is loaded into the rax register again by the instruction at <+34>. The generated assembly uses the first method to check whether the canary has been overwritten, subtracting the two canary values:

<+34>:    mov    0x1816(%rip),%rax        # 0x8049fd0
<+41>:    mov    -0x8(%rbp),%rsi
<+45>:    sub    (%rax),%rsi

rax = *(0x8049fd0); // reload the address of the original canary (this value will ideally not be modified)
rsi = *(rbp-8);     // load our saved canary copy into rsi (this could be modified if we have a buffer overflow)
result = rsi - *rax;

As the canary value was not modified, the result is 0. The je at address <+48> therefore skips the next instruction, the call to __stack_chk_fail@plt at address <+50>.

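Note: I did not read into the function __stack_chk_fail@plt, so maybe it performs more checks on the canary, given that it has chk in its name. In GCC's stack protector scheme, though, it is the handler that is called once the check has already failed, and it is not expected to return.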

As our program skipped __stack_chk_fail@plt, the program does not crash.

Now let’s take a quick look into why adding a print statement triggers the crash:

=> 0x00000000080487f6 <+37>:    call   0x8048650 <puts@plt>
   0x00000000080487fb <+42>:    mov    $0x0,%eax
   0x0000000008048800 <+47>:    mov    %eax,%edx
   0x0000000008048802 <+49>:    mov    0x17c7(%rip),%rax        # 0x8049fd0
   0x0000000008048809 <+56>:    mov    -0x8(%rbp),%rsi
   0x000000000804880d <+60>:    sub    (%rax),%rsi
   0x0000000008048810 <+63>:    je     0x8048817 <main+70>
   0x0000000008048812 <+65>:    call   0x8048660 <__stack_chk_fail@plt>
   0x0000000008048817 <+70>:    mov    %edx,%eax
   0x0000000008048819 <+72>:    leave
   0x000000000804881a <+73>:    ret
End of assembler dump.
(gdb) stepi

Program received signal SIGSEGV, Segmentation fault.

Immediately we can see that the stack guard is not the reason for the crash; rather, it is the call to puts@plt that triggers it. Let’s compare the registers just before the crash is triggered, where the first dump is from a program with a valid buffer size:

(gdb) i r
...
rbp            0x81ce0             0x81ce0
rsp            0x818d0             0x818d0
...

vs.

(gdb) info r
...
rbp            0x81ce0             0x81ce0
rsp            0xfffffffffff81cd0  0xfffffffffff81cd0
...

Only the stack pointer rsp differs, which is to be expected. To understand the crash, we first need to recall that each function call gets its own stack frame.

Side Note: Stacks

Feel free to skip this section. It investigates how the stack grows. All you need to understand is that the new stack frame will be located "after" the caller's stack frame. Let's observe the following simple program:

#include <stdio.h>
void foo(char *x, int y) {
    char z[16] = {'W', 'o', 'r', 'l', 'd'};
    printf("%d: %s %s\n", y, x, z);
}
int main() {
    char x[32] = {'H', 'e', 'l', 'l', 'o'};
    int y = 21;
    foo(x, y);
}

Note: When reading the stack, recall which way the stack grows and the endianness. In our case, the stack grows downwards, starting from a higher address and growing towards lower addresses. The format is little endian, meaning the least significant byte is placed at the lowest address. Before foo is called, this is the state of our base and stack pointers:

(gdb) i r rbp
rbp            0x81ce0             0x81ce0
(gdb) i r rsp
rsp            0x81ca0             0x81ca0

and the addresses of the stack variables:

(gdb) p &x[0]
$2 = 0x81cb0 "Hello"
(gdb) p &y
$3 = (int *) 0x81cac

and the corresponding values in the stack:

(gdb) x/6x $sp
0x81ca0:        0x08049dd8      0x00000000      0x08049dd8      0x00000015
0x81cb0:        0x6c6c6548      0x0000006f
...

Reading the Stack

y = 21 corresponds to 0x15 stored in the address 0x81cac

char x[32] = {'H', 'e', 'l', 'l', 'o'};
movabs $0x6f6c6c6548,%rax

The string x starts at 0x81cb0, where H is 0x48 (72 in ASCII) and o is 0x6f (111 in ASCII)

When the function foo is called, we can observe that the new stack frame is "above" (at a lower address than) that of the caller main:

Register	main	foo
rbp	0x81ce0	0x81ca0
rsp	0x81c90	0x81c60

Therefore we should observe the stack variables of foo "above" (at lower addresses than) those of the caller main as well:

(gdb) p &(x[0])
$4 = 0x81cb0 "Hello"
(gdb) p &y
$5 = (int *) 0x81c64
(gdb) p &z[0]
$6 = 0x81c70 "World"
(gdb) x/80x rsp
No symbol "rsp" in current context.
(gdb) x/-21x 0x81cb0+8
0x81c64:        0x00000015      0x00081cb0      0x00000000      0x6c726f57
0x81c74:        0x00000064      0x00000000      0x00000000      0x000e74a0
0x81c84:        0x00000001      0x649d7900      0xd7224120      0x00081ce0
0x81c94:        0x00000000      0x08048897      0x00000000      0x08049dd8
0x81ca4:        0x00000000      0x08049dd8      0x00000015      0x6c6c6548
0x81cb4:        0x0000006f

Any new stack frame will come after rsp (lower addresses in our case), so we can simply try to write any value to the memory rsp points to in order to trigger a segfault. Considering we cannot even access the variable buf, it will come as no surprise that gdb prevents us from writing to that memory address:

(gdb) p buf[0]
Cannot access memory at address 0xfffffffffff81cd0
(gdb) p &buf[0]
$4 = 0xfffffffffff81cd0 <error: Cannot access memory at address 0xfffffffffff81cd0>
(gdb) set {int}$rsp=8
Cannot access memory at address 0xfffffffffff81cd0

Honestly, this was very anticlimactic.

Conclusions:

The stack guard was never triggered since we never overwrote our canary value (I mean, the program did nothing anyway)
Adding a single print statement was enough to trigger a segmentation fault, because the call has to push its return address onto the stack while the stack pointer points to an address that cannot be written to

Some Random Notes on GDB

A few notes on working with a remote target with gdb:

Connecting to the target: target qnx <ip-address>:8000
Load the file: file <prog-binary>
Upload binary to the target: upload <file> <full_path_in_remote>

Example:

(gdb) target qnx 192.168.124.207:8000
Remote debugging using 192.168.124.207:8000
MsgNak received - resending
Remote target is little-endian
Disabled 'set detach-on-fork' for remote targets
(gdb) file prog-amd64
Reading symbols from prog-amd64...
(gdb) upload prog-amd64 /tmp/prog-amd64

Something else I found out: x/s prints the string stored at a given address:

(gdb) x/1s 0x8048824
0x8048824:      "hello world"
