this: the implicit parameter in OOP

February 11, 2025

Report a bug

I was recently reminded that the variable this is an implicit parameter passed to all methods in OOP such as C++. We can observe this by comparing a regular function vs a method belonging to some class:

#include <iostream>

void greet() {
    std::cout << "Hello World\n";
}

class Human {
public:
    void greet() {
        std::cout << "Hello World\n";
    }
};

int main() {
    greet();
    Human human = Human();
    human.greet();
}

Output:

$ g++ test.C
$ ./a.out 
Hello World
Hello World

Furthermore, their resulting mangled names do not indicate that the function/method takes in any arguments:

$ nm a.out  | grep greet
0000000000401126 T _Z5greetv
000000000040115c W _ZN5Human5greetEv

C++ mangles the symbols to handle name resolutions produced by the compiler which can provide more information to the linker. One obvious problem name mangling solves is handling function overloading where the same function identifier can take in different number or different types of parameters. The v suffix in the mangled names indicates that its only parameter is void. This is true, as the title suggests, this is an implicit parameter meaning its a “parameter” that the compiler will pass into the function. However, this can only be observed by inspecting the assembly code. A language that explicitly passes a reference to the object itself is Python where a typical constructor would look like the following:

class Human:
    def __init__(self, name, age):
        self.name =  name
        self.age = age

Anyhow, let’s observe the assembly code. Note: I’ll be only showing the code of interest.

For greet:

Dump of assembler code for function _Z5greetv:
   0x0000000000401126 <+0>:	push   %rbp
   0x0000000000401127 <+1>:	mov    %rsp,%rbp
   0x000000000040112a <+4>:	mov    $0x402280,%esi

For Human::greet:

Dump of assembler code for function _ZN5Human5greetEv:
   0x000000000040115c <+0>:     push   %rbp
   0x000000000040115d <+1>:     mov    %rsp,%rbp
   0x0000000000401160 <+4>:     sub    $0x10,%rsp
   0x0000000000401164 <+8>:     mov    %rdi,-0x8(%rbp)
   0x0000000000401168 <+12>:	mov    $0x402280,%esi

In x86 assembly, whenever you enter a function, the parameters are retrieved from the stack into registers rdi, rsi, rdx, etc (at least that’s how I understood it). Since greet has not parameters, it goes straight to storing the address of our constant string “Hello World\n” into the esi register:

(gdb) x/1s 0x402280
0x402280:	"Hello World\n"

However, for our method Human::greet, rdi register which typically holds the first parameter of the function is being utilized

mov    %rdi,-0x8(%rbp)

We can assume whatever register rdi is holding, it’s an 8B value which also happens to be the size of a pointer in x86-64. This is our implicit argument, this, which contains the address of the object itself. We can observe this via gdb:

(gdb) p &human
$2 = (Human *) 0x7fffffffdc4f
...
(gdb) i r rdi
rdi            0x7fffffffdc4f      140737488346191

where we see that the rdi register contains the same address as our object human: 0x7fffffffdc4f.

We can also replicate this in arm where w0 or x0 will be set with the address of our object human using compiler explorer:

Human::greet():
 stp	x29, x30, [sp, #-32]!
 mov	x29, sp
 str	x0, [sp, #24]
...

As you can observe, x0 also containsi some 8B value from the stack (ie. 32 - 24 = 8). Running this on an QNX ARM image (I was too lazy to flash a new OS onto my Raspberry Pi), we can observe x0 register indeed does contain the same address as our object human which represents this

(gdb) p &human
$1 = (Human *) 0x81c60
...
Dump of assembler code for function _ZN5Human5greetEv:
test.C:
9	    void greet() {
<+0>:	stp	x29, x30, [sp, #-32]!
<+4>:	mov	x29, sp
<+8>:	str	x0, [sp, #24]
...
Dump of assembler code for function _ZN5Human5greetEv:
test.C:
9	    void greet() {
<+0>:	stp	x29, x30, [sp, #-32]!
<+4>:	mov	x29, sp
<+8>:	str	x0, [sp, #24]

(gdb) i r x0                     
x0             0x81c60             531552

Twitter, Facebook