I was recently reminded that the variable this
is an implicit parameter passed to all methods in OOP such as C++. We can observe this by comparing a regular function vs a method
belonging to some class:
#include <iostream>
void greet() {
std::cout << "Hello World\n";
}
class Human {
public:
void greet() {
std::cout << "Hello World\n";
}
};
int main() {
greet();
Human human = Human();
human.greet();
}
Output:
$ g++ test.C
$ ./a.out
Hello World
Hello World
Furthermore, their resulting mangled names do not indicate that the function/method takes in any arguments:
$ nm a.out | grep greet
0000000000401126 T _Z5greetv
000000000040115c W _ZN5Human5greetEv
C++ mangles the symbols to handle name resolutions produced by the compiler which can provide more information to the linker. One obvious
problem name mangling solves is handling function overloading where the same function identifier can take in different number or different types of parameters.
The v
suffix in the mangled names indicates that its only parameter is void
. This is true, as the title suggests, this
is an implicit parameter
meaning its a “parameter” that the compiler will pass into the function. However, this can only be observed by inspecting the assembly code. A language
that explicitly passes a reference to the object itself is Python where a typical constructor would look like the following:
class Human:
def __init__(self, name, age):
self.name = name
self.age = age
Anyhow, let’s observe the assembly code. Note: I’ll be only showing the code of interest.
For greet
:
Dump of assembler code for function _Z5greetv:
0x0000000000401126 <+0>: push %rbp
0x0000000000401127 <+1>: mov %rsp,%rbp
0x000000000040112a <+4>: mov $0x402280,%esi
For Human::greet
:
Dump of assembler code for function _ZN5Human5greetEv:
0x000000000040115c <+0>: push %rbp
0x000000000040115d <+1>: mov %rsp,%rbp
0x0000000000401160 <+4>: sub $0x10,%rsp
0x0000000000401164 <+8>: mov %rdi,-0x8(%rbp)
0x0000000000401168 <+12>: mov $0x402280,%esi
In x86 assembly, whenever you enter a function, the parameters are retrieved from the stack into registers rdi, rsi, rdx, etc (at least that’s how I understood it).
Since greet
has not parameters, it goes straight to storing the address of our constant string “Hello World\n” into the esi register:
(gdb) x/1s 0x402280
0x402280: "Hello World\n"
However, for our method Human::greet
, rdi
register which typically holds the first parameter of the function is being utilized
mov %rdi,-0x8(%rbp)
We can assume whatever register rdi
is holding, it’s an 8B value which also happens to be the size of a pointer in x86-64.
This is our implicit argument, this
, which contains the address of the object itself. We can observe this via gdb:
(gdb) p &human
$2 = (Human *) 0x7fffffffdc4f
...
(gdb) i r rdi
rdi 0x7fffffffdc4f 140737488346191
where we see that the rdi
register contains the same address as our object human
: 0x7fffffffdc4f
.
We can also replicate this in arm
where w0
or x0
will be set with the address of our object human
using compiler explorer:
Human::greet():
stp x29, x30, [sp, #-32]!
mov x29, sp
str x0, [sp, #24]
...
As you can observe, x0
also containsi some 8B value from the stack (ie. 32 - 24 = 8). Running this on an QNX ARM image (I was too lazy to flash a new OS onto my Raspberry Pi),
we can observe x0
register indeed does contain the same address as our object human
which represents this
(gdb) p &human
$1 = (Human *) 0x81c60
...
Dump of assembler code for function _ZN5Human5greetEv:
test.C:
9 void greet() {
<+0>: stp x29, x30, [sp, #-32]!
<+4>: mov x29, sp
<+8>: str x0, [sp, #24]
...
Dump of assembler code for function _ZN5Human5greetEv:
test.C:
9 void greet() {
<+0>: stp x29, x30, [sp, #-32]!
<+4>: mov x29, sp
<+8>: str x0, [sp, #24]
(gdb) i r x0
x0 0x81c60 531552