Rust - Exploring the Assembly Code between Mutable and Shadow Variables

May 10, 2022


Recently I started learning Rust through a program called Summer of Rust, led by a Computer Science student at Carleton University. While I had planned to explore C++ and assembly over the summer, I thought it would be a good time to learn Rust with a group of students instead, since learning anything by yourself takes a lot of effort.

A student asked in the Discord chat which would be more efficient: mutable variables or shadowed variables. While the two serve different purposes, it was an interesting question. Mutable variables allow a programmer to mutate state (i.e. a variable's value), something we take for granted in many other non-functional languages such as C and Python. For instance, in C you could do the following:

int i = 0;
i += 10;

But in Rust, you must use the mut keyword when declaring a variable in order to mutate (i.e. change) its value, or else you’ll get the following error:

error[E0384]: cannot assign twice to immutable variable `x`
 --> main.rs:3:5
  |
2 |     let x = 1;
  |         -
  |         |
  |         first assignment to `x`
  |         help: consider making this binding mutable: `mut x`
3 |     x = 2;
  |     ^^^^^ cannot assign twice to immutable variable

error: aborting due to previous error; 1 warning emitted

For more information about this error, try `rustc --explain E0384`.
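As a minimal sketch of the fix the compiler suggests, adding mut to the binding lets the reassignment compile. The helper name reassign is my own and exists only for illustration:

```rust
// Hypothetical helper: returns the value of `x` before and after reassignment.
fn reassign() -> (i32, i32) {
    let mut x = 1; // `mut` makes the binding reassignable
    let before = x;
    x = 2; // accepted now; without `mut` this line triggers E0384
    (before, x)
}

fn main() {
    let (before, after) = reassign();
    println!("x was {} and is now {}", before, after);
}
```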

Meanwhile, shadowing allows you to reuse the name of the variable in a different scope. For instance, this is legal and possible in C:

#include <stdio.h>

int main(void) {
  int sum = 10;
  printf("the value of sum (%p) is %d\n", (void *)&sum, sum);
  {
    int sum = 0; /* shadows the outer sum inside this block */
    printf("Inside Innerscope\n");
    printf("the value of sum (%p) is %d\n", (void *)&sum, sum);
  }
  printf("Exited innerscope\n");
  printf("the value of sum (%p) is %d\n", (void *)&sum, sum);
  return 0;
}

Output:

the value of sum (0x7ffe92723a0c) is 10  
Inside Innerscope                                                               
the value of sum (0x7ffe92723a08) is 0                                         
Exited innerscope                                                               
the value of sum (0x7ffe92723a0c) is 10

As you can see from the output above, when you shadow a variable, the new variable with the same name is effectively a new variable (which explains why the address of the variable is different).

However, Rust offers the ability to shadow a variable within the same scope as seen in the Rust Programming Language Book:

fn main() {
    let x = 5;

    let x = x + 1;

    {
        let x = x * 2;
        println!("The value of x in the inner scope is: {}", x);
    }

    println!("The value of x is: {}", x);
}

where you can see the variable x is shadowed twice (i.e. 3 variables with the name x).
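The address experiment from the C example can be repeated for same-scope shadowing in Rust. The sketch below is my own illustration (the helper name shadow_addresses is hypothetical): it keeps a reference to the first x alive so that both bindings exist at the same time, then compares their addresses.

```rust
// Hypothetical helper: returns both values and whether the two bindings
// named `x` live at the same address.
fn shadow_addresses() -> (i32, i32, bool) {
    let x = 5;
    let first = &x; // borrow the original binding so it stays alive
    let x = x + 1; // shadows `x`; `first` still refers to the original
    let second = &x;

    println!("the value of the first x ({:p}) is {}", first, first);
    println!("the value of the second x ({:p}) is {}", second, second);

    (*first, *second, std::ptr::eq(first, second))
}

fn main() {
    let (first, second, same_address) = shadow_addresses();
    println!("first = {}, second = {}, same address = {}", first, second, same_address);
}
```

The exact addresses vary from run to run, but since both bindings are alive at once and hold different values, they should not share an address.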

Mutable vs. Shadowing - Initial Thoughts

Let’s look at the following two versions that achieve the same output:

mutable version:

fn test() -> i32 {
    let mut x: i32 = 1;
    x = x + 4;

    let mut y: i32 = 0;
    y = y + 10;
    x + y
}

shadow version:

fn test() -> i32 {
    let x: i32 = 1;
    let x = x + 4;

    let mut y: i32 = 0;
    y = y + 10;
    x + y
}
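Before comparing the assembly, it is worth confirming that the two versions really do compute the same value. The sketch below copies both functions into one program; the names test_mutable and test_shadow are my own renaming so that the two can coexist in a single file.

```rust
// The mutable version from above, renamed for this sketch.
fn test_mutable() -> i32 {
    let mut x: i32 = 1;
    x = x + 4;

    let mut y: i32 = 0;
    y = y + 10;
    x + y
}

// The shadow version from above, renamed for this sketch.
fn test_shadow() -> i32 {
    let x: i32 = 1;
    let x = x + 4;

    let mut y: i32 = 0;
    y = y + 10;
    x + y
}

fn main() {
    // Both versions should agree on the result (5 + 10).
    assert_eq!(test_mutable(), test_shadow());
    println!("both versions return {}", test_mutable());
}
```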

Without looking at the underlying assembly code, I could only make assumptions about the performance. My first thought was that the mutable version would be faster because when the CPU sets the value of x, the variable would be stored in a register. With shadowing, an entirely new variable is created, so not only would it consume more memory, but the CPU would also need to fetch the value from memory if the program were more complex (I assume the value would be stored in a register for this simple example).

While it was agreed that readable code is more important than trying to optimize our code, it was still an interesting question (although this example is so small and simple that performance does not matter anyway). Generally, optimization should not be a concern for most programmers, and anyone who really wants to optimize their code should first profile it and consider changing either the algorithm or the data structure they are using. Modern compilers are great at optimizing code, so the need for the average programmer to optimize by inspecting the underlying assembly code is becoming a thing of the past (of course, it varies depending on what you are working on).

In addition, I have extremely limited knowledge of assembly, compilers and optimization, so all the speculation above could be very wrong. That is why I’ll try to interpret the generated assembly code, to the best of my limited knowledge, using Compiler Explorer, a neat tool for inspecting the assembly generated from small code snippets.


Inspecting the Assembly Code

The code may be difficult to see (especially on mobile) so I will paste the relevant assembly code below (with annotations):

mutable version:

sub     rsp, 24                   ;grow the stack by 24 bytes
mov     dword ptr [rsp + 16], 1   ;let mut x: i32 = 1
mov     dword ptr [rsp + 16], 5   ;x = x + 4
mov     dword ptr [rsp + 20], 0   ;let mut y: i32 = 0
mov     dword ptr [rsp + 20], 10  ;y = y + 10
mov     eax, dword ptr [rsp + 16] ;set register with the value of x
add     eax, 10                   ;x + y but instead of adding the value of y it adds 10
mov     dword ptr [rsp + 12], eax ;store the result to rsp + 12 temporarily
seto    al                        ;set al to 1 if the add overflowed
test    al, 1                     ;check that overflow bit
jne     .LBB0_2                   ;on overflow, jump to the panic path
mov     eax, dword ptr [rsp + 12] ;restore the value of eax
add     rsp, 24                   ;shrink the stack back
ret                               ;return with the result in eax

shadow version:

push    rax                       ;reserve 8 bytes of stack (also keeps it aligned)
mov     dword ptr [rsp + 4], 0    ;let mut y: i32 = 0
mov     dword ptr [rsp + 4], 10   ;y = y + 10
mov     eax, 5                    ;set register to 5, the precomputed value of x = x + 4
add     eax, 10                   ;x + y (except it's not retrieving the value from the stack)
mov     dword ptr [rsp], eax      ;temporarily store the result
seto    al                        ;set al to 1 if the add overflowed
test    al, 1                     ;check that overflow bit
jne     .LBB0_2                   ;on overflow, jump to the panic path
mov     eax, dword ptr [rsp]      ;restore the result back to the eax register
pop     rcx                       ;release the 8 bytes reserved by push rax
ret                               ;return with the result in eax

To begin, the first thing that popped into my mind was that the mutable version reserves more stack space, which was somewhat of a surprise.

sub     rsp, 24                                                                 

I was somewhat surprised by this result because I would have expected the shadow version to be the one pushing multiple values onto the stack. If I had been given the assembly code for both versions, I would have mixed the two up.

It would seem that the compiler wants to respect the programmer’s desire to store new values into the variable which is why we see the following:

mov     dword ptr [rsp + 16], 1   ;let mut x: i32 = 1
mov     dword ptr [rsp + 16], 5   ;x = x + 4
mov     dword ptr [rsp + 20], 0   ;let mut y: i32 = 0
mov     dword ptr [rsp + 20], 10  ;y = y + 10

An interesting note is that the compiler precomputes the arithmetic operation. For instance, instead of emitting an instruction to add 4 to x for x = x + 4, it simply sets x = 5. But it still respects the initial assignment of x = 1. Another interesting note is that the value of y is stored but not used at all. Instead, the compiler loads the value of x into the register eax and then adds 10. This would indeed be more efficient than retrieving the value of y from memory (i.e. rsp+20).

Unlike the mutable version, the shadow version begins and ends (before the return instruction) with push rax and pop rcx, which I assume is to align the stack. So the stack for the shadow version is definitely a lot smaller, based on both my guess and on Stack Overflow (i.e. Why does this function push RAX to the stack as the first operation?).

The shadow version makes use of registers, which is what I suspected would happen if all the optimizations were turned on. As stated earlier, I would have thought the mutable version would make more use of registers than the shadow version, but it does not.

mov     dword ptr [rsp + 4], 0    ;let mut y:i32 = 0                            
mov     dword ptr [rsp + 4], 10   ;y = y + 10                    

The lines after the stack alignment are interesting. Since the compiler has free rein to reorganize the code, the initial value of y is set first and then set to 10 in response to the line y = y + 10;. Just as the compiler precomputes the value in the mutable version, the same happens in the shadow version: instead of emitting an instruction to add 0 and 10, it simply stores the end result.

The next two lines make it obvious (at least from my limited knowledge of assembly) that the shadow version is more efficient:

mov     eax, 5                    ;set register to 5, the precomputed value of x = x + 4
add     eax, 10                   ;x + y (except it's not retrieving the value from the stack)

Unlike the mutable version which retrieves the value of x from the stack, the shadow version loads 5 into the register and then adds 10. So the shadow version is making full use of the register unlike the mutable version.
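The seto, test and jne instructions that follow add eax, 10 in both listings are the integer overflow check that rustc emits when overflow checks are enabled, as they are by default in unoptimized builds: seto records whether the add overflowed, and jne branches to a panic path if it did. A rough Rust-level sketch of that behaviour, using checked_add, is below. The helper name add_or_panic is hypothetical, and this is only an analogy for the check, not what the compiler literally generates.

```rust
// Hypothetical helper mirroring the checked `x + y` at the end of `test`.
fn add_or_panic(x: i32, y: i32) -> i32 {
    match x.checked_add(y) {
        // No overflow: fall through to the normal return.
        Some(sum) => sum,
        // Overflow: roughly the path that `jne .LBB0_2` jumps to.
        None => panic!("attempt to add with overflow"),
    }
}

fn main() {
    println!("{}", add_or_panic(5, 10)); // prints 15
    println!("{:?}", i32::MAX.checked_add(1)); // prints None
}
```

Since the check appears in both versions, it does not change the comparison between them.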

The fact that the mutable version loads the value of x from the stack into the register made me think of the volatile keyword for some odd reason: it still consulted the value on the stack instead of loading the constant 5 into the register like the shadow version did. However, it then adds 10 instead of consulting the value of y on the stack, so that’s where the comparison breaks down. Perhaps a good topic for me to look at is how the volatile keyword changes the resulting assembly code.


Conclusion

To summarize this long blog post: for the example code I used, the shadow version used less stack space and made more use of registers than the mutable version, meaning the mutable version seemed to result in less efficient code in terms of both time and space. I would like to stress that these findings are based on my examples and may not apply in most cases, let alone in general.
