I always thought it was not possible to create a “dynamic” array in C without malloc until recently, when I was introduced to variable length arrays (VLA). Although “dynamic” is a poor choice of word, the ability to allocate an array at runtime, based on a variable whose value is not known until runtime, came as a shock to me.
Variable length arrays were introduced in the C99 standard (hence they are not suitable for anyone who works on legacy systems or on projects that follow the C89 standard). To illustrate, here’s an example of an array whose size depends on the variable len, whose value is set from the first argument passed to the program. Hence, the length of the array is not known at compile time.
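#include <stdlib.h>

int main(int argc, char **argv) {
    size_t len = atoi(argv[1]); /* length comes from the command line */
    int arr[len];               /* VLA: size is only known at runtime */
    /* do stuff */
}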
If I compile this with gcc under the plain C89 standard (note: add the -pedantic option so that gcc reports the extensions it would otherwise accept silently, since gcc supports VLA by default), I get the following warnings:
$ gcc -std=c89 --pedantic /tmp/test.c
/tmp/test.c: In function ‘main’:
/tmp/test.c:6:9: warning: ISO C90 forbids variable length array ‘arr’ [-Wvla]
     int arr[len];
         ^~~
/tmp/test.c:6:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
However, if I compile this using C99, there is no issue at all:
$ gcc -std=c99 --pedantic /tmp/test.c
$ echo $?
0
This behavior stumped me. It went against my understanding of the language. It would seem that I am not alone in this. Another blogger, ayekat, was also very puzzled by this behavior:
In my case it was a simple sequence of code that I thought would never compile — and the moment it did, I feared I would not be on good terms with it - ayekat
What are VLA
VLA (Variable Length Arrays) were introduced in C99 as a way to create arrays whose length is determined at runtime. A VLA has automatic storage duration, which in practice means it lives on the stack.
The main reason I can think of for anyone to use VLA is convenience and readability. For instance, representing matrices becomes simple:
int matrix[n][n];
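To make the readability point concrete, here is a minimal sketch of handing a square matrix to functions whose parameter is declared as int m[n][n]. The function names fill_sequential and trace are made up for illustration; the sketch assumes a compiler with C99 VLA support.

```c
#include <assert.h>
#include <stddef.h>

/* Set m[i][j] = i * n + j. The parameter type int m[n][n] depends on
   the runtime value n, so the compiler does the index arithmetic. */
void fill_sequential(size_t n, int m[n][n])
{
    assert(n > 0); /* array dimensions must be positive */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            m[i][j] = (int)(i * n + j);
}

/* Sum of the main diagonal. */
long trace(size_t n, int m[n][n])
{
    long sum = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += m[i][i];
    return sum;
}
```

A caller would declare int matrix[n][n]; and then call fill_sequential(n, matrix); with no manual i * n + j arithmetic on the calling side.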
Unlike the traditional way to allocate memory whose size is determined at runtime, using malloc, VLA allocates memory on the stack. According to what I read on the internet, people claim that VLA has less overhead than allocating on the heap, since allocating on the stack does not require the allocator to find a free contiguous block of memory in the heap. However, this claim is questionable and may be true only in certain scenarios. I would need more time and fiddling to know for certain. I’ll trust the words of Linus that VLA isn’t efficient.
VLA has automatic storage duration, meaning the lifetime of a VLA ends when execution leaves the block it was declared in. This effectively makes VLA suitable only for short-lived, small arrays. You cannot allocate a big array with VLA, as the data lives on the stack, which can easily fill up.
VLA is similar to alloca in that both allocate memory on the stack. As blogger ayekat mentions in his take on VLA, using VLA is very risky because the behavior of alloca is undefined if the allocation causes a stack overflow, and VLA will behave in a similar manner.
$ man alloca | grep -A 2 "RETURN VALUE"
RETURN VALUE
       The alloca() function returns a pointer to the beginning of the allocated space.
       If the allocation causes stack overflow, program behavior is undefined.
For those who are not familiar, a stack overflow not only crashes the program. When the array size comes from untrusted input, it may also open a door to exploits, since malicious input can be crafted to overwrite sections of memory the program should never have touched (look up buffer overflow).
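If a VLA is used anyway, one way to reduce the risk is to validate the length before the array is declared. The following is only a sketch under assumptions: the cap MAX_VLA_LEN and the helper names parse_len and sum_indices are hypothetical, and a sensible cap depends on the platform's stack size.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_VLA_LEN 1024 /* hypothetical cap, chosen for illustration */

/* Parse s as an array length in [1, MAX_VLA_LEN].
   Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_len(const char *s, size_t *out)
{
    char *end;
    unsigned long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1; /* not a clean decimal number */
    if (v == 0 || v > MAX_VLA_LEN)
        return -1; /* zero is not a valid VLA size; large sizes risk the stack */
    *out = (size_t)v;
    return 0;
}

/* Sum of 0..len-1, computed through a VLA. The caller must pass a
   length that has already been validated. */
long sum_indices(size_t len)
{
    assert(len >= 1 && len <= MAX_VLA_LEN);

    long arr[len];
    long total = 0;

    for (size_t i = 0; i < len; i++)
        arr[i] = (long)i;
    for (size_t i = 0; i < len; i++)
        total += arr[i];
    return total;
}
```

In a program like the earlier example, main would call parse_len(argv[1], &len) and exit with an error when it returns -1, instead of feeding atoi(argv[1]) straight into the array declaration.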
A nice way to create an n x n matrix addressable as matrix[x][y], as described by someone on stackoverflow, is the following:
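size_t n = 10; /* example value; in practice this is computed at runtime */
double (*matrix)[n] = malloc(n * sizeof *matrix); /* n rows of n doubles in one heap block */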
This is much preferred over addressing a matrix as matrix[y*n + x], or over allocating a non-contiguous n x n matrix (i.e. each row points to a different section of memory), which achieves the same thing but with performance hits (i.e. the rows are scattered, so the CPU cannot make good use of cache lines and has to go out to memory more often).
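Here is a slightly fuller sketch of the pointer-to-array approach, with the allocation checked and freed. The function name diagonal_sum and the fill pattern are assumptions made for the example. Note that the storage is on the heap; only the type of the pointer depends on the runtime value n.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an n x n matrix of doubles as one contiguous heap block,
   set matrix[i][j] = i * n + j, and return the sum of the diagonal.
   Returns -1.0 if the allocation fails. */
double diagonal_sum(size_t n)
{
    assert(n > 0);

    /* matrix points to rows of n doubles; sizeof *matrix is n * sizeof(double) */
    double (*matrix)[n] = malloc(n * sizeof *matrix);
    if (matrix == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            matrix[i][j] = (double)(i * n + j);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += matrix[i][i];

    free(matrix); /* a single free releases the whole matrix */
    return sum;
}
```

Because every row sits in the same block, one malloc and one free are enough, unlike the row-by-row layout where each row needs its own allocation.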
Note: VLA and alloca differ in lifetime. Memory obtained via alloca is released when the function that called alloca returns, while a VLA is local to its enclosing block. So if you create a VLA within a loop, it can only be used within the loop itself and nowhere else in the function.
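The block-scoped lifetime of a VLA can be sketched as follows. The function name vla_sizes is made up for the example; it declares a differently sized VLA on every iteration and records what sizeof reports for it.

```c
#include <assert.h>
#include <stddef.h>

/* For each i in 1..count, declare a VLA of i ints inside the loop body
   and store its size in bytes in sizes[i - 1]. */
void vla_sizes(size_t count, size_t sizes[])
{
    assert(sizes != NULL);

    for (size_t i = 1; i <= count; i++) {
        int buf[i];                /* a new VLA on every iteration */
        sizes[i - 1] = sizeof buf; /* for a VLA, sizeof is evaluated at runtime */
    }                              /* buf's lifetime ends here, each time around */

    /* buf is not visible at this point; referring to it would not compile */
}
```

Memory from alloca, by contrast, would stay allocated until vla_sizes itself returns, so calling alloca in a loop like this keeps growing the stack frame.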
Should I use VLA
My first instinct is to say no. There are a few things wrong with VLA, some of which I stated in the previous section.
- VLA is allocated on the stack -> need to ensure the size is small to avoid a stack overflow
- VLA has automatic storage duration -> need to ensure the array is declared in a stack frame that outlives every use of it, to avoid referencing memory that has already been freed (i.e. want to avoid dangling pointers)
- portability:
  - It’s only supported from C99 onwards -> not portable to older systems
  - Optional in C11, meaning it’s up to the compiler’s discretion whether to add support or not
  - MSVC does not have VLA support (it never implemented this part of C99)
- VLA aren’t efficient
  - VLA has a small runtime overhead to determine the size of the array
  - generates more, slower, and more fragile code according to Linus Torvalds
unsigned long key[geo->keylen];
(note: skipping some content)
AND USING VLA’S IS ACTIVELY STUPID! It generates much more code, and much slower code (and more fragile code), than just using a fixed key size would have done. - Linus Torvalds
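On the portability point, C11 lets a compiler announce that it has no VLA support by defining the macro __STDC_NO_VLA__. A rough sketch of how code could detect this and fall back to malloc is shown below. HAVE_VLA and sum_of_squares are names invented for the example, and the sketch deliberately leaves out the size cap a real VLA path should have.

```c
#include <assert.h>
#include <stdlib.h>

/* HAVE_VLA is a made-up macro: 1 when the compiler claims C99 or later
   and does not define __STDC_NO_VLA__, 0 otherwise. */
#if defined(__STDC_NO_VLA__) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#define HAVE_VLA 0
#else
#define HAVE_VLA 1
#endif

/* Sum of i * i for i in 0..len-1, using a scratch array.
   Returns -1 if the heap fallback cannot allocate. */
long sum_of_squares(size_t len)
{
    long total = 0;
    size_t i;

    assert(len > 0); /* a VLA must have a positive size */
    {
#if HAVE_VLA
        long scratch[len]; /* stack storage, released at the closing brace */
#else
        long *scratch = malloc(len * sizeof *scratch);
        if (scratch == NULL)
            return -1;
#endif
        for (i = 0; i < len; i++)
            scratch[i] = (long)(i * i);
        for (i = 0; i < len; i++)
            total += scratch[i];
#if !HAVE_VLA
        free(scratch);
#endif
    }
    return total;
}
```

Having to maintain two code paths like this is itself an argument for picking malloc (or a fixed size) from the start.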
Summary
Variable Length Array (VLA) allows you to create arrays whose size is determined at runtime, but at the cost of runtime overhead and security. VLA is only available from C99 onwards and is optional in C11, meaning VLA is not a portable solution. It is best to avoid using VLA.
sidenote: I highly encourage you to read Ayekat’s take on VLA as it explores something interesting when working with 2D arrays.