Another Look at scanf - scanset

October 15, 2020

Report a bug

It came to my attention from an embedded course on udemy the existence of scanset in scanf

As most of you would know, scanf and its variants are functions that scans from the input according to some format you specify such as %d for an integer or %c for a char. However, did you know that scanf has the flexibility to whitelist or blacklist characters? This can be done by specifying a scanset in the format string. Based on the name scanset itself, it implies that scanset is a set of characters you provide for scanf to scan for. To be specific, scanset is a specifier where you can specify a character or a set of characters to either accept or reject. To specify a scanset, you encase the set of characters inside square brackets. Note: the set is case sensitive

It may be hard to understand without an example, so here’s an example of a using scanf to only accept the character a only:

scanf("%[a]", buffer);

If you wish to stop reading the input upon reading some character in a set, you can specify your scanset with ^. For instance, you can have scanf stop reading after the first occurrence of a period (.) to extract the first sentence of an input with the following call:

scanf(%[^.], buffer);

Benefits

To put it simply, this saves you the trouble of writing code to process from the input the desired string you wish to look at. To extract information from a string not delimited by whitespace such as CSV, you would have to look through strtok. However, with scanset, you can extract the data from a CSV with one scanf call. For instance, let’s say we want to parse a CSV with the format first_name, last_name, age. The code to scan a CSV would look something like this:

scanf("%[^,],%[^,],%d\n", fname, lname, &age);

See how convenient that was. Just one line of code is needed to parse a line of data from a CSV file. All we need to do is add a loop and we parse the entire csv file:

while(scanf("%[^,],%[^,],%d\n", fname, lname, &age) == 3) {
    printf("========================\n");
    printf("First Name: %s\n", fname);
    printf("Last Name : %s\n", lname);
    printf("age       : %d\n", age);
}

output:

$ ./a.out < data.csv 
========================
First Name: John
Last Name : Smith
age       : 20
========================
First Name: Mary
Last Name : Lee
age       : 21
========================
First Name: Marco
Last Name : Polo
age       : 19

Similarly to character class in Regex, you can specify ranges as well. For instance, %[A-Z] matches uppercase characters only, while %[0-5] matches a string with digits that fall within the range 0-5 (inclusive).

Anyhow, that was all I wanted to share for today. I just thought it was neat that scanf can do more than what I initially thought.

Twitter, Facebook