Newlib C99 scanf() problem

Bug #1399224 reported by stewo on 2014-12-04
This bug affects 2 people
Affects: GNU Arm Embedded Toolchain
Assigned to: Terry Guo

Bug Description

There is a problem in Newlib that can corrupt memory when the C99 format macros for 8-bit values, such as SCNu8, are used with scanf(). I have reported the issue to the Newlib mailing list and one of the maintainers submitted a patch.

Problem in detail:

I am trying to read an IP address from a string with the following code snippet:

#include <inttypes.h>  // SCNu8
#include <stdint.h>
#include <stdio.h>

void read_ipaddr()
{
  char *str;
  uint8_t a, b, c, d;
  int res;

  // This function returns a pointer to the
  // string that represents the IP address
  get_str( &str );

  res = sscanf( str, "%" SCNu8 ".%" SCNu8 ".%" SCNu8 ".%" SCNu8,
                &a, &b, &c, &d );
}

The problem is that str, i.e. the address stored in the pointer, appears to be corrupted by sscanf().
Dumping the memory shows that the variables a, b, c and d come first in memory (the stack grows downwards), say at byte addresses 10, 11, 12 and 13. The 4-byte pointer str follows at addresses 14 to 17.
Before the call to sscanf(), str has the value 0xa14af968. After the call, the value is 0xa14af900, meaning that the least significant byte, which is located at byte address 14 (little endian), has been overwritten by sscanf(). The bytes at addresses 10 to 13 hold the correct numbers from the scanned IP address string.

Explanation by Jeff Johnston on Newlib's mailing list:

The vfscanf.c code only supports hh if the flag _WANT_IO_C99_FORMATS is set to true. By default it is false in newlib/, and it is set to true only for a few select platforms.

If you did not configure your library with --enable-newlib-io-c99-formats, then it is false as arm doesn't set it to true.
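Whether a given Newlib build has the flag set can be checked at compile time. The following is only a sketch: newlib_c99_formats_status() is a hypothetical helper name, and it assumes that Newlib's <stdio.h> indirectly includes <newlib.h>, where _WANT_IO_C99_FORMATS is defined when the library was configured with the option. On a non-Newlib C library the helper simply reports that the question does not apply.

```c
#include <stdio.h>  /* assumption: on Newlib this also pulls in <newlib.h> */

/* Hypothetical helper, not part of Newlib:
 *   1  = Newlib built with C99 format support (hh is handled)
 *   0  = Newlib built without it (the case described in this bug)
 *  -1  = not built against Newlib, so the flag does not apply */
int newlib_c99_formats_status(void)
{
#if defined(_WANT_IO_C99_FORMATS)
    return 1;
#elif defined(__NEWLIB__) || defined(_NEWLIB_VERSION)
    return 0;
#else
    return -1;
#endif
}
```

A result of 0 would mean that "%" SCNu8 should be avoided with that toolchain unless the inttypes.h patch is applied.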

Now, the second part of the problem. The code in vfscanf.c sees the first 'h' and sets a flag to indicate a short value and then reads the next format character since h is a modifier for a format. It reads the second 'h' and processes it the same as it did for the first (sets the short value flag again to true).

So, the code thinks you are reading a short value and this sets the upper-byte to 0x00. So, each of a, b, c, d are stepped on in order (ok, since they are being read in that order) and the final read of d steps on the next byte after it on the stack, which in your case is the str pointer.
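The effect described above can be reproduced safely on a byte array instead of a real stack frame. This is a simulation only, not Newlib code: store_u16_le() and simulate_overwrite() are hypothetical names, the octets 192.168.1.10 are an arbitrary example address, and the layout (a, b, c, d followed by the little-endian pointer value 0xa14af968) is taken from the description in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mimic a little-endian 16-bit store at buf[offset],
 * which is what the scanner effectively does when it treats "hh" as "h". */
static void store_u16_le(uint8_t *buf, size_t len, size_t offset, uint16_t value)
{
    assert(offset + 1 < len);
    buf[offset]     = (uint8_t)(value & 0xffu);  /* low byte: the octet    */
    buf[offset + 1] = (uint8_t)(value >> 8);     /* high byte: always 0x00 */
}

/* Returns the simulated value of str after all four conversions. */
uint32_t simulate_overwrite(void)
{
    /* Bytes 0..3 model a, b, c, d; bytes 4..7 model str = 0xa14af968. */
    uint8_t stack[8] = { 0, 0, 0, 0, 0x68, 0xf9, 0x4a, 0xa1 };
    const uint16_t octets[4] = { 192, 168, 1, 10 };  /* example address */

    /* Each store clobbers the following byte; the next store repairs it,
     * except after the last one, which zeroes the low byte of str. */
    for (size_t i = 0; i < 4; i++)
        store_u16_le(stack, sizeof stack, i, octets[i]);

    /* Reassemble the pointer value from bytes 4..7 (little endian). */
    return (uint32_t)stack[4]
         | (uint32_t)stack[5] << 8
         | (uint32_t)stack[6] << 16
         | (uint32_t)stack[7] << 24;
}
```

With this layout simulate_overwrite() returns 0xa14af900, matching the corrupted pointer value from the memory dump, while bytes 0 to 3 still hold the four octets.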


As mentioned above, the problem should be fixed now on the Newlib source repository. There is also a patch available (see attached file) for Newlib's inttypes.h. Maybe Newlib should be built with --enable-newlib-io-c99-formats?
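For toolchains that do not yet contain the fix, one possible workaround (my assumption, it was not proposed on the mailing list) is to avoid the 8-bit length modifier altogether: scan into unsigned int temporaries and narrow them after a range check. parse_ipv4() below is a hypothetical helper written for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of Newlib: parses "a.b.c.d" into out[4].
 * Returns 0 on success, -1 on malformed input or an octet above 255. */
int parse_ipv4(const char *s, uint8_t out[4])
{
    unsigned int v[4];
    char extra;

    /* "%u" needs no C99 length modifier, so every store is a full
     * unsigned int; the trailing "%c" only matches if junk follows. */
    if (sscanf(s, "%u.%u.%u.%u%c", &v[0], &v[1], &v[2], &v[3], &extra) != 4)
        return -1;

    for (int i = 0; i < 4; i++) {
        if (v[i] > 255u)
            return -1;
        out[i] = (uint8_t)v[i];
    }
    return 0;
}
```

The cost is four temporary words on the stack, but no store can reach past the variable it targets, regardless of how the library was configured.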

stewo (wolfer-y) wrote :
Terry Guo (terry.guo) on 2014-12-05
Changed in gcc-arm-embedded:
assignee: nobody → Terry Guo (terry.guo)
Terry Guo (terry.guo) wrote :

Thanks for reporting the issue. I have confirmed it, and the fix is already in Newlib upstream. We can add the Newlib configuration option --enable-newlib-io-c99-formats to support this feature. However, the side effect is increased code size, as measured with the app below:

terguo01@terry-pc01:case$ cat m.c

int a, b, c, d;

main ()
{
  char *p = "11 22 33 44";

  sscanf(p, "%d %d %d %d", &a, &b, &c, &d);

  printf ("%d/%d/%d/%d\n", a, b, c, d);

  return 0;
}

My small experiment shows that the code size increases from 42840 bytes to 45824 bytes, an increase of about 7% for Cortex-M3.

Changed in gcc-arm-embedded:
status: New → Confirmed
Seppe Stas (seppestas) wrote :

Found this out the hard way in

Having a documentation page for scanf, printf, ... functions dedicated to embedded systems warning about these limitations would be pretty nice to have. does not seem to mention anything.
