Printf does not properly justify non-ASCII characters

Bug #1654688 reported by William Andrea
This bug affects 1 person
Affects: bash (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a script that outputs arbitrary Unicode characters in neat columns, but for anything outside the Basic Latin range (i.e. code points above U+7F), the justification is off. For example, both commands below should output a leading space:

$ printf "%2s\n" 'a'
 a
$ printf "%2s\n" 'á'
á

The spacing problem starts between U+7F and U+80. If you try to print two leading spaces, the same problem occurs between U+7FF and U+800.
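
The two thresholds coincide with the points where the UTF-8 encoding of a code point grows from one byte to two, and from two bytes to three. A minimal sketch to check this, assuming a POSIX shell with printf and wc available; byte_len is a hypothetical helper name, and the characters are written as octal escapes so the result does not depend on the terminal or locale:

```shell
# byte_len is a hypothetical helper: it prints how many bytes its argument occupies.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_len "$(printf '\177')"          # U+7F,  prints 1
byte_len "$(printf '\302\200')"      # U+80,  prints 2
byte_len "$(printf '\337\277')"      # U+7FF, prints 2
byte_len "$(printf '\340\240\200')"  # U+800, prints 3
```

A two-byte character already fills a %2s field, and a three-byte character fills a %3s field, so no padding is added in either case.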

This affects the binary /usr/bin/printf as well, but I'm not sure where to report a bug for that.

Ubuntu version: 14.04.5
Bash version: 4.3-7ubuntu1.5 (latest)

description: updated
xhienne (xhienne) wrote :

In my opinion, this is not a bug (and probably not a feature either) but the expected behavior.

bash's printf is a slightly modified version of the underlying printf() function provided by the C standard library. printf(3) measures the field width (like the precision) in _bytes_, not in characters. So this is the expected result with multibyte characters like 'á', which takes two bytes in UTF-8.

You might want to switch to an ISO-8859 character set if you want your 'á' character to take only one byte of memory.
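
If staying with UTF-8 is preferable, one possible workaround is to compute the padding from the number of characters rather than bytes. The following is only a sketch under stated assumptions: the input is valid UTF-8, every code point occupies one terminal column (no combining or double-width characters), and the printf in use accepts a `*` field width. char_len and pad_left are hypothetical helper names, not part of bash.

```shell
# char_len (hypothetical helper): count UTF-8 code points by deleting
# continuation bytes (0x80-0xBF) and counting the bytes that remain.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# pad_left (hypothetical helper): right-justify text $2 in a field of
# $1 characters, adding the padding by hand instead of relying on %Ns.
pad_left() {
    width=$1
    text=$2
    pad=$(( width - $(char_len "$text") ))
    if [ "$pad" -lt 0 ]; then
        pad=0
    fi
    # %*s with an empty string prints exactly $pad spaces.
    printf '%*s%s\n' "$pad" '' "$text"
}

pad_left 2 'a'
pad_left 2 'á'
```

Both calls should now print one leading space, because the padding is derived from the character count and the text itself is printed with a plain %s.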

William Andrea (wjandrea) wrote :

xhienne, that's a good point. My workaround is to switch to Python 3, whose string formatting counts characters rather than bytes, so multi-byte characters are padded the same way as single-byte ones.
