Man pages show wrong Unicode characters instead of ASCII

Bug #272290 reported by Egmont Koblinger
This bug affects 2 people
Affects: groff (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Binary package hint: groff-base

The "man" command displays man pages with cool-looking Unicode quotation marks, hyphens and more. This very often leads to incorrect content, when the page actually tries to explain the meaning of an ASCII symbol.

A few examples, each giving the manual page name, the line number (assuming 80 columns) and a description:

bash L1428 and many more places: command substitution is demonstrated with ‘command‘ (U+2018) instead of `command` (U+0060).

bash L274 and many other places use │ (U+2502) instead of the standard pipe symbol: | .

gawk L605, L608: \‘ instead of \`, \’ instead of \' as possible escapes for regexps. L1348 and others use ’ (U+2019) instead of ' making the examples wrong.

links L33 talks about the ‐‐enable‐graphic option to ./configure. I'm pretty sure the configure script wouldn't understand those U+2010 dashes.

There are a *lot* more man pages suffering from these kinds of problems.
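To get a rough idea of how many pages are affected, the rendered text of a page can be scanned for the substitutes listed above. The following is a minimal Python sketch, not a finished tool. It assumes the rendered page is already available as a string; the name `find_substitutes` and the `SUSPECTS` table are made up for illustration, and the table covers only the four characters named in this report.

```python
import unicodedata

# Only the substitutes named in this report; other pages may contain more.
SUSPECTS = {
    "\u2018": "`",   # LEFT SINGLE QUOTATION MARK shown for a backtick
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK shown for an apostrophe
    "\u2010": "-",   # HYPHEN shown for a hyphen-minus
    "\u2502": "|",   # BOX DRAWINGS LIGHT VERTICAL shown for a pipe
}

def find_substitutes(text):
    """Yield (line_no, column, char, unicode_name, likely_ascii) for each hit."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECTS:
                yield line_no, col, ch, unicodedata.name(ch), SUSPECTS[ch]

if __name__ == "__main__":
    sample = "./configure \u2010\u2010enable\u2010graphic\nls \u2502 wc"
    for hit in find_substitutes(sample):
        print(hit)
```

A hit is only a hint: some pages may use these characters on purpose, for example in table borders, so each one still needs a human look.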

I haven't checked the specification of the man page format, so I don't know whether these particular man pages are buggy or the rendering software is. Oh, by the way, this one is my favorite:

groff L503 yet again uses ‘ (U+2018) instead of the old-fashioned backtick. This means that groff itself fails to properly render its own manual page. Sigh...

These bugs make these manual pages
- incorrect;
- misleading;
- not suitable for copy-pasting;
- not searchable for these particular special characters;
- even more incorrect if the terminal has limited font displaying capabilities (such as the Linux console with a font that completely lacks these Unicode symbols).

One possible solution would be to fix all these manpages (my guess is that there are some hundreds of these). I don't think this approach is feasible.

Another possible solution is to patch groff to be less eager to use Unicode stuff. We chose this approach in the distribution I used to maintain, and came up with this patch, which you might want to consider applying:
https://svn.uhulinux.hu/packages/2.1/groff/patches/02-sane-ascii-characters.patch
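The patch itself is not reproduced here, and the sketch below is not what it does: the patch changes groff, whereas this is a post-filter applied to already rendered text. It is a purely illustrative stopgap in Python; the name `asciify` and the mapping are assumptions limited to the four characters named in this report.

```python
# Hypothetical post-filter: map the substitutes named in this report back to
# the ASCII characters the man page source most likely meant.
TO_ASCII = str.maketrans({
    "\u2018": "`",   # left single quotation mark -> backtick
    "\u2019": "'",   # right single quotation mark -> apostrophe
    "\u2010": "-",   # hyphen -> hyphen-minus
    "\u2502": "|",   # box drawings light vertical -> pipe
})

def asciify(text):
    """Return text with the known substitutes replaced by ASCII."""
    return text.translate(TO_ASCII)

if __name__ == "__main__":
    print(asciify("./configure \u2010\u2010enable\u2010graphic"))
```

A filter like this cannot tell an intended typographic character from a wrongly substituted one, which is one reason fixing the renderer is the more attractive option.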

Note that there's one more problem with the handling of all this UTF-8 stuff: if one of these symbols is bold or underlined, and you redirect the output of "man" into a file, then you get some garbage (invalid UTF-8) there instead of the simple non-highlighted version.
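For context, terminal-oriented output traditionally marks bold as a character, a backspace and the same character again, and underline as an underscore, a backspace and the character. That this is the mechanism behind the garbage is an assumption; the report does not say. If the highlighting is removed byte by byte, a multi-byte UTF-8 sequence can be cut apart, which would give invalid UTF-8. The Python sketch below removes the highlighting per character instead. It assumes well-formed UTF-8 input, and `strip_overstrike` is a made-up name.

```python
import re

# One character (not one byte) followed by a backspace.
_OVERSTRIKE = re.compile(".\x08")

def strip_overstrike(raw: bytes) -> str:
    """Decode first, then drop every 'char + backspace' pair."""
    text = raw.decode("utf-8")
    return _OVERSTRIKE.sub("", text)

if __name__ == "__main__":
    bold_quote = "\u2018\x08\u2018".encode("utf-8")   # bold U+2018
    print(strip_overstrike(bold_quote))
```

Decoding before stripping is the point of the sketch: the pattern then consumes whole code points, so a three-byte character such as U+2018 is never split.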

Don't get me wrong: I'm a great fan of proper typesetting as well as Unicode, and I always try to use the proper quotation marks, proper hyphens and so on. I just think that there are places where this is not so necessary. Manual pages formatted in terminals are usually for slightly more advanced users, not for those who only use some fancy graphical apps. Here, getting the quote marks and hyphens typographically incorrect is not such a big issue; it's much more important that the characters displayed are actually the ones the man pages are talking about. UI strings of Gnome, KDE, OpenOffice.org and so on are proper places for all these fancy Unicode characters, yet they are shamelessly not used properly there, I wonder why... For manual pages they are simply not important at all IMHO.

I'm using Hardy 8.04.1, including groff-base 1.18.1.1-16 and man-db 2.5.1-3.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in groff (Ubuntu):
status: New → Confirmed
Kasper Dupont (ubuntu-launchpad-feb) wrote :

I just ran into this problem on Ubuntu 14.04. I needed to write the ASCII symbol \x27 to a file. Not remembering exactly which symbol that is, I used the ascii man page and copied the symbol listed as hexadecimal 27 to the target file. It turns out that this does not work, because the symbol shown in the ascii man page as number 27 is not an ASCII symbol at all.
