cowsay miscalculates length of multibyte-UTF-8-characters

Bug #393212 reported by Simon Dierl
This bug affects 1 person
Affects          Status        Importance  Assigned to      Milestone
cowsay (Debian)  Fix Released  Unknown
cowsay (Ubuntu)  Fix Released  Low         François Marier
  Nominated for Karmic by Andrew Marsh
  Nominated for Lucid by Andrew Marsh

Bug Description

Binary package hint: cowsay

Ubuntu 9.04
cowsay 3.03

When text containing multibyte characters is piped to cowsay (echo "äöü" | cowsay), cowsay calculates the line length from the byte count (the char* size) rather than the number of UTF-8 characters, resulting in this:

$ echo "ää" | cowsay
 ______
< ää >
 ------
        \ ^__^
         \ (oo)\_______
            (__)\ )\/\
                ||----w |
                || ||
$ echo "aa" | cowsay
 ____
< aa >
 ----
        \ ^__^
         \ (oo)\_______
            (__)\ )\/\
                ||----w |
                || ||

Note that the balloon's top and bottom borders are 2 characters too wide.

To determine the screen space a line uses, its length must be computed with UTF-8-aware functions rather than by counting the bytes in the raw array.
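The gap between bytes and characters is easy to reproduce outside cowsay. The Python sketch below is only an illustration, since cowsay itself is written in Perl. The helper names are hypothetical. It assumes the border is the measured length plus 2, which is what the "aa" output above shows (4 underscores).

```python
# Illustration only: cowsay is a Perl script. This sketch reproduces the
# arithmetic behind the balloon border, assuming border = length + 2.


def byte_length(line: str) -> int:
    # What a byte-oriented length check sees: "ä" is 2 bytes in UTF-8.
    return len(line.encode("utf-8"))


def char_length(line: str) -> int:
    # Length after decoding: one unit per character (code point).
    return len(line)


def border(length: int) -> str:
    # Hypothetical helper: the "aa" output has 4 underscores, so the
    # border appears to be the measured length plus 2.
    return "_" * (length + 2)


for line in ("aa", "ää"):
    print(line, byte_length(line), char_length(line),
          border(byte_length(line)), border(char_length(line)))
# aa 2 2 ____ ____
# ää 4 2 ______ ____
```

The byte-based border for "ää" is 6 underscores, matching the broken output above. The character-based border is 4, matching the "aa" case.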

In the Debian bug tracker, Anthony DeRobertis (asd-suespammers) wrote : Also doesn't calculate the width of unicode strings right for dialogue bubble

Package: cowsay
Version: 3.03-6
Followup-For: Bug #254557

anthony@bohr:~$ cowsay '私のズボンは火事だ'
 _____________________________
< 私のズボンは火事だ >
 -----------------------------
        \ ^__^
         \ (oo)\_______
            (__)\ )\/\
                ||----w |
                || ||

(No idea what the Japanese means, btw.)

Notice how the > does not line up properly. The bubble is too wide. In
case the Unicode text gets mangled, these two lines are the same width:

    mmmmmmmmmmmmmmmmmm
    私のズボンは火事だ

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing'), (130, 'unstable'), (120, 'experimental')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.10-bohr
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages cowsay depends on:
ii perl [perl5] 5.8.4-8 Larry Wall's Practical Extraction

-- no debconf information
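The mismatch Anthony describes can be quantified with a rough display-width measure. The Python sketch below is not part of cowsay. display_width is a hypothetical helper built on the standard library's unicodedata.east_asian_width. It shows that the Japanese string is 27 bytes and 9 characters, but it occupies 18 terminal columns, the same as 18 narrow characters.

```python
import unicodedata


def display_width(text: str) -> int:
    # Approximate terminal width: East Asian Wide (W) and Fullwidth (F)
    # characters take two columns; everything else is counted as one.
    # Ambiguous (A) characters are treated as narrow, and combining marks
    # are not special-cased, so this is only an approximation.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


japanese = "私のズボンは火事だ"
print(len(japanese.encode("utf-8")), len(japanese), display_width(japanese))
# 27 9 18
print(display_width("m" * 18))
# 18
```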

In the Debian bug tracker, Florian Ernst (florian-uni-hd) wrote :

On Tue, 14 Jun 2005 04:45:13 -0400, Anthony DeRobertis wrote:
> anthony@bohr:~$ cowsay '私のズボンは火事だ'
>  _____________________________
> < 私のズボンは火事だ >
>  -----------------------------
>         \ ^__^
>          \ (oo)\_______
>             (__)\ )\/\
>                 ||----w |
>                 || ||
>
> (No idea what the Japanese means, btw.)

Just in case nobody has answered this yet: "My pants are on fire",
where "zubon" normally refers to a "men's formal divided skirt".

Cheers,
Flo

In the Debian bug tracker, Florian Ernst (florian-uni-hd) wrote : Housekeeping...

tags 240186 upstream
tags 281347 upstream
tags 294792 upstream
tags 328549 upstream
tags 336809 upstream
forwarded 240186 Peter Eisenlohr <email address hidden>
forwarded 281347 Peter Eisenlohr <email address hidden>
forwarded 294792 Peter Eisenlohr <email address hidden>
forwarded 328549 Peter Eisenlohr <email address hidden>
forwarded 336809 Peter Eisenlohr <email address hidden>
tags 285379 upstream
forwarded 285379 Kathryn Andersen <perlkat AT katspace dot com>
tags 254557 upstream
forwarded 254557 Tony Monroe <tmonroe plus perl at nog dot net>
thanks

In the Debian bug tracker, jetxee (jetxee) wrote : cowsay: Length of message is wrong for UTF-8 strings

Package: cowsay
Version: 3.03-9
Followup-For: Bug #254557

I noticed that the length of the message is calculated incorrectly
if the message contains two-byte UTF-8 characters.

This leads, in particular, to broken balloons for such strings.

For example:
$ cowsay Hello, world
 ______________
< Hello, world >
 --------------
        \ ^__^
         \ (oo)\_______
            (__)\ )\/\
                ||----w |
                || ||
$ cowsay 'Привет, мир!'
 _______________________
< Привет, мир! >
 -----------------------
        \ ^__^
         \ (oo)\_______
            (__)\ )\/\
                ||----w |
                || ||

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (990, 'stable'), (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.22-2-686 (SMP w/1 CPU core)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages cowsay depends on:
ii perl 5.8.8-11.1 Larry Wall's Practical Extraction

cowsay recommends no packages.

-- no debconf information

In the Debian bug tracker, Damyan Ivanov (dmn-debian) wrote : [patch] cowsay: Doesn't properly support UTF-8

severity 254557 important
tags 254557 patch
thanks

[raising severity as Unicode support is, well, important :)]

Attached is a patch that turns on Perl's UTF-8 layer for STDIN, STDOUT
and @ARGV *iff* the locale is UTF-8-enabled.

The ${^UTF8LOCALE} variable is available since Perl 5.8.7 (etch has 5.8.8).

Thanks for considering.

--
dam JabberID: <email address hidden>
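The approach in Damyan's patch can be sketched in Python: decode the input only when the locale is UTF-8. This is not the actual Perl patch. locale_is_utf8 and decode_input are hypothetical names, and locale.getpreferredencoding stands in for ${^UTF8LOCALE}. Decoding brings "ää" down from 4 units to 2. It does not, however, account for characters that occupy two terminal columns.

```python
import codecs
import locale


def locale_is_utf8() -> bool:
    # Rough stand-in for Perl's ${^UTF8LOCALE}: ask the locale for its
    # preferred encoding and normalise the name ("UTF8" -> "utf-8").
    try:
        return codecs.lookup(locale.getpreferredencoding(False)).name == "utf-8"
    except LookupError:
        return False


def decode_input(raw: bytes, utf8: bool) -> str:
    # UTF-8 locale: decode the bytes into characters.
    # Otherwise: latin-1 maps each byte to exactly one character, which
    # mimics a Perl string read without a UTF-8 layer.
    return raw.decode("utf-8") if utf8 else raw.decode("latin-1")


raw = "ää".encode("utf-8")
print(len(decode_input(raw, True)), len(decode_input(raw, False)))
# 2 4
print(len(decode_input(raw, locale_is_utf8())))
```

The last line depends on the locale the script runs under, so its output is not fixed.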

Simon Dierl (simon.dierl) wrote :

The weird look of the cows is apparently a result of the copying and not a bug. The buggy speech bubbles were not affected by this error.

In the Debian bug tracker, Martin (debacle) wrote : Patch does not work correctly

Damyan, your patch does work very well for cases like:
./cowsay "MÖÖÖ"
  ______
< MÖÖÖ >
  ------

For other cases it does not work correctly for me. I have
LANG=en_DK.UTF-8, and in GNOME Terminal 2.26.2, version 3.03-9.2
prints:

$ /usr/games/cowsay "我愛中國人"
  _________________
< 我愛中國人 >
  -----------------

Your patch changes the output to:
$ ./cowsay "我愛中國人"
  _______
< 我愛中國人 >
  -------

That doesn't look right either. The number of border characters
(7 instead of 17) seems to be correct, but the Chinese characters
are displayed wider, so one would need ~11 border characters.
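Martin's numbers correspond to different ways of measuring the same string. The Python sketch below compares three of them. It is not cowsay's code. balloon and display_width are hypothetical helpers, and the border rule of measured length plus 2 is inferred from the "aa" output at the top of this report. Measuring terminal columns gives a border of 12, close to Martin's estimate of about 11. The actual rendering still depends on the terminal font.

```python
import unicodedata
from typing import Callable


def display_width(text: str) -> int:
    # Wide (W) and Fullwidth (F) characters occupy two terminal columns;
    # everything else is counted as one (an approximation).
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def balloon(text: str, measure: Callable[[str], int]) -> str:
    # Hypothetical one-line balloon in cowsay's layout: the border is
    # measure(text) + 2 characters long.
    n = measure(text)
    return "\n".join([
        " " + "_" * (n + 2),
        "< " + text + " >",
        " " + "-" * (n + 2),
    ])


text = "我愛中國人"
for name, measure in [
    ("bytes", lambda s: len(s.encode("utf-8"))),
    ("characters", len),
    ("columns", display_width),
]:
    print(f"{name}: border of {measure(text) + 2}")
# bytes: border of 17
# characters: border of 7
# columns: border of 12

print(balloon(text, display_width))
```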

Andrew Marsh (andrewmarsh01) wrote :

I have created a fix for this problem and shall upload it as soon as I work out how to do it. Basically, the bug is that Perl does not know that the input is supposed to be a UTF-8 string. Adding "use Encode;" at the top of the file and calling Encode::decode_utf8 on the string before checking its length fixes the problem.

Unfortunately, this will still not work with all Unicode characters, as some (for instance, Japanese characters) are not the same width as other characters in the default terminal font.

Changed in cowsay (Ubuntu):
status: New → Fix Committed
tags: added: patch
Changed in cowsay (Ubuntu):
status: Fix Committed → In Progress
Brian Murray (brian-murray) wrote :

The attached patch is an incomplete solution, and so is the one in the Debian bug report; however, it might be worth including in the package, as it fixes part of the problem.

Changed in cowsay (Ubuntu):
importance: Undecided → Low
Benjamin Drung (bdrung) wrote :

There is no debdiff for sponsoring. Therefore, this bug is a task for ubuntu-reviewers instead of ubuntu-universe-sponsors.

tags: added: patch-needswork
removed: patch
Changed in cowsay (Debian):
status: Unknown → Confirmed
François Marier (fmarier) wrote :

Fixed in Debian unstable. It is too late for Lucid, though; it will be in Maverick Meerkat.

Changed in cowsay (Ubuntu):
status: In Progress → Fix Committed
assignee: nobody → François Marier (fmarier)
Changed in cowsay (Debian):
status: Confirmed → Fix Released
David Futcher (bobbo)
tags: added: patch-accepted-debian
removed: patch-needswork
Sebastien Bacher (seb128) wrote :

The new version is in Ubuntu.

Changed in cowsay (Ubuntu):
status: Fix Committed → Fix Released