column has problems with different file encodings

Bug #1331521 reported by Jürgen Kahnert
This bug affects 3 people
Affects                 Status      Importance   Assigned to   Milestone
bsdmainutils (Ubuntu)   Confirmed   Undecided    Unassigned

Bug Description

Ubuntu 10.04: bsdmainutils 8.0.1ubuntu1
Ubuntu 12.04: bsdmainutils 8.2.3ubuntu1
Ubuntu 14.04: bsdmainutils 9.0.5ubuntu1

column doesn't produce any output if the file encoding isn't recognized. On Ubuntu 8.04, unknown characters came out garbled, but there was still output. This changed when column started reading its input with fgetws (instead of fgets).

On Ubuntu 14.04 there is at least an error message, whereas 12.04 prints nothing and still exits with status 0:

    [ubu1404] # column -t -s ';' bar
    column: Invalid or incomplete multibyte or wide character

    [ubu1204] # column -t -s ';' bar
    [ubu1204] # echo $?
    0

    [ubu1204] # cat bar
    1;ä
    2;ö
    3;ü

    [ubu1404] # file foo bar
    foo: UTF-8 Unicode text, with CRLF line terminators
    bar: ISO-8859 text, with CRLF line terminators

Setting the correct locale doesn't change anything either:

    [ubu1404] # LC_CTYPE=de_DE.ISO-8859-1 column -t -s ';' bar
    column: Invalid or incomplete multibyte or wide character
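
One thing worth ruling out here (an assumption, not something this report confirms) is that the de_DE.ISO-8859-1 locale is not generated on the machine, or that LC_ALL is set and overrides LC_CTYPE. In either case the LC_CTYPE assignment has no effect and column keeps decoding the input with the default charset. A minimal sketch of that check, using only the standard `locale` utility (the variable names `want`, `got` and `verdict` are illustrative):

```shell
# Illustrative check, not part of the original report.
want="ISO-8859-1"

# LC_ALL overrides LC_CTYPE, so clear it for this one command.
# If the locale is not installed, locale(1) warns on stderr and
# falls back to the default charset.
got=$(LC_ALL= LC_CTYPE=de_DE.ISO-8859-1 locale charmap 2>/dev/null)

if [ "$got" = "$want" ]; then
    verdict="locale active"
else
    verdict="locale not active (charmap: ${got:-unknown})"
fi
echo "$verdict"

# List the installed de_DE locales, if any.
locale -a 2>/dev/null | grep -i '^de_DE' || echo "no de_DE locale found"
```

If this prints "locale not active", the error above would say nothing about how column behaves with a working ISO-8859-1 locale, and the test would need to be repeated after generating that locale.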

With foo everything works as expected:

    [ubu1404] # cat foo
    1;ä
    2;ö
    3;ü

    [ubu1404] # column -t -s ';' foo
    1 ä
    2 ö
    3 ü

I guess this is related to bug #1065329, but if I understand it correctly, it's not limited to Ubuntu; it's a general fgetws problem.
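
Until column copes with such input, a possible workaround (a sketch, not taken from this report) is to convert the file to the charset of the current locale before it reaches column. The example below recreates a file like bar under the hypothetical names bar.latin1 and bar.utf8 and converts it with iconv, assuming the source encoding really is ISO-8859-1 and the session runs in a UTF-8 locale:

```shell
# Recreate a file like "bar": ISO-8859-1 bytes for ä (\344), ö (\366)
# and ü (\374), with CRLF line terminators.
printf '1;\344\r\n2;\366\r\n3;\374\r\n' > bar.latin1

# Convert to UTF-8 so that every byte sequence is valid in a UTF-8 locale.
iconv -f ISO-8859-1 -t UTF-8 bar.latin1 > bar.utf8

# Show the converted bytes: each umlaut is now a two-byte sequence (c3 xx).
od -An -tx1 bar.utf8
```

In a UTF-8 locale, `column -t -s ';' bar.utf8` (or `iconv -f ISO-8859-1 -t UTF-8 bar | column -t -s ';'` directly) should then behave as it does for foo. This only sidesteps the problem; it does not change how column reacts to input it cannot decode.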

Jürgen Kahnert (juergen-kahnert) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in bsdmainutils (Ubuntu):
status: New → Confirmed
Seb Bonnard (sebma) wrote :

Hi,

My problem seems to be related to yours. Here is what I get with a file that contains no accented characters:

$ file toto1
toto1: ASCII text
$ cat toto1
./DVD/TEST.vob
./Enregistrements/France 2 - Mediterranee, notre mer a tous - 28-01-2014 20h53 01h42 22.m2ts
./CHRIST/JesusLeFilm/The_Gospel_of_Luke_-_Film_-_Visual_Bible_in_HD_Very_Rare_Version__22__2PHPLApTt7Y.mp4
./CHRIST/tv2vie.org/videos/direct2vie/20140116 --- Veillee pour la Jeunesse 20_12_13 --- HtID3NZhFhQ.webm
$ column -t toto1
./DVD/TEST.vob
./Enregistrements/France 2 - Mediterranee, notre mer a tous - 28-01-2014 20h53 01h42 22.m2ts
./CHRIST/JesusLeFilm/The_Gospel_of_Luke_-_Film_-_Visual_Bible_in_HD_Very_Rare_Version__22__2PHPLApTt7Y.mp4
./CHRIST/tv2vie.org/videos/direct2vie/20140116 --- Veillee pour la Jeunesse 20_12_13 --- HtID3NZhFhQ.webm

Seb Bonnard (sebma) wrote :

BTW: I'm using Ubuntu 14.10 with bsdmainutils version 9.0.5ubuntu1.

Seb Bonnard (sebma) wrote :

Here is a better example, with toto1 containing two columns:

$ file toto1
toto1: ASCII text
$ cat toto1
3889M ./DVD/TEST.vob
3139M ./Enregistrements/France 2 - Mediterranee, notre mer a tous - 28-01-2014 20h53 01h42 22.m2ts
2970M ./CHRIST/JesusLeFilm/The_Gospel_of_Luke_-_Film_-_Visual_Bible_in_HD_Very_Rare_Version__22__2PHPLApTt7Y.mp4
1944M ./CHRIST/tv2vie.org/videos/direct2vie/20140116 --- Veillee pour la Jeunesse 20_12_13 --- HtID3NZhFhQ.webm
$ column -t toto1
3889M ./DVD/TEST.vob
3139M ./Enregistrements/France 2 - Mediterranee, notre mer a tous - 28-01-2014 20h53 01h42 22.m2ts
2970M ./CHRIST/JesusLeFilm/The_Gospel_of_Luke_-_Film_-_Visual_Bible_in_HD_Very_Rare_Version__22__2PHPLApTt7Y.mp4
1944M ./CHRIST/tv2vie.org/videos/direct2vie/20140116 --- Veillee pour la Jeunesse 20_12_13 --- HtID3NZhFhQ.webm

Seb Bonnard (sebma) wrote :

The file used for my second example.

Seb Bonnard (sebma) wrote :

Here is the toto1 file used in my second example. It seems that the multiple spaces output by "column -t" are collapsed by the Launchpad application.

So, in order to reproduce my problem, you need to type these commands in Ubuntu 14.10:
$ unset LANG LANGUAGE $(echo ${!LC_*})
$ file toto1 # Just to confirm my file is ASCII
$ cat toto1
$ column -t toto1

and see for yourself :-)

Seb Bonnard (sebma) wrote :

Hi,

I have just noticed that the behavior of the "column" command is normal in my use case, so my example is completely off topic, oops :-)

Can the admin of this forum remove my comments from this bug (starting from comment #3)?

Thanks.
