par2 cannot work with special characters
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
par2cmdline (Ubuntu) |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
Binary package hint: par2
Whether working with gpar2, pypar2 or the KDE variant (all of which rely on par2) any files with special characters - á, é, í, ó, ú, ç, ñ, etc. - are invisible to their corresponding parity files; par2 cannot see that they are present at all, much less read their contents. We have been looking for the source of the problem in this thread:
http://
And someone discovered and posted the following:
"Yep, the PAR 1.0 spec definitely uses UTF-16 (although it does not make that clear).
I've just had a quick look at the PAR 2.0 and I see that the main part of the spec refers to ASCII (which is actually meaningless). The correct current name for what used to be known as ASCII is US-ASCII (which is a 7-bit character set).
par2cmdline and QuickPar treat the filename as 8-bit but leave it up to the OS to decide what character encoding is being used. When QuickPar accesses a file, the filename is treated as ANSI using whatever code page is currently selected as the default.
In the optional part of the PAR 2.0 spec where unicode is referred to, it gives no indication at all as to what encoding is used. I would assume that UTF-16 is intended."
This is not a minor issue, as it affects an infinite number of real and possible files from Spanish, French, German, and any of the Western languages outside of the US and UK. The problem renders their parity files useless.
I thank you for any attention you can give to this problem.
Changed in par2cmdline (Ubuntu): | |
status: | New → Confirmed |
I don't entirely understand where the UTF-16 comment came from. The PAR2 spec says the following:
"?*4 ASCII char array Name of the file. This array is not guaranteed to be null terminated! Subdirectories are indicated by an HTML-style '/' (a.k.a. the UNIX slash). The filename must be unique."
(from http:// www.par2. net/par2spec. php#i__ 134603784_ 644 )
But in the case of a file I was looking at, the encoding was actually ISO-8859-1, not ASCII. My console locale was actually UTF-8. Either way, as indicated by this bug, par2cmdline does not actually convert the filename from one locale to the other (and one would wonder how par2 could know which character set to convert from, since the spec refers to ASCII).
That said, for me there was an easy workaround: simply use `convmv' to move the files par2 was unable to find from UTF-8 to ISO-8859-1; subsequently, par2 was able to find the files. After verifying them I simply reversed the convmv.