par2 cannot work with special characters

Bug #433260 reported by ricardisimo
This bug affects 4 people
Affects: par2cmdline (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: par2

Whether working with gpar2, pypar2 or the KDE variant (all of which rely on par2), any file whose name contains special characters (á, é, í, ó, ú, ç, ñ, etc.) is invisible to its parity files: par2 cannot see that the file is present at all, much less read its contents. We have been looking for the source of the problem in this thread:

http://ubuntuforums.org/showthread.php?p=7975747

And someone discovered and posted the following:

"Yep, the PAR 1.0 spec definitely uses UTF-16 (although it does not make that clear).

I've just had a quick look at the PAR 2.0 and I see that the main part of the spec refers to ASCII (which is actually meaningless). The correct current name for what used to be known as ASCII is US-ASCII (which is a 7-bit character set).

par2cmdline and QuickPar treat the filename as 8-bit but leave it up to the OS to decide what character encoding is being used. When QuickPar accesses a file, the filename is treated as ANSI using whatever code page is currently selected as the default.

In the optional part of the PAR 2.0 spec where unicode is referred to, it gives no indication at all as to what encoding is used. I would assume that UTF-16 is intended."

This is not a minor issue: it affects any file whose name contains accented characters, which are common in Spanish, French, German and most other Western languages outside the US and UK. The problem renders the parity files for such files useless.

I thank you for any attention you can give to this problem.

bastiaan (bastiaan-bjacques) wrote :

I don't entirely understand where the UTF-16 comment came from. The PAR2 spec says the following:

"?*4 ASCII char array Name of the file. This array is not guaranteed to be null terminated! Subdirectories are indicated by an HTML-style '/' (a.k.a. the UNIX slash). The filename must be unique."

(from http://www.par2.net/par2spec.php#i__134603784_644 )

But in the case of a file I was looking at, the encoding was ISO-8859-1, not ASCII, while my console locale was UTF-8. Either way, as this bug indicates, par2cmdline does not convert the filename from one encoding to the other (and one wonders how par2 could know which character set to convert from, since the spec refers only to ASCII).
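
The mismatch described above can be illustrated with a short Python sketch. It assumes, as in this comment, that the name stored in the parity file is ISO-8859-1 while the name on disk is UTF-8; the helper `same_name_bytes` is hypothetical and only mimics a byte-for-byte comparison, it is not taken from par2cmdline.

```python
# One of the filenames mentioned in this report, written with escapes so the
# example does not depend on the encoding of this source file.
name = "S\u00e9n\u00e9gal.mp3"

# The same name as it would be stored under each encoding.
utf8_bytes = name.encode("utf-8")          # what a UTF-8 locale puts on disk
latin1_bytes = name.encode("iso-8859-1")   # what the parity file held here


def same_name_bytes(stored: bytes, on_disk: bytes) -> bool:
    """Hypothetical stand-in for a byte-for-byte filename comparison."""
    return stored == on_disk


print(utf8_bytes)    # each accented letter takes two bytes
print(latin1_bytes)  # each accented letter takes one byte
print(same_name_bytes(latin1_bytes, utf8_bytes))
```

Because the two byte strings differ, a lookup that does no conversion treats them as two unrelated filenames, which matches the "file not found" behaviour reported here.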

That said, for me there was an easy workaround: simply use `convmv' to convert the names of the files par2 was unable to find from UTF-8 to ISO-8859-1; after that, par2 was able to find the files. After verifying them I simply reversed the convmv.
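
For anyone curious what that renaming amounts to at the byte level, here is a rough Python sketch. It is not convmv's implementation; `reencode_name` and `plan_renames` are hypothetical helpers, and the sketch only lists the renames it would perform rather than touching any file.

```python
import os


def reencode_name(raw: bytes, src: str = "utf-8", dst: str = "iso-8859-1") -> bytes:
    """Reinterpret a raw filename from encoding `src` and re-encode it as `dst`."""
    return raw.decode(src).encode(dst)


def plan_renames(directory: bytes, src: str = "utf-8", dst: str = "iso-8859-1"):
    """Return (old, new) raw-name pairs for entries whose bytes would change.

    Passing a bytes path makes os.listdir return raw bytes names, so no
    locale-dependent decoding happens behind our back.
    """
    plan = []
    for raw in os.listdir(directory):
        try:
            new = reencode_name(raw, src, dst)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # name is not valid in `src`, or not representable in `dst`
        if new != raw:
            plan.append((raw, new))
    return plan


# Dry run on the current directory: print what would be renamed.
for old, new in plan_renames(os.fsencode(".")):
    print(old, "->", new)
```

Swapping `src` and `dst` gives the reverse conversion, which corresponds to undoing the rename after verification. Pure-ASCII names come out unchanged and are therefore skipped.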

ricardisimo (ricardisimo) wrote :

Well, that's already leagues ahead of where I've been the past few months. I'm going to try out convmv on some classical music titles or some such, and report back on whether a) par2 can check integrity; and b) the parity files can reconstruct files with missing blocks... for once I actually hope that I do have broken files that need fixing.

The main point remains: how is it possible to generate pars for "Sénégal.mp3" or "Chançon de Roland.pdf", but then have those same parity files not even recognize their parents? It's weird, isn't it? Or is it just me?

hackel (hackel)
Changed in par2cmdline (Ubuntu):
status: New → Confirmed
Frodon (frodon) wrote :

Just an "I got this problem too" comment to lend my support to this bug report. I had no choice but to use an XP VM to solve this at the time.

Hope something can be done to fix this character encoding issue.

ricardisimo (ricardisimo) wrote :

There is a workaround, but not a fix. It would be nice if this could be fixed once and for all.

http://ubuntuforums.org/showpost.php?p=8083303&postcount=17
