cdda2wav - cannot cope with accents in CDDB data

Bug #6493 reported by Jo Shields
Affects: cdrtools (Ubuntu)
Status: Invalid
Importance: Wishlist
Assigned to: Michael Vogt

Bug Description

When querying CDDB data (using the -L command line flag), cdda2wav chokes on accented characters:

T04: 35181 2:05.60 audio linear copydenied stereo title 'Au fond d\'un r\uffffve dor\uffff' from ''

is created from

TTITLE3=Au fond d'un rêve doré

for the CDDB entry http://www.freedb.org/freedb/rock/35069e05

This is 100% reproducible on tracks with accents in their titles. It makes cdda2wav's CDDB retrieval useless for many applications.

Jo Shields (directhex) wrote :

It seems the CDDB code has not kept up with the times: FreeDB protocol 6 stipulates UTF-8 support and protocol 5 did not, so cdda2wav (which uses 5) is only getting the first byte of multi-byte characters.

cdda2wav should be updated to use Proto 6 (thereby adding UTF-8 support).

Matt Zimmerman (mdz) wrote :

Looks like one for upstream, please forward

Changed in cdrtools:
assignee: nobody → mvo
Michael Vogt (mvo) wrote :

I sent a mail to the upstream author about this problem. I wasn't able to find an upstream bug tracker to file the problem.

Cheers,
 Michael

Jo Shields (directhex) wrote :

I spoke with the FreeDB guys about this.

FreeDB's data is all stored as UTF-8 strings. The FreeDB v6 protocol deals fine with this. v5 is essentially the same, but only supports single-byte characters.

cdda2wav identifies itself as using the v5 protocol as an absolute maximum (somewhere around line 1317 of cdda2wav/toc.c), so is only receiving the first byte of multi-byte UTF-8 characters from the server.

The obvious (and tricky) solution is to improve cdda2wav to use the v6 protocol and therefore UTF-8.
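
For illustration, here is a minimal sketch of how a client might announce a protocol level and choose the matching charset. It assumes the HTTP form of the CDDB protocol (the `cmd`, `hello` and `proto` parameters) and the documented rule that level 6 responses are UTF-8 while lower levels are ISO-8859-1. The function names, the user/host/client values and the sample bytes are hypothetical; none of this is taken from cdda2wav's toc.c.

```python
from urllib.parse import urlencode


def charset_for_proto(level: int) -> str:
    """Charset a CDDB server is expected to use at a given protocol level."""
    # Assumption: level 6 and above means UTF-8, anything lower ISO-8859-1.
    return "utf-8" if level >= 6 else "iso-8859-1"


def build_read_query(category: str, discid: str, level: int,
                     user: str = "user", host: str = "localhost",
                     client: str = "example", version: str = "0.1") -> str:
    """Build the query string for an HTTP 'cddb read' request (hypothetical helper)."""
    params = {
        "cmd": f"cddb read {category} {discid}",
        "hello": f"{user} {host} {client} {version}",
        "proto": str(level),
    }
    return urlencode(params)  # spaces become '+'


def decode_response(raw: bytes, level: int) -> str:
    """Decode raw response bytes with the charset implied by the protocol level."""
    return raw.decode(charset_for_proto(level))


if __name__ == "__main__":
    print(build_read_query("rock", "35069e05", 6))
    print(decode_response(b"TTITLE3=Au fond d'un r\xc3\xaave dor\xc3\xa9", 6))
```

The point of the sketch is only that the protocol level sent in the request and the charset used to decode the reply have to agree.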

Simon Law (sfllaw)
Changed in cdrtools:
status: Unconfirmed → Confirmed
Schily (schilling-fokus) wrote :

This is definitely not a cdda2wav bug, it is rather a FreeDB bug.

The original FreeDB database was ISO-8859-1, the CD-Text standard
requires ISO-8859-1 and for this reason, cdda2wav requests the text
in the old ISO-8859-1 coding.

If FreeDB is unable to send a correct ISO-8859-1 representation
of the text, this is a deviation from previous behavior.

BTW: I never received a mail about this from the Ubuntu team. Is there
a collaboration problem?

Schily (schilling-fokus) wrote :

After investigating this bug report, it turns out that there is no bug.

CD-Text is _always_ ISO-8859-1; this is why cdda2wav always retrieves the data in ISO-8859-1.

cdda2wav correctly retrieves the information for this CD from FreeDB.
cdda2wav prints the text directly from FreeDB in ISO-8859-1 without recoding.
cdrecord -text -useinfo will correctly write CD-Text data from the retrieved information.

Your problem is that you use a UTF-8 terminal and not an ISO-8859-1 terminal.
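
The effect described here can be reproduced with a short sketch. It uses Python's UTF-8 decoder with replacement characters as a stand-in for what a UTF-8 terminal does with ISO-8859-1 bytes; a real terminal may render the invalid bytes differently, and the function names are hypothetical.

```python
# The title from the CDDB entry, encoded the way an ISO-8859-1 client would print it.
title = "Au fond d'un r\u00eave dor\u00e9"
latin1_bytes = title.encode("iso-8859-1")


def as_seen_by_utf8_terminal(raw: bytes) -> str:
    """Interpret the bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def as_seen_by_latin1_terminal(raw: bytes) -> str:
    """Interpret the same bytes as ISO-8859-1 (every byte maps to one character)."""
    return raw.decode("iso-8859-1")


if __name__ == "__main__":
    print(as_seen_by_utf8_terminal(latin1_bytes))
    print(as_seen_by_latin1_terminal(latin1_bytes))
```

In this sketch the bytes 0xEA and 0xE9 are not valid UTF-8 on their own, so each accented letter comes out as a single replacement character, while the ISO-8859-1 interpretation recovers the original title.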

CD-Text is unable to deal with UTF-8 by design, it predates Unicode.

Could you explain why you believe that there is a problem?

From my understanding, the subject should be: "user - cannot cope with accents in CDDB data",
as you see things differently because you use a UTF-8 terminal.

Schily (schilling-fokus) wrote :

Set to "invalid" because the problem is only caused by the representation of the verbose output from cdda2wav in a UTF-8 terminal

Changed in cdrtools:
status: Confirmed → Invalid
Jo Shields (directhex) wrote :

Unsubscribed. I'm sick to the back teeth of reporting bug after bug in j0000rgware, to which the reply is "it's your fault for not being me, retard"

hansalfredche (hansalfredche) wrote :

While there might be no bug per se, I'd like to reopen this issue. For some time now, cdda2wav has stopped working with freedb; I don't know why. In any case, it would be nice to have UTF-8, which has many advantages. First of all, it would be a step away from (Western) Eurocentrism. Have you thought about how many languages DON'T use ISO-8859-1? For now, Japanese, Chinese, Greek, Eastern European and other users are simply left out because cdda2wav cannot represent those characters. But I'm talking to a wall, and unfortunately the icedax people are even worse (they didn't even make it to the ISO-8859 era ... ).

Schily (schilling-fokus) wrote :

First some notes:

cdda2wav did not stop working with freedb. On the contrary: it still works, even with
exactly the CD mentioned in the "Bugreport".

UTF-8 does not help you as UTF-8 is not an allowed coding for CD-Text.

It is a matter of fact that Philips only supports ISO-8859-1 and a Japanese + a
Korean non-ISO coding.

I have no idea what you expect, and as long as you are not able to express
your expectations, I cannot help, as cdda2wav does not have a bug.

You are however right with icedax: It has been created only for "political"
reasons (by some people who do not support OSS) and for this reason, there
is no development in this fork.

The original software is still under development and I am open to real bug reports
and to proposals for enhancement, but from the information I currently have, there
is neither a bug nor a way to enhance cdda2wav in a way that would help people.
If you see a way to improve cdda2wav, you would need to explain your ideas.

hansalfredche (hansalfredche) wrote :

First of all, thank you for responding.

The bug I mentioned does not seem to be due to cdda2wav directly, but to something in my network. But as only cdda2wav (and Icedax, I just tried) was affected and not other programs (in various versions on various OS's) it looked like a cdda2wav bug ...

Anyway, what I wanted to say is that, as someone else already proposed, it would be nice to consider support for FreeDB protocol 6. From your description, however, I can see that this would involve some significant changes. As you say, cdda2wav outputs ISO-8859-1. Adding FreeDB protocol 6 would mean changing the output encoding to Unicode (it would make no sense to read information as Unicode from FreeDB only to lose it afterwards). This in turn would mean quite a lot of charset recoding (especially for CD-Text). Also, as I understand you, it would cause serious problems for cdrecord, since the input files could be either ISO-8859-1 or Unicode (the use of Unicode should probably be declared in the files, but that would still be a significant change).
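
To make the recoding concern concrete, here is a minimal sketch of what converting a Unicode title down to ISO-8859-1 for CD-Text could look like, including the case where information is lost. The helper name and the choice of "?" as the substitute character are assumptions for illustration only, not anything cdda2wav or cdrecord does.

```python
from typing import Tuple


def to_cdtext_latin1(title: str) -> Tuple[bytes, bool]:
    """Recode a Unicode title to ISO-8859-1 (hypothetical helper).

    Returns the encoded bytes and a flag that is False when some
    characters could not be represented and were replaced by '?'.
    """
    try:
        return title.encode("iso-8859-1"), True
    except UnicodeEncodeError:
        # Characters outside ISO-8859-1 (Greek, CJK, ...) are lost here.
        return title.encode("iso-8859-1", errors="replace"), False


if __name__ == "__main__":
    print(to_cdtext_latin1("Au fond d'un r\u00eave dor\u00e9"))
    print(to_cdtext_latin1("\u03a9\u03bc"))  # two Greek letters
```

A French title survives the round trip unchanged, while the Greek letters in the second call cannot be represented and the flag reports the loss.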

The main gain, by the way, would not be for writing CD-Text, which has limited charset support, but for encoding to other formats that use cdda2wav information to write tags (flac, mp4, mp3, ... ).
