Italicized apostrophe incorrectly converted

Bug #1010936 reported by Michael Zeisler on 2012-06-09
This bug affects 1 person
Affects: calibre
Importance: Undecided
Assigned to: sengian

Bug Description

Calibre Ver 0.8.50 running on Windows Vista

When converting a book from RTF to MOBI, an italicized apostrophe is changed to an italicized f.

For example:

RTF Text:
I said, "Flora, baby, I'll be there. I’ll be there.” (where the 2nd I'll be there is italicized)

converts to MOBI as:
  I said, "Flora, baby, I'll be there. Ifll be there." (where the 2nd Ifll be there is italicized)

Changing the component for this bug.

 assignee sengian
 tag rtf-input
 status triaged

Changed in calibre:
assignee: nobody → sengian (sengian)
status: New → Triaged
sengian (sengian) wrote :

Works for me, please attach your file as RTF can be created in lots of ways.

Changed in calibre:
status: Triaged → Incomplete

Process Steps:
1. Import attached file into Calibre by clicking "Add Books".
2. "Convert Books" creates MOBI containing error.

Calibre Ver 0.8.55 has worsened the error; MOBI now looks like:

I said, "Flora, baby, I'll be there. Ifll be there.h (where the 2nd Ifll be
there is italicized & the final " is replaced with an un-italicized h)

Mike

On Sat, Jun 9, 2012 at 4:53 PM, sengian <email address hidden> wrote:

> Works for me, please attach your file as RTF can be created in lots of
> ways.

sengian (sengian) wrote :

Using your file, I see it. I will investigate.

Changed in calibre:
status: Incomplete → Confirmed
Colin Benner (yzhs) wrote :

The bug still exists in the latest master commit on Linux and, presumably, other platforms.

The problem is that rtf2xml (both the version distributed with Calibre and the latest upstream version) does not handle multi-byte encodings correctly. In particular, the problematic single and double quote characters in Michael's file are encoded using code page 932/Shift JIS as two-byte characters: 0x81 0x66 and 0x81 0x68. In the single-byte encoding used by rtf2xml, 0x81 is not assigned, so rtf2xml drops that byte and 0x66 and 0x68 are interpreted as ASCII letters f and h, respectively.
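The byte-level failure described above can be reproduced with a short Python sketch. It assumes the single-byte encoding is Windows-1252 (the report does not name it) and approximates the dropped byte with errors="ignore". It works on raw bytes only and does not model rtf2xml's actual parsing or RTF escape syntax; all names here are hypothetical.

```python
# Illustration only: raw bytes, not rtf2xml's real code path.

# Code page 932 / Shift JIS two-byte sequences named in the report.
RIGHT_SINGLE_QUOTE = bytes([0x81, 0x66])
RIGHT_DOUBLE_QUOTE = bytes([0x81, 0x68])


def decode_as_cp932(raw: bytes) -> str:
    """Decode the bytes as the multi-byte encoding they were written in."""
    return raw.decode("cp932")


def decode_as_single_byte(raw: bytes, encoding: str = "cp1252") -> str:
    """Decode byte by byte, silently skipping unassigned bytes.

    ASSUMPTION: cp1252 stands in for the single-byte encoding, and
    errors="ignore" stands in for the byte being dropped.
    """
    return raw.decode(encoding, errors="ignore")


# The italicized sentence from the report, as cp932 bytes.
sentence = b"I" + RIGHT_SINGLE_QUOTE + b"ll be there." + RIGHT_DOUBLE_QUOTE

correct = decode_as_cp932(sentence)
broken = decode_as_single_byte(sentence)

print(correct)
print(broken)
```

Under these assumptions the second decode turns the closing quotes into the letters f and h, matching the MOBI output seen with 0.8.55.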

In case anyone else wants to convert a file exhibiting this problem:
As a workaround, I successfully converted the RTF file after opening and re-saving it in LibreOffice. The re-saved file contains UTF-8 encoded quote characters, which Calibre handles properly. Another way to get such a UTF-8 encoded RTF document is to run "unrtf --rtf original.rtf > converted.rtf". (You might have to use a UTF-8 locale for this.)
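For batch use, the unrtf workaround could be wrapped in a small Python helper. This is only a sketch: it assumes unrtf is installed and on the PATH, and that a locale named en_US.UTF-8 exists on the system (the locale name is a guess, substitute whatever UTF-8 locale is available). The function names are hypothetical; the command itself is the one quoted above.

```python
import os
import subprocess


def unrtf_command(source_path: str) -> list:
    """Build the command from the workaround: unrtf --rtf <file>."""
    return ["unrtf", "--rtf", source_path]


def utf8_environment(locale_name: str = "en_US.UTF-8") -> dict:
    """Copy the current environment and force a UTF-8 locale.

    ASSUMPTION: locale_name is installed on this machine.
    """
    env = dict(os.environ)
    env["LC_ALL"] = locale_name
    return env


def convert_with_unrtf(source_path: str, target_path: str,
                       locale_name: str = "en_US.UTF-8") -> None:
    """Run unrtf and write its standard output to target_path."""
    with open(target_path, "wb") as target:
        subprocess.run(
            unrtf_command(source_path),
            stdout=target,
            env=utf8_environment(locale_name),
            check=True,  # raise if unrtf exits with an error
        )
```

Calling convert_with_unrtf("original.rtf", "converted.rtf") would then be equivalent to the shell redirect in the command above; the converted file still needs to be imported into Calibre afterwards.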
