Wrong character set in RTF to MOBI conversion

Bug #807491 reported by Alex Samorukov
This bug affects 3 people
Affects: calibre
Status: Won't Fix
Importance: Undecided
Assigned to: sengian

Bug Description

I have an RTF document that is not converted correctly from RTF to MOBI: the charset seems to be wrong and the result is unreadable. The document was created by OpenOffice from the MS Word format. It opens correctly in OpenOffice and WordPad, so I believe the bug is in calibre. Maybe \lang is not parsed correctly? I see that it uses \lang1049 for the text, which means "Russian", so the input encoding needs to be cp1251. I'm attaching the document to this ticket. Tell me if you need anything else.

Revision history for this message
Alex Samorukov (samm-os2) wrote :

One more comment: adding \ansicpg1251 to the RTF header fixes the detection.

Revision history for this message
Alex Samorukov (samm-os2) wrote :

OK, so the problem is that the codepage in this document is specified by the \fcharsetN control word in the font description. According to the spec (http://latex2rtf.sourceforge.net/rtfspec_6.html), 204 is Russian (i.e. code page 1251). This is respected by OpenOffice and WordPad and probably ignored by calibre.
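
The mapping described above can be sketched in Python as follows. This is only an illustration, not calibre's or rtf2xml's code: the table name `FCHARSET_TO_CODEC`, the helper `fcharsets_in` and the sample RTF string are all made up for this example, and the table is deliberately partial.

```python
import re

# Partial, illustrative table of \fcharsetN values to Python codec names.
# 204 -> cp1251 is the case from this report; the other rows follow the same
# charset list in the RTF spec. Mapping 0 (ANSI) to cp1252 is an assumption
# that only holds for Western systems.
FCHARSET_TO_CODEC = {
    0: "cp1252",    # ANSI (assumed Western default)
    161: "cp1253",  # Greek
    162: "cp1254",  # Turkish
    204: "cp1251",  # Russian
    238: "cp1250",  # Eastern European
}


def fcharsets_in(rtf_text):
    # Return the distinct fcharset numbers in order of first appearance.
    seen = []
    for match in re.finditer(r"\\fcharset(\d+)", rtf_text):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


# Hypothetical one-font document with Russian text stored as \'hh escapes.
sample = (
    r"{\rtf1\ansi{\fonttbl{\f0\froman\fcharset204 Times New Roman;}}"
    r"\f0\lang1049 \'cf\'f0\'e8\'e2\'e5\'f2}"
)

codec = FCHARSET_TO_CODEC[fcharsets_in(sample)[0]]
# The six escaped bytes decode to readable Cyrillic only with the right codec.
print(bytes.fromhex("cff0e8e2e5f2").decode(codec))
```

Decoding the same bytes as cp1252 instead yields accented Latin letters, which matches the unreadable output described in this report.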

Revision history for this message
Kovid Goyal (kovid) wrote : Re: calibre bug 807491

Changing the component for this bug.

 assignee sengian
 tag rtf-input
 status triaged

Changed in calibre:
assignee: nobody → sengian (sengian)
status: New → Triaged
Revision history for this message
Kovid Goyal (kovid) wrote :

Note that if you are using OpenOffice, save your documents in odt and convert that in calibre.

Revision history for this message
sengian (sengian) wrote :

I have looked into the linked bug with diacritics, and I am still not sure about it, but concerning this one you are absolutely correct.
The problem is that the tool used by calibre does not support \fcharsetN; it relies on \ansicpgN to get the codepage.

Changed in calibre:
status: Triaged → Invalid
Revision history for this message
Alex Samorukov (samm-os2) wrote :

Why is it set to Invalid? It is 100% a bug. This document was produced by OpenOffice, one of the most popular office suites. It opens correctly in both OpenOffice and MS WordPad, so the document itself is correct.

My recommendation is to look for \ansicpg and, if it is not found, try to "guess" the charset from the \fcharset in the font list.

And thank you for the recommendation to use ODT, I'll try this.
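
The fallback recommended above could look roughly like this in Python. It is a sketch of the proposal only, not of how rtf2xml works: `guess_codec`, the partial `FCHARSET_TO_CODEC` table and the cp1252 default are all assumptions made for illustration.

```python
import re

# Partial, illustrative table; 204 -> cp1251 is the case from this report.
FCHARSET_TO_CODEC = {161: "cp1253", 204: "cp1251", 238: "cp1250"}


def guess_codec(rtf_text, default="cp1252"):
    # 1. Prefer an explicit \ansicpgN declaration in the header.
    match = re.search(r"\\ansicpg(\d+)", rtf_text)
    if match:
        return "cp" + match.group(1)
    # 2. Otherwise take the first \fcharsetN that the table knows about.
    for match in re.finditer(r"\\fcharset(\d+)", rtf_text):
        codec = FCHARSET_TO_CODEC.get(int(match.group(1)))
        if codec:
            return codec
    # 3. Nothing usable was found: fall back to the assumed default.
    return default


print(guess_codec(r"{\rtf1\ansi{\fonttbl{\f0\fcharset204 Arial;}}}"))
```

This picks a single codec for the whole document, so it is only a guess when several fonts declare different charsets.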

Revision history for this message
sengian (sengian) wrote :

I agree. I will set it to Won't Fix if you want, as it is on the TODO list but is currently a limitation of the RTF conversion tool. The problem with the guess you propose is that, if you look, you will see that a lot of fonts are declared.
FYI, the \ansicpg is declared but is superseded by the \fcharset, so I need to implement this in the tool and in the conversion.

Changed in calibre:
status: Invalid → In Progress
Revision history for this message
Alex Samorukov (samm-os2) wrote :

Yes, I understand that this is not a real solution, because it is possible to define a different \fcharset for each font (is anyone using this?), so the implementation probably needs to be rewritten to use an encoding per object and not only per document. Thank you for the reply and explanation. Unfortunately I am not a Python developer, so I can't help with that.

P.S. Conversion from ODT works fine on this document.
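
For illustration, the "encoding per object" idea mentioned above might be sketched in Python like this. It is not rtf2xml code: `font_codecs`, `decode_hex_runs` and the partial table are hypothetical, the parsing is regex-based rather than a real RTF parser, and decoding one byte at a time only works for single-byte code pages.

```python
import re

# Partial, illustrative table of \fcharsetN values to Python codec names.
FCHARSET_TO_CODEC = {0: "cp1252", 161: "cp1253", 204: "cp1251"}


def font_codecs(rtf_text, default="cp1252"):
    # Map each font number to a codec, using the \fN ... \fcharsetN pair
    # found inside each font table entry.
    pairs = re.findall(r"\\f(\d+)[^;{}]*?\\fcharset(\d+)", rtf_text)
    return {int(f): FCHARSET_TO_CODEC.get(int(cs), default) for f, cs in pairs}


def decode_hex_runs(body, codecs, default="cp1252"):
    # Decode \'hh escapes with the codec of the most recently selected font.
    # Plain text and all other control words are ignored in this sketch.
    out = []
    codec = default
    for font, hexbyte in re.findall(r"\\f(\d+) ?|\\'([0-9a-fA-F]{2})", body):
        if font:
            codec = codecs.get(int(font), default)
        else:
            out.append(bytes.fromhex(hexbyte).decode(codec))
    return "".join(out)


fonttbl = r"{\fonttbl{\f0\fcharset204 Arial;}{\f1\fcharset161 Arial;}}"
body = r"\f0 \'cf\'f0\'e8 \f1 \'e1"
print(decode_hex_runs(body, font_codecs(fonttbl)))
```

Here the first three bytes are decoded as cp1251 and the last one as cp1253, which a single document-wide encoding cannot do.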

Revision history for this message
Kovid Goyal (kovid) wrote :

@sengian: Are you actually planning to implement support for fcharset in rtf2xml? If not, feel free to close this ticket as wontfix. Personally, I have no interest in supporting fcharset in RTF.

Revision history for this message
sengian (sengian) wrote :

I am, but as I have no idea how long it will take, I will close it for now.

Changed in calibre:
status: In Progress → Won't Fix