Writer rtf import wrong encoding

Bug #210990 reported by Anton
This bug affects 2 people
Affects                   Status      Importance   Assigned to   Milestone
OpenOffice                Confirmed   Unknown
libreoffice (Ubuntu)      Invalid     Undecided    Unassigned
openoffice.org (Ubuntu)   Won't Fix   Low          Unassigned

Bug Description

Binary package hint: openoffice.org

I believe the footer text is imported with the wrong encoding.
I have a Russian document whose footer is supposed to read:

Стр. (page number)

The word 'Стр.' should appear in Russian, but instead it's garbage:
Ñòð.

The rest of the document is perfect (in proper Russian).
Thanks
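
The garbage above is consistent with Cyrillic bytes in the Windows-1251 code page being read as Windows-1252 (Western European). That is an assumption about the cause, not something confirmed from the file itself. A minimal sketch reproducing the symptom:

```python
# The footer text as the author intended it.
intended = "Стр."

# Windows-1251 (Cyrillic) stores each of these characters as a single byte.
raw = intended.encode("cp1251")

# Reading those same bytes as Windows-1252 yields the reported garbage.
garbled = raw.decode("cp1252")

print(raw.hex(" "))  # d1 f2 f0 2e
print(garbled)       # Ñòð.
```

The bytes themselves are unchanged in both readings; only the code page used to interpret them differs.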

Chris Cheney (ccheney) wrote :

Can you please include the document which exhibits the problem in the launchpad bug report? If it is confidential then reproducing the bug in a mostly empty document and attaching it would work as well.

Thanks,

Chris Cheney

Changed in openoffice.org:
status: New → Incomplete
Anton (anton-fit) wrote :

Please note that if I open this document in MS Office, edit it a bit, save it, and then open it with OpenOffice, it's OK.

Anton (anton-fit) wrote :

This is strange:

If I open my document (btw it's RTF, if that matters) with MS Office, delete all confidential data and save it, then the footer is readable by OpenOffice.

But if I open my original document with OpenOffice, delete all confidential data and save it, then the footer is wrong not only in OpenOffice, but now also in MS Office. :)

Chris Cheney (ccheney) wrote :

Reproduced on upstream's 2.4.0.

Changed in openoffice.org:
importance: Undecided → Medium
status: Incomplete → Confirmed
Anton (anton-fit) wrote : Re: [upstream] [hardy] the encoding of footers text is done incorrectly

This file is genuine, unmodified by me. As you can see, the incorrect encoding is not only in the footer, but also in the main text.

Anton (anton-fit) wrote :

It should look like this.
(I mean the text, not its formatting into two pages.)

Chris Cheney (ccheney)
Changed in openoffice.org:
status: Confirmed → Triaged
Chris Cheney (ccheney) wrote :

Are you sure this document is valid, or did you mean that it became invalid when saved? I can't even get it to render properly in Word 2007.

Changed in openoffice:
status: New → Invalid
Changed in openoffice.org (Ubuntu):
status: Triaged → Incomplete
Anton (anton-fit) wrote :

//Are you sure this document is valid, or did you mean when saved it became invalid?
Yes, it's valid. The PDF was rendered from the doc file in Windows using Office (I switched on two-pages-per-sheet printing; it should look just like the right-hand page).

//As I can't even get this to render in Word 2007 properly.
Do you have the Cyrillic fonts installed?

Chris Cheney (ccheney) wrote :

Well, Windows does show Cyrillic characters in the Character Map program, so I am pretty sure I have fonts that could display the characters in the file if they were formed correctly.

Can you open the file that is actually on Launchpad in any program and have it display correctly? If so, which program are you using?

Thanks,

Chris Cheney

Chris Cheney (ccheney) wrote :

Note that in the second document the first line is bold and in Cyrillic, but the lines after it look like a messed-up encoding, even when the file is opened in Word 2007.

Anton (anton-fit) wrote :

I can definitely open it (the file that is actually on Launchpad) with "Microsoft Word Viewer 2003" with SP3 (it's free) installed under Linux and run with Wine (you will also need Cyrillic fonts installed under Linux). The program shows it perfectly.

//And that is even when opening it in Word 2007.
I'll try to open it with some version of Office in Windows (as far as I remember, I tried that before and it worked), hopefully tomorrow.

//Note on the second document the first line is bold and is in Cyrillic but the lines after that look like messed up encoding. And that is even when opening it in Word 2007
Maybe there is something like a "default encoding", which is set to Cyrillic when the default language in Windows (Microsoft Word) is Russian?
Or a default encoding of the file?

Chris Cheney (ccheney) wrote :

I'm not sure how RTF works, but this does look like what I have seen before in other programs when text was read in the wrong encoding mode, e.g. UTF-8 vs. the old per-codepage way (I don't remember the exact term). So perhaps these documents only display correctly, even under Windows, when you are running Russian Windows. I'll see if I can attach a screenshot from Word 2007 showing what I mean with respect to the second file.

Chris Cheney (ccheney) wrote :
Anton (anton-fit) wrote :

As you mentioned before,
the first line is correct and the rest is wrong (the number 24 is correct :) )

Try switching to the CP-1251 encoding (for the body) and see if that helps display the document correctly.
(It would be strange if the document did not carry the correct encoding inside it.)
Maybe there is a slight difference (that wouldn't be a surprise with Microsoft software) between how the English and Russian versions work?
How is it that the header rendered correctly? Maybe the same codepage that was used to render the header is supposed to be used to render the body?

Have you tried the "Microsoft Word Viewer"?
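
To check the CP-1251 suggestion outside of Word, the garbled text can be re-encoded with the code page that was presumably applied by mistake and then decoded as CP-1251. This is only a sketch: `undo_mojibake` is a hypothetical helper name, and Windows-1252 as the wrongly applied code page is an assumption.

```python
def undo_mojibake(text, wrong="cp1252", right="cp1251"):
    """Recover the original bytes via the wrongly applied codec, then decode them as intended."""
    return text.encode(wrong).decode(right)

print(undo_mojibake("Ñòð."))  # Стр.
```

If the result is readable Russian, the bytes in the file are fine and only the code page chosen by the reader is wrong. The round trip can fail for text containing one of the few byte values that Windows-1252 leaves undefined.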

Anton (anton-fit) wrote :

My screenshots

Anton (anton-fit) wrote :
Anton (anton-fit) wrote :
Chris Cheney (ccheney)
Changed in openoffice:
importance: Undecided → Unknown
status: Invalid → Unknown
Changed in openoffice:
status: Unknown → Confirmed
Anton (anton-fit) wrote : Re: [upstream] writer rtf import wrong encoding

I had a look at the RTF specification (http://www.artifax.net/productdownload/ArtRep/2.0/Help/rtf.htm) and here are my thoughts on what the problem might be. Hope this helps.

1) In the header of the document, a font table is created with two fonts, both with the Cyrillic charset (\fcharset204).
2) The document can only use fonts from the table.
3) The specification defines a command to set the default font number from the table.
4) The RTF file that is used ( http://launchpadlibrarian.net/14655343/71627980-137-2008-05-20-17.rtf ) does not have that command. :(
I guess there has to be some convention on what to do in this case?
But in any case the body text must be in Russian (because of points 1 and 2).
I would assume that the last font in the table is the one in effect, because the command that selects font number N (\fN) is the same command used to define that font in the table.
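
The points above can be sketched in Python. This is not how the OpenOffice importer works; the function names are hypothetical, the sample fragment is invented to match the description (it is not taken from the attached file), and the charset table is only a subset of the values in the RTF specification.

```python
import re

# Subset of RTF \fcharset values and the Windows code pages they imply.
FCHARSET_TO_CODEC = {0: "cp1252", 161: "cp1253", 162: "cp1254",
                     186: "cp1257", 204: "cp1251", 238: "cp1250"}

def font_codecs(rtf):
    """Map each font number in the font table to the codec implied by its \\fcharset."""
    codecs = {}
    for num, charset in re.findall(r"\\f(\d+)[^{}]*?\\fcharset(\d+)", rtf):
        codecs[int(num)] = FCHARSET_TO_CODEC.get(int(charset), "cp1252")
    return codecs

def default_font(rtf):
    """Return the font number named by \\deffN, or None when the command is absent."""
    m = re.search(r"\\deff(\d+)", rtf)
    return int(m.group(1)) if m else None

def decode_hex_escapes(text, codec):
    """Decode a run of RTF \\'hh escapes using the given codec."""
    raw = bytes(int(h, 16) for h in re.findall(r"\\'([0-9a-fA-F]{2})", text))
    return raw.decode(codec)

# Hypothetical fragment: two Cyrillic fonts in the table and no \deff command.
sample = (r"{\rtf1\ansi{\fonttbl{\f0\fswiss\fcharset204 Arial;}"
          r"{\f1\froman\fcharset204 Times New Roman;}}"
          r"\f1 \'d1\'f2\'f0.}")

codecs = font_codecs(sample)
print(codecs)                # {0: 'cp1251', 1: 'cp1251'}
print(default_font(sample))  # None
print(decode_hex_escapes(r"\'d1\'f2\'f0", codecs[1]) + ".")  # Стр.
```

Since every font in the table maps to CP-1251 here, whichever font a reader falls back to without \deff, the hex-escaped bytes would still decode as Russian, which matches the argument in points 1 and 2.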

Hope this helps resolve the problem quickly. :)

Cheers,
Anton

Chris Cheney (ccheney)
Changed in openoffice:
status: Confirmed → Unknown
Changed in openoffice.org:
status: Incomplete → Triaged
Changed in openoffice:
status: Unknown → Confirmed
Changed in openoffice:
status: Confirmed → Fix Released
Chris Cheney (ccheney)
Changed in openoffice:
status: Fix Released → Unknown
Changed in openoffice:
status: Unknown → Confirmed
penalvch (penalvch) wrote :

Anton, this issue seems to be resolved in LibreOffice Writer when its output is compared with Word 2003. Does this work for you?

lsb_release -rd
Description: Ubuntu 11.04
Release: 11.04

apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.3.2-1ubuntu5
  Candidate: 1:3.3.2-1ubuntu5
  Version table:
 *** 1:3.3.2-1ubuntu5 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-proposed/main i386 Packages
        100 /var/lib/dpkg/status
     1:3.3.2-1ubuntu4 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

Changed in libreoffice (Ubuntu):
status: New → Incomplete
penalvch (penalvch)
Changed in openoffice.org (Ubuntu):
importance: Medium → Low
Changed in openoffice.org (Ubuntu):
status: Triaged → Won't Fix
Björn Michaelsen (bjoern-michaelsen) wrote : migrating packaging from OpenOffice.org to Libreoffice

[This is an automated message.]
There are no new official OpenOffice.org releases in Ubuntu packaging anymore => Won't Fix

If the problem persists, please mark this bug as "also affects project Libreoffice" or "also affects distribution Libreoffice (Ubuntu)" if that has not happened already.

Please leave references to upstream OpenOffice.org bugs in place to allow cross pollination.

penalvch (penalvch)
summary: - [upstream] writer rtf import wrong encoding
+ Writer rtf import wrong encoding
Bryan Quigley (bryanquigley) wrote :

Thank you for reporting this bug to Ubuntu. This Ubuntu release has reached EOL for Desktops.
See this document for currently supported Ubuntu releases: https://wiki.ubuntu.com/Releases

Since this bug hasn't been touched in a while and the Incomplete autoclose didn't work, I'm going to close it.

If you can still reproduce on a new version of Ubuntu, please reopen it.

Changed in libreoffice (Ubuntu):
status: Incomplete → Invalid