Starting BOM in HTML is incorrectly decoded

Bug #189229 reported by Christopher Yeleighton
Affects                   Status      Importance   Assigned to   Milestone
OpenOffice                Confirmed   Unknown
libreoffice (Ubuntu)      Invalid     Undecided    Unassigned
openoffice.org (Ubuntu)   Won't Fix   Low          Unassigned

Bug Description

Binary package hint: openoffice.org

Steps to reproduce:

1. Save the following document as UTF-8 with a BOM. (Note: the BOM goes at the very beginning, before the DOCTYPE declaration!)

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd"
><HTML ><HEAD ><TITLE >A document with BOM</TITLE ></HEAD
><BODY ><P ></BODY ></HTML >

2. Open the file with OpenOffice.org Writer.

Actual result: 

Expected result: an empty document

Chris Cheney (ccheney) wrote :

Can you give more detailed steps on how to save the document as UTF-8 with a BOM? I don't see that as an option in OpenOffice...

Thanks,

Chris Cheney

Changed in openoffice.org:
assignee: nobody → ccheney
status: New → Incomplete
Christopher Yeleighton (giecrilj) wrote :

Either 1. Edit the source in GNOME and insert the BOM by hand.
Or 2. Use Windows Notepad.

Or do you mean OpenOffice is not designed to open HTML files it did not create? That would be really kewl…
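
For anyone who would rather generate the test file than edit it by hand, here is a minimal sketch in Python. It assumes Python 3 is available; the function name `write_with_bom` and the output file name are made up for illustration and are not part of any tool mentioned in this report. The `utf-8-sig` codec writes the three BOM bytes (EF BB BF) at the very start of the file, before the DOCTYPE.

```python
import tempfile
from pathlib import Path

# The test document from the bug description.
HTML_DOC = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
    '"http://www.w3.org/TR/html4/strict.dtd"\n'
    '><HTML ><HEAD ><TITLE >A document with BOM</TITLE ></HEAD\n'
    '><BODY ><P ></BODY ></HTML >\n'
)


def write_with_bom(path, text=HTML_DOC):
    """Write text as UTF-8 with a leading BOM (hypothetical helper)."""
    # The "utf-8-sig" codec prepends the bytes EF BB BF when encoding.
    target = Path(path)
    target.write_text(text, encoding="utf-8-sig")
    return target


if __name__ == "__main__":
    # Illustrative location only; any writable path works.
    out = write_with_bom(Path(tempfile.gettempdir()) / "bom-test.html")
    print(out, out.read_bytes()[:3])
```

The resulting file can then be opened in Writer as in step 2.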

Chris Cheney (ccheney) wrote :

I added a BOM (I think, anyway) in the doctype, and in web layout I didn't see any garbage. In HTML source view I saw the garbage you mentioned in the doctype. So I think I am unable to reproduce the problem you are talking about. I have attached the HTML file that I created so you can view it.

Thanks,

Chris Cheney

Chris Cheney (ccheney) wrote :

Christopher Yeleighton (giecrilj) wrote :

You have placed the BOM within the DOCTYPE; it should be at the very beginning, before the DOCTYPE.

Chris Cheney (ccheney) wrote :

Is it even legal HTML to have something in that location (before the doctype)? Showing garbage for an illegal character at that location might not be what the best loose parser would do, but garbage in, garbage out...

To get OpenOffice.org to even recognize UTF-8 text at all you have to add the meta tag telling it what character set it is in:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
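
As a rough illustration of what reading that declaration involves, the sketch below pulls the charset out of a `Content-Type` meta tag using Python's standard `html.parser`. The names `MetaCharsetFinder` and `declared_charset` are invented for this example, and this is not a description of how OpenOffice.org itself parses the tag.

```python
import re
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remember the first charset declared via <meta http-equiv=...>."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        match = re.search(r"charset\s*=\s*([\w.:-]+)",
                          attr.get("content") or "", re.I)
        if match:
            self.charset = match.group(1)


def declared_charset(html_text):
    """Return the declared charset, or None if there is no declaration."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

For the tag quoted above, `declared_charset` returns the string `UTF-8`; for a document without such a tag it returns `None`.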

Changed in openoffice.org:
assignee: ccheney → nobody
status: Incomplete → Invalid
Christopher Yeleighton (giecrilj) wrote :

The W3C validator says it is all right (white space is allowed before DOCTYPE). The character set, if it is declared, must be consistent with the BOM.
Both Firefox and Internet Explorer support this feature.
Any utility that aspires to the role of a word processor should examine the BOM of a text file first. Even Windows Notepad does that.
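
To make "examine the BOM first" concrete, here is a small sketch of BOM sniffing in Python. It is only an illustration under stated assumptions: the helper names `sniff_bom` and `decode_html` are hypothetical, and the `cp1252` fallback is an arbitrary choice for the example, not something any of the programs named above is known to use.

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_html(data, fallback="cp1252"):
    """Decode bytes, letting a BOM override the (assumed) fallback."""
    encoding, skip = sniff_bom(data)
    # Drop the BOM bytes so they never reach the document text.
    return data[skip:].decode(encoding or fallback)
```

A caller that also reads a declared character set could compare it with the first element of the `sniff_bom` result to check that the two are consistent.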

Changed in openoffice.org:
status: Invalid → New
Chris Cheney (ccheney) wrote :

This issue still exists in upstream's OpenOffice.org 2.4.0~rc2.

Changed in openoffice.org:
importance: Undecided → Low
status: New → Confirmed
Christopher Yeleighton (giecrilj) wrote : Re: [Upstream] [hardy] Starting BOM in HTML is treated as text

Chris, the title as you entered it makes the bug invalid.
There is no harm in treating the BOM as text since it is a white space character (although HTML does recommend dropping it altogether).
The problem is it is incorrectly decoded and the resulting representation contains garbage.
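
The effect can be shown outside the word processor. The sketch below assumes the bytes are read with a single-byte Western encoding (Windows-1252 here); the report does not say which decoder Writer actually falls back to, so that choice is an assumption made for illustration.

```python
# A UTF-8 document that starts with a BOM (U+FEFF).
raw = "\ufeff<!DOCTYPE HTML>".encode("utf-8")

# On disk, the BOM is the three bytes EF BB BF.
bom_bytes = raw[:3]

# Assumed wrong path: each BOM byte becomes its own character.
wrong = raw.decode("cp1252")

# Plain UTF-8 keeps the BOM as one invisible character, U+FEFF.
right = raw.decode("utf-8")

# "utf-8-sig" recognizes the BOM and drops it.
stripped = raw.decode("utf-8-sig")

print(repr(wrong[:3]), repr(right[0]), repr(stripped))
```

Under that assumption the three BOM bytes come out as three visible characters in front of the DOCTYPE, while the UTF-8 decodings yield either a single zero-width character or nothing at all.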

Chris Cheney (ccheney) wrote : Re: [Upstream] [hardy] Starting BOM in HTML is incorrectly decoded

That is fine, although that is the title you originally gave it. I just added [Upstream] [hardy] to the beginning of it.

Christopher Yeleighton (giecrilj) wrote :

Oops, what a shame :-(

Chris Cheney (ccheney) wrote :
Changed in openoffice:
importance: Undecided → Unknown
status: New → Unknown
Changed in openoffice:
status: Unknown → Confirmed
Chris Cheney (ccheney)
Changed in openoffice.org:
status: Confirmed → Triaged
Chris Cheney (ccheney)
tags: added: hardy
penalvch (penalvch) wrote :

Christopher Yeleighton, I could not reproduce this issue in LibreOffice Writer when running the following at the Terminal:

cd ~/Desktop && wget https://bugs.launchpad.net/ubuntu/+source/openoffice.org/+bug/189229/+attachment/232136/+files/test.html && lowriter -nologo test.html

saw the expected 

Does this work for you? If you are using Lucid or Maverick, feel free to run the following at the Terminal:

sudo add-apt-repository ppa:libreoffice/ppa && sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get -y install libreoffice-writer

lsb_release -rd
Description: Ubuntu 11.04
Release: 11.04

apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.3.3-1ubuntu2
  Candidate: 1:3.3.3-1ubuntu2
  Version table:
 *** 1:3.3.3-1ubuntu2 0
        100 /var/lib/dpkg/status
     1:3.3.2-1ubuntu5 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
     1:3.3.2-1ubuntu4 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

Changed in libreoffice (Ubuntu):
status: New → Incomplete
Changed in openoffice.org (Ubuntu):
status: Triaged → Won't Fix
Björn Michaelsen (bjoern-michaelsen) wrote : migrating packaging from OpenOffice.org to LibreOffice

[This is an automated message.]
There are no new official OpenOffice.org releases in Ubuntu packaging anymore => Won't Fix

If the problem persists, please mark this bug as "also affects project Libreoffice" or "also affects distribution Libreoffice (Ubuntu)" if that has not happened already.

Please leave references to upstream OpenOffice.org bugs in place to allow cross pollination.

penalvch (penalvch)
summary: - [Upstream] [hardy] Starting BOM in HTML is incorrectly decoded
+ Starting BOM in HTML is incorrectly decoded
Bryan Quigley (bryanquigley) wrote :

Thank you for reporting this bug to Ubuntu. This Ubuntu release has reached EOL for Desktops.
See this document for currently supported Ubuntu releases: https://wiki.ubuntu.com/Releases

Since this bug hasn't been touched in a while and the Incomplete autoclose didn't work, I'm going to close it.

If you can still reproduce on a new version of Ubuntu, please reopen it.

Changed in libreoffice (Ubuntu):
status: Incomplete → Invalid