Comment 0 for bug 228988

André Pirard (a.pirard) wrote : Firefox can display a page with the wrong encoding

Ever since I started using it, I have seen Firefox (2.X and 3.0bX) sometimes display pages using the wrong character encoding. Sometimes a page displays correctly and later the same page doesn't, with no apparent reason for the difference.

I first filed Bug #206884 but, although it says exactly this, several people misunderstood it, it became a mess, and adding more explanation to it would only make the mess worse.
So I'm trying to set it out clearly and in order here; please read more detail in #206884.
And, obviously, make #206884 a duplicate of this one, not the opposite.

I ran tests with http://atilf.atilf.fr/tlf.htm, which often displays for me as shown in the attachment.
I even found a way to reproduce the problem.

This is Problem #1, and here is the procedure to reproduce it.

1) clear the said URL from history [any occurrence of the hostname in address bar dropdown]
2) open a new window or tab and do the following in it
3) set View|Character Encoding to UTF-8 (or anything but ISO-8859-1)
4) type that URL in the address bar and display the page
5) the page displays using the wrong encoding (as in the attachment, using UTF-8)

Repeat 1-5, setting ISO-8859-1 instead, and you get a correct display.
See an alternate procedure and more comments at the end of this text.

The obvious problem is that something done before the page displays causes the wrong character set to be used. In this case we did it deliberately, but the same thing may well happen unintentionally.
Obviously, every page should contain all the information necessary to display correctly, and ONLY AFTER a page has been displayed should the user be able to try displaying it with other encodings.

The header of that page is:
<HEAD>
<TITLE>
Le Trésor de la Langue Française Informatisé
</TITLE>
<link rel="stylesheet" type="text/css" href="atilf.css">
</HEAD>

Note that this header doesn't specify the encoding.
As far as I know, the default encoding is ISO-8859-1.
Hence, displaying the page in UTF-8 is always incorrect, unless the user changes the encoding after the page has been displayed.
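
To make the effect concrete, here is a minimal Python sketch of what happens to the page title when its bytes are read with the wrong encoding. It assumes the page is stored as ISO-8859-1, which is consistent with the tests above but which I have not checked against the raw bytes; it only illustrates the decoding step and is not how Firefox does it internally.

```python
# Illustration only: the bytes of the page title, assuming the page is
# stored as ISO-8859-1 (an assumption about the test page).
title = "Le Trésor de la Langue Française Informatisé"
raw = title.encode("iso-8859-1")

# Decoding with the matching charset restores the text.
as_latin1 = raw.decode("iso-8859-1")

# Decoding as UTF-8 fails on each accented byte; errors="replace"
# substitutes U+FFFD (the replacement character) for every bad byte.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

The second line printed has a replacement character wherever the title has an accented letter, which is the kind of damage a page suffers when it is displayed as UTF-8 by mistake.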

Additional problem #2

I'll sketch it here. Please also read the bulk of it in #206884.

I was trying to find out what could make the choice of page encoding uncertain, and I found this.

1) Content OptionsPreferences|Fonts & Colors|Character Encoding
The character encoding selected here will be used to display pages that do not specify which encoding to use.

This preference is unnecessary since, to my mind, the default is always ISO-8859-1.
Allowing the user to change the default sure is a way to run into problems.
Search the Web and you will find people telling others to set it back to ISO-8859-1.
Yet ...

2) I tried to set this default setting to UTF-8 and, to my amazement, the test URL displayed correctly.

3) It may be that Firefox remembers page encodings in history (which is why I ask you to clear it).
That is a good feature once the user has manually corrected the encoding for a page. It is less appropriate when a user who doesn't know how to change the encoding is hit by this occasional problem, because the page encoding will then be wrong for as long as the history entry lasts instead of only occasionally.

4) My discussion of how badly the standards define the encoding and the default encoding is for the Firefox developers to pass on to those who write the standards.

PS:
Having spent hours trying to explain this simple problem and running tests, I just found another way to demonstrate it:

0) Use a freshly started, otherwise empty, Firefox
1) Again, clear the said URL from history [any occurrence of the hostname in address bar dropdown] as if it were the first time you use that URL
2) Use Google to search "Trésor Langue Française Informatisé", check that you get our URL
3) Right-click on link to "Open ... in new Tab"
4) the page displays using the wrong encoding (as in the attachment, using UTF-8)

This is closer to what a user would do.
I thought that the UTF-8 setting in a freshly opened page or tab was there because Google uses UTF-8, but I finally see that any empty tab or window I open has UTF-8 set.
But, again, the problem is not what is set but that Firefox uses it as the default instead of ISO-8859-1.

Any page should display correctly whatever is done before it starts displaying.
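
As a rough illustration of the behaviour I'm asking for, here is a minimal Python sketch. It is not Firefox's code: the names `declared_charset` and `pick_encoding` are made up for this example, the `<meta>` matching is deliberately simplistic, and the ISO-8859-1 fallback is the default I assume above. The point is only that whatever encoding an earlier page or an empty tab left selected is not an input to the decision.

```python
import re

# Hypothetical helpers for illustration; not Firefox's actual algorithm.
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)

def declared_charset(head_bytes):
    """Return the charset named in a <meta> tag, or None if there is none."""
    match = META_CHARSET.search(head_bytes)
    return match.group(1).decode("ascii").lower() if match else None

def pick_encoding(http_charset, head_bytes, fallback="iso-8859-1"):
    """Prefer what the server or the page declares; otherwise use the fallback.

    The encoding left selected in the View menu by an earlier page is
    deliberately not a parameter here.
    """
    return (http_charset or declared_charset(head_bytes) or fallback).lower()

# The header of the test page, as bytes (ISO-8859-1 assumed).
head = b"""<HEAD>
<TITLE>
Le Tr\xe9sor de la Langue Fran\xe7aise Informatis\xe9
</TITLE>
<link rel="stylesheet" type="text/css" href="atilf.css">
</HEAD>"""

print(declared_charset(head))     # None
print(pick_encoding(None, head))  # iso-8859-1
```

With this rule the test page always comes out as ISO-8859-1 on first display, and the user can still switch encodings by hand afterwards.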