Comment 23 for bug 228988

André Pirard (a.pirard) wrote: Re: [Bug 228988] Re: Firefox can display a page with the wrong encoding

On 2008-08-27 01:42, ThiloPfennig wrote:
> Although Andre was spamming ...
Spamming?
If spamming means having to repeat the same words over and over to get
simple facts understood or believed, then you're right, and you've
started doing the same!

What I said is:

1) It is nonsense to specify the character encoding in HTTP (or rather,
to take it from HTTP), because if you save the HTML file (or transmit
it by FTP, network sharing, etc.), you lose that HTTP information
unless you modify the file when saving it, and that modification had
better be made from the start.
Having the transport protocol specify the attributes of a file is great,
but only if every transport protocol can do it and the receiving
program knows where to keep that information.
That is far from being the case, hence the best way and place to
identify file attributes is within the file itself.
Obvious, isn't it?
If anyone disagrees, look at www.phpwact.org/php/i18n/charsets under
"Everybody gets it wrong" for a case of one joker (HTTP) claiming that a
file which specifies utf-8 is iso8859-1, and of a second joker believing
the first and obstinately displaying the joke charset.
Obviously, it's nonsense to have the sender look into a file to tell the
receiver what the receiver can find out by itself, especially when the
sender's job is only to transmit and the receiver's job is to decode.
Is it a good idea for Firefox to believe someone (HTTP) saying something
that is obviously wrong?
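As a rough illustration of the receiver "finding it by itself", here is a minimal Python sketch that reads the charset declared in a <meta> tag within the first bytes of an HTML file. The helper name sniff_meta_charset and the 1024-byte limit are assumptions made for illustration; this is a simplified regex scan, not the full HTML5 prescan algorithm and not Firefox's actual code.

```python
import re
from typing import Optional

# Hypothetical helper: a simplified scan for a charset declared in a
# <meta> tag, e.g. <meta charset="utf-8"> or
# <meta http-equiv="Content-Type" content="text/html;charset=utf-8">.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)

def sniff_meta_charset(data: bytes, limit: int = 1024) -> Optional[str]:
    """Return the lowercased charset named in a <meta> tag, or None."""
    match = META_CHARSET.search(data[:limit])  # only look near the top
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()

page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html;charset=utf-8"></head></html>')
print(sniff_meta_charset(page))  # utf-8
```

Because the declaration travels inside the file, it survives saving to disk or copying by FTP, which is the point being made above.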

2) The default character set is ISO-8859-1, so it is not a good idea to
let the user configure it and thereby have Firefox disregard the
standards.
If there are encoding errors in pages, the thing to do is to correct
those pages, not to tweak the browser into displaying incorrect pages
"correctly" and correct ones "incorrectly".
Autodetecting what someone should have specified but did not is always
bad, because it leads that person to believe that what he did was right
and to keep spreading the same mistake.

3) I have seen pages that do not specify a character set being displayed
with character sets other than ISO-8859-1, generally UTF-8, but at
random times and with random character sets.
So I was glad to find a procedure that puts Firefox in a state where
the error occurs repeatedly.
I have been accused of having _caused_ the error.
To the best of my knowledge, nothing you can do before a page displays,
and certainly not what I did, is allowed to change the character set
that the page is displayed with.
Only _after_ a page has been displayed would someone work around an
encoding mistake by _forcing_ View|Character Encoding to redisplay the
page with another encoding.
What I did, forcing an encoding, is supposed to change the display of
the last page, not the next one, isn't it? I wonder what is so hard to
understand about that.
I can understand that some people might treat it as a cheap game to
guess the character set of the next page they find, but even then
there's no need to preset the character set to play that game. And I
swear I neither like that game nor have played it.
May I remind you that, just as happened to ThiloPfennig and Alexander
Sack himself, the bug basically shows up without anyone doing anything
special. Doing something special in addition does not invalidate the
report; that's plain logic.
> To start with the first example
> mentioned in the bug report if I access the page:
> http://atilf.atilf.fr/tlf.htm Firefox sets encoding to UTF-8.
>
Same for me.
But Alexander Sack wins.
On 2008-05-12 he got the weirdest charset: windows-1252.
The problem is that he says it's not a bug because windows-1252 and
ISO-8859-1 differ in only 32 seldom-used characters. I wish he had been
hit by EBCDIC.
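The "32 characters" are the byte values 0x80 to 0x9F: ISO-8859-1 assigns C1 control codes there, while windows-1252 puts printable characters such as the euro sign and curly quotes in most of them (0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined). A small Python sketch using only the standard codecs confirms that these are the only positions where the two disagree; decoding undefined windows-1252 bytes with errors="replace" is a choice made here so the loop does not raise.

```python
# Compare how ISO-8859-1 and windows-1252 decode every possible byte.
differing = [
    byte for byte in range(256)
    if bytes([byte]).decode("latin-1")
    != bytes([byte]).decode("cp1252", errors="replace")
]

print(len(differing))                         # 32
print(hex(differing[0]), hex(differing[-1]))  # 0x80 0x9f
print(bytes([0x93, 0x94]).decode("cp1252"))   # curly quotes in windows-1252
```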

On 2008-09-05 23:48, ThiloPfennig wrote:
> I have another example where things do not work well:
> http://www.kodak.com/eknec/PageQuerier.jhtml?pq-path=12368&pq-locale=de_DE&CID=KOSBANNER&LOC=290808_CRM_F1_M863
>
> It contains this tag:
> <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
>
> But display is not set to UTF-8 but to ISO-8859-1 which is the opposite
> of what is described above.
I win this one: I got US-ASCII.
As I really want my prize, I attached the proof.

And if you save that page to a file, the tag you mention
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
is rewritten as follows:
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
But if you change the line back to utf-8, it still displays
incorrectly. I quit.
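The symptom in the Kodak example is the classic one: bytes written as UTF-8 but decoded with a single-byte charset. This Python sketch uses a made-up French sample string (not taken from that page) to show what ISO-8859-1 and US-ASCII each make of the same UTF-8 bytes.

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
text = "Trésor de la langue française"
raw = text.encode("utf-8")  # the bytes actually sent/stored

# Each UTF-8 byte becomes one Latin-1 character: "é" (C3 A9) -> "Ã©".
as_latin1 = raw.decode("iso-8859-1")
# US-ASCII has no meaning for bytes >= 0x80; replace them with U+FFFD.
as_ascii = raw.decode("us-ascii", errors="replace")

print(as_latin1)                    # TrÃ©sor de la langue franÃ§aise
print(as_ascii)
print(raw.decode("utf-8") == text)  # True: only the right label restores it
```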

Aren't foxes witty and obstinate?

Fifteen years ago, I was claiming very loudly (in the IETF 822 and 821
groups, the Internet Engineering Task Force groups for e-mail) that
ISO 10646 (UTF-8) should be the sole code used, that doing so would make
identifying the code unnecessary, and that doing otherwise (allowing a
host of 8-bit character sets) would lead to chaos.

I think that I was right and that we're almost there.

I've seen Firefox 3.0.1 announced.
My "About" box says it's 3.0, and I'm unsure whether I'm still beta
testing or not.
I am up to date with Ubuntu (except for Wine, because automatic updates
can break your system).
Is it possible to be up to date with FF and to know it?