Comment 31 for bug 206884

Henri Sivonen (hsivonen) wrote:

(In reply to André Pirard from comment #17)
> I think that the first thing for Character Encoding Autodetect to be less
> confusing is to say what it does.
> Assuming that it means that any indication of a character set is ignored and
> that it is guessed by the contents...

It means: if the type of the document is text/html or text/plain, there is no character encoding label on the HTTP layer or (in the text/html case) inside the document, and there is no BOM at the start of the document, then assume the language of the page is the one selected from the Auto-Detect menu and guess the encoding from the contents of the file under that language assumption.
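
To make those conditions concrete, here is a minimal sketch in Python of when the detector would get to run. This is not Gecko code: the function names, the parameters and the short BOM list are my own assumptions for illustration, and the actual guessing step is left out.

```python
from typing import Optional

# Hypothetical, simplified BOM list (UTF-8, UTF-16LE, UTF-16BE); an assumption for this sketch.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")


def has_bom(body: bytes) -> bool:
    """Return True if the document starts with one of the BOMs listed above."""
    return body.startswith(BOMS)


def detector_applies(
    content_type: str,
    http_charset: Optional[str],
    meta_charset: Optional[str],
    body: bytes,
) -> bool:
    """Return True only when the menu-selected autodetector would be consulted."""
    # Only text/html and text/plain are candidates at all.
    if content_type not in ("text/html", "text/plain"):
        return False
    # A label on the HTTP layer rules out guessing.
    if http_charset:
        return False
    # A label inside the document only counts in the text/html case.
    if content_type == "text/html" and meta_charset:
        return False
    # A BOM at the start of the document also rules out guessing.
    if has_bom(body):
        return False
    return True
```

Only when this returns True would the language picked in the menu matter; in every other case the menu selection has no effect on the page.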

How would you make the menu "say" this?

Note: My current belief is that we don't actually need the Russian and Ukrainian autodetectors. Once the only autodetector we have is the Japanese one, we should probably not bother the user about its existence but couple it with choosing Japanese in Preferences: Content: Advanced: Fallback Character Encoding [or choosing "Default for Current Locale" in the Japanese localization]. Therefore, I think activity to get rid of the Russian and Ukrainian detectors (bug 845791) would be more productive than activity to polish the menu.

> Character Encoding Autodetect is normally not needed because a page MUST
> specify the encoding it uses.

Correct.

> Using it instead of reporting an error to a webmaster is causing the
> webmaster to continue to make the same errors.

Indeed.

> Also, picking the character code from the HTTP request is an error because
> the contents of the page MUST specify the encoding, it knows better than an
> Apache server

Indeed, Ruby's Postulate generally holds. Unfortunately, HTTP disagreed and it's too late to change that, because it would break pages that currently work due to Ruby's Postulate not being true for them.
http://www.intertwingly.net/slides/2004/devcon/69.html

And besides, all browsers now agree on the precedence of HTTP over <meta>, so it's not worthwhile to break interoperability.
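
As a rough illustration of that precedence (a sketch only, with a hypothetical function name and parameters; it deliberately ignores the BOM and any fallback or detection step):

```python
from typing import Optional


def effective_label(http_charset: Optional[str], meta_charset: Optional[str]) -> Optional[str]:
    """Pick the encoding label to honor when both layers may carry one."""
    # The HTTP-layer label wins whenever it is present.
    if http_charset:
        return http_charset
    # Otherwise fall back to the label inside the document, if any.
    return meta_charset
```

So a page served with an HTTP-level windows-1251 label and a <meta> saying utf-8 would be treated as windows-1251, even if the <meta> is the one that is actually right.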

> and the browser won't update the page when it's written to a
> file.

Firefox is supposed to if you choose the "complete" option in Save As...

> The only case where character encoding mangling is necessary is when, for
> example, displaying a text file of which the character set is specified
> nowhere

Or when displaying an HTML file whose character encoding is specified nowhere. :-(