fuzzy/confusing firefox View -> Character encoding menu semantics

Bug #206884 reported by André Pirard
This bug affects 2 people
Affects            Status         Importance   Assigned to   Milestone
Mozilla Firefox    Fix Released   Low
firefox (Ubuntu)   Triaged        Low          Unassigned

Bug Description

Please note that I am not the reporter of this bug any longer.
Alexander Sack is. He changed the title to his own understanding.
I personally understand View -> Character encoding perfectly.
What I say is that FF does not always display ISO8859-1 by default.

André.

I have seen this for a long time with both Firefox 2.x and 3.0b3.

Rem: Please note that I don't say that Firefox always uses the wrong encoding.
Please read my followup to see how to reproduce the problem.

I display, for example, http://atilf.atilf.fr/tlf.htm
Its header is

<HEAD>
<TITLE>
Le Trésor de la Langue Française Informatisé
</TITLE>
<link rel="stylesheet" type="text/css" href="atilf.css">
</HEAD>

Hence, its encoding should be ISO8859-1 by default as it has always been.

As the uploaded attachment shows Firefox displays it using UTF-8.
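
For readers who cannot open the attachment, the visible effect can be reproduced outside the browser. This is a minimal Python sketch, not anything Firefox runs; it assumes the page is served as ISO 8859-1 bytes, as the report argues, and decodes the title once with that encoding and once as UTF-8.

```python
# Title text taken from the page header quoted above.
title = "Le Trésor de la Langue Française Informatisé"

# Assumption: the server sends the page as ISO 8859-1 bytes.
raw = title.encode("iso-8859-1")

# Decoding with the encoding the bytes were written in gives the text back.
as_latin1 = raw.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 fails on every accented letter;
# errors="replace" substitutes U+FFFD for each undecodable byte, which
# a browser typically renders as a replacement glyph.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

Only the three accented letters are damaged; the ASCII part of the title is identical in both encodings, which is why such pages look almost right.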

In Edit|Preferences|Content|Font & Colors|Default font|Advanced|Character Encoding
there's an option named "Default character encoding" documented as follows
      The character encoding selected here will be used to display pages that
      do not specify which encoding to use.
What's the use of this setting if the default must ALWAYS be ISO8859-1?
In other words, what would be the definition of a changing default?
It can only cause people to _produce_ the error I describe.
Hence, produce confusion.
I saw people say that the wrong behavior I describe is caused by a wrong setting.
There should obviously be no user setting for a necessary default.
How the heck could a user know what default to set in his browser before being able to read a page, if the only place it can be stated is in that page, which he could only read by setting the correct default ;-)

And this option was left to ISO8859-1 in my browser, of course.

Search www.w3.org/TR/html401/charset.html for "default" and you will learn that an HTML document's character code, which should obviously be specified within the document, is designed to be specified in the HTTP header (without saying, BTW, how it is specified when FTP is used) with ISO8859-1 as the default.
Note that this blunder, attributed to HTTP servers accused of not being able to detect the character code of files they store or of being misconfigured, has been circumvented by introducing a META directive able to provide -- from the HTML document itself -- HTTP header data and hence the character code.
But note that this is done without concluding that ISO8859-1 is the default code of META too, and hence of the document, without regard to the following question.
Question : how the heck could an HTML "user agent" that ignores the default character set work any better than my posting this if you and I didn't know that we have to use ASCII?
Answer : no better than the page display I show in my attachment.
And finally, note that if the reliability of the expected result of a standard lies in this phrase :
"By combining these mechanisms, an author can greatly improve the chances that, when the user retrieves a resource, the user agent will recognize the character encoding."
the conclusion is : "OK, OK, that was only my bad luck again, it's a random game, bug dismissed, Firefox within said specs, I have to try again".
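
The two labelling mechanisms discussed above, the HTTP Content-Type header and the META directive, can be illustrated with a rough Python sketch. The helper names `charset_from_http` and `charset_from_meta`, the regular expression and the 1024-byte scan window are assumptions made for this illustration; they are not taken from Firefox or from the HTML 4.01 text.

```python
import re
from email.message import Message


def charset_from_http(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


# Simplified pattern: a charset=... value anywhere inside a <meta> tag.
META_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def charset_from_meta(html_bytes):
    """Return the charset named by a META directive near the top, or None."""
    match = META_RE.search(html_bytes[:1024])  # scan window is an assumption
    return match.group(1).decode("ascii").lower() if match else None


# A labelled response versus the unlabelled header quoted in this report.
print(charset_from_http("text/html; charset=ISO-8859-1"))
print(charset_from_meta(b'<HEAD><TITLE>x</TITLE>'
                        b'<link rel="stylesheet" type="text/css" href="atilf.css">'
                        b'</HEAD>'))
```

With the header quoted in this report and no charset parameter on the HTTP layer, both helpers return None, which is the situation the report is about.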

Or should we try to see why Firefox didn't display ISO8859-1? I've seen browsers do that for years.

Revision history for this message
In , Wlevine (wlevine) wrote :

i agree. view > character coding > auto-detect > off = checked on my computer,
but phoenix still seems to use whatever coding is specified by the web page. i
also think the entire character coding submenu is very confusing.

Revision history for this message
In , Seb-delahaye (seb-delahaye) wrote :

Confirming for developer review.

Revision history for this message
In , Nathans-desi (nathans-desi) wrote :

It seems like the default should be "Universal" instead of "(Off)". The issue
was brought up in the following thread:

http://www.mozillazine.org/forums/viewtopic.php?t=7554

Revision history for this message
In , Prognathous (prognathous) wrote :

> Also the other options "more" and "customize" make little sense to me

See Bug 52157

Prog.

Revision history for this message
In , Bugzilla-babylonsounds (bugzilla-babylonsounds) wrote :

Taking QA Contact as designated owner of Firebird-Menus. Sorry for bugspam.

Revision history for this message
In , Stmoebius (stmoebius) wrote :

Firefox has the ambition to be streamlined, simple, and polished.
Nominating this for 1.0

Revision history for this message
In , Bugs-bengoodger (bugs-bengoodger) wrote :

-ing.

Revision history for this message
In , Prognathous (prognathous) wrote :

Hardware -> All

Prog.

Revision history for this message
In , Prognathous (prognathous) wrote :

Also see Bug 301190 - "Better documentation for Character Encoding -> Auto-Detect"

Prog.

Revision history for this message
In , Hagy Hag (elektroschock) wrote :
Revision history for this message
In , Philringnalda (philringnalda) wrote :

*** Bug 332681 has been marked as a duplicate of this bug. ***

Revision history for this message
André Pirard (a.pirard) wrote : Firefox uses the wrong display encoding

I have seen this for a long time with both Firefox 2.x and 3.0b3.

I display, for example, http://atilf.atilf.fr/tlf.htm
Its header is

<HEAD>
<TITLE>
Le Trésor de la Langue Française Informatisé
</TITLE>
<link rel="stylesheet" type="text/css" href="atilf.css">
</HEAD>

Hence, its encoding should be ISO8859-1 by default as it has always been.

As the uploaded attachment shows Firefox displays it using UTF-8.

In Edit|Preferences|Content|Font & Colors|Default font|Advanced|Character Encoding
there's an option named "Default character encoding" documented as follows
      The character encoding selected here will be used to display pages that
      do not specify which encoding to use.
What's the use of this setting if the default must ALWAYS be ISO8859-1?
In other words, what would be the definition of a changing default?
It can only cause people to _produce_ the error I describe.
Hence, produce confusion.
I saw people say that the wrong behavior I describe is caused by a wrong setting.
There should obviously be no user setting for a necessary default.
How the heck could a user know what default to set in his browser before being able to read a page, if the only place it can be stated is in that page, which he could only read by setting the correct default ;-)

And this option was left to ISO8859-1 in my browser, of course.

Search www.w3.org/TR/html401/charset.html for "default" and you will learn that an HTML document's character code, which should obviously be specified within the document, is designed to be specified in the HTTP header (without saying, BTW, how it is specified when FTP is used) with ISO8859-1 as the default.
Note that this blunder, attributed to HTTP servers accused of not being able to detect the character code of files they store or of being misconfigured, has been circumvented by introducing a META directive able to provide -- from the HTML document itself -- HTTP header data and hence the character code.
But note that this is done without concluding that ISO8859-1 is the default code of META too, and hence of the document, without regard to the following question.
Question : how the heck could an HTML "user agent" that ignores the default character set work any better than my posting this if you and I didn't know that we have to use ASCII?
Answer : no better than the page display I show in my attachment.
And finally, note that if the reliability of the expected result of a standard lies in this phrase :
"By combining these mechanisms, an author can greatly improve the chances that, when the user retrieves a resource, the user agent will recognize the character encoding."
the conclusion is : "OK, OK, that was only my bad luck again, it's a random game, bug dismissed, Firefox within said specs, I have to try again".

Or should we try to see why Firefox didn't display ISO8859-1? I've seen browsers do that for years.

Revision history for this message
André Pirard (a.pirard) wrote :
André Pirard (a.pirard)
Changed in firefox:
status: New → Confirmed
André Pirard (a.pirard)
description: updated
Revision history for this message
André Pirard (a.pirard) wrote :

I've been watching what I was doing before the problem occurs.
It was always when displaying that URL in a freshly opened window.
I'm not sure about what in the previous environment was the cause.
But now I know one way to demonstrate the problem.

1) clear the said URL from history.
2) open a new window or tab
3) set View|Character Encoding to UTF-8 (or anything but ISO8859-1)
4) type that URL in the address bar and display page
5) you've got the wrong encoding displayed

Obviously, the encoding used must not depend on anything that preexisted.
I cannot say if this covers all cases.

Revision history for this message
Alexander Sack (asac) wrote : Re: [Bug 206884] Re: Firefox uses the wrong display encoding

On Fri, Mar 28, 2008 at 01:58:22AM -0000, André Pirard wrote:
> I've been watching what I was doing before the problem occurs.
> It was always when displaying that URL in a freshly opened window.
> I'm not sure about what in the previous environment was the cause.
> But now I know one way to demonstrate the problem.
>
> 1) clear the said URL from history.
> 2) open a new window or tab
> 3) set View|Character Encoding to UTF-8 (or anything but ISO8859-1)
> 4) type that URL in the address bar and display page
> 5) you've got the wrong encoding displayed
>
> Obviously, the encoding used must not depend on anything that preexisted.
> I cannot say if this covers all cases.
>

 status incomplete

Could you please resummarize your current summary and reduce it to the
following points:

1. what is the behaviour you are seeing (one-two sentences)
2. what is the behaviour you are expecting (one-two sentences)
3. what are the steps to reproduce

Thanks,

 - Alexander

Changed in firefox:
status: Confirmed → Incomplete
Revision history for this message
André Pirard (a.pirard) wrote : Re: Firefox uses the wrong display encoding

Sorry to repeat the obvious.

Problem #1

1. As shown in the attached screen shot, http://atilf.atilf.fr/tlf.htm sometimes displays using UTF-8.
2. Obviously, I expected it to display correctly (using ISO 8859-1 default encoding)
3. _when_ it misbehaves is mysterious, but I found one way to reproduce it.
I see no shorter/better way than to repeat what I already wrote :
1) clear the said URL from history [any occurrence of the hostname in address bar dropdown]
2) open a new window or tab [and do the following in there, of course]
3) set View|Character Encoding to UTF-8 (or anything but ISO8859-1)
4) type that URL in the address bar and display page
5) you've got the wrong encoding displayed

Problem #2

What the default (and actual) encodings are seems to be very vaguely defined.
Given that, it is no wonder that encoding problems arise.
Historically, I have in mind that it's ISO 8859-1, but :
1. Firefox users should not be allowed to define the default encoding that should be set in a standard.
(all pages should display correctly without a user's hint)
2. It's a design flaw to define the encoding in an HTTP statement instead of in an HTML statement.
(because that vital information is lost when the file is stored on disk or transferred with FTP)

My report couldn't be more complete and precise.

André.

Changed in firefox:
status: Incomplete → Confirmed
Revision history for this message
John Vivirito (gnomefreak) wrote :

Please read the following link about bug states before marking as confirmed.
https://wiki.ubuntu.com/MozillaTeam/Bugs/States?highlight=(mozilla)
marking as incomplete

Changed in firefox:
status: Confirmed → Incomplete
Revision history for this message
Alexander Sack (asac) wrote :

i can confirm that the semantic of the character encoding menu could be improved.

However, unless someone comes up with a good idea, i doubt that this will be fixable for 3.0.

Changed in firefox:
importance: Undecided → Low
status: Incomplete → Confirmed
Changed in firefox-3.0:
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
John Vivirito (gnomefreak) wrote :

From what I can tell, you want the encoding menu changed. If this is accurate, it should be filed upstream against firefox-3.0, as Mozilla will end support for 2.0 within a month and a half if they stay on schedule: around the end of June or early to mid July, as I recall. I would have to look it up, because it depends on how many blockers 3.0 has, and support normally ends around the time the updated version becomes stable.

Revision history for this message
André Pirard (a.pirard) wrote : Re: [Bug 206884] Re: Firefox uses the wrong display encoding

*************************************************************
Considering that the more I write about this problem the less it's
understood, I have introduced a new Bug #228988 where everything has
been rewritten from scratch.
PLEASE READ IT CAREFULLY, it's the last time I write, I have lost enough
time.
I set it to "confirmed", please correct statuses of both as needed ---->>>
For obvious reasons, make this bug a duplicate of Bug #228988 rather
than the opposite.
*************************************************************

On 05/09/2008 07:58 PM, John Vivirito wrote :
> Please read the following link about bug states before marking as confirmed.
> https://wiki.ubuntu.com/MozillaTeam/Bugs/States?highlight=(mozilla)
> marking as incomplete
>
I know why I do that. Some bugs I introduced would have been erased by a
timeout threat because nobody came to set "confirmed" well after I
provided the information they asked.
You probably noticed that I'm willing to cooperate but I hate losing my
time.

Could you explain why this bug was reset to incomplete and then to
confirmed again without any additional data being added?

On 05/10/2008 02:38 AM, Alexander Sack wrote :
> i can confirm that the semantic of the character encoding menu could be
> improved.
>
I didn't really speak of a menu (which?) and its semantic of a character
encoding.

Did anyone try the procedure I wrote twice?
I suppose that trying is why I was asked to write it a second time.
What are your conclusions?
Firefox should display the page using the wrong encoding.
> However, unless someone comes up with a good idea, i doubt that this
> will be fixable for 3.0.
>
I have the feeling that the fix is amazingly simple for both 2 and 3.
> ** Summary changed:
>
> - Firefox uses the wrong display encoding
> + fuzzy/confusing firefox View -> Character encoding menu semantics
>
The previous title was the correct one.
The practical, most important problem is #1.
The new title reflects Problem #2, the theory of specifying which
encoding a page uses, especially default encoding.
> ** Changed in: firefox (Ubuntu)
> Importance: Undecided => Low
> Status: Incomplete => Confirmed
>
If the bug occurs as often as I have seen it occur, it is of very high
importance.
(especially if easy to fix)
It's been pestering me for months before I made this analysis.
Of course, it occurs only to those needing the accented characters of
ISO8859-1.
(If you search the Web for discussions of this, please search in French).

Thanks for your attention and for your Open devotion.

Revision history for this message
Alexander Sack (asac) wrote :

bug 228988 is now your bug. this one is about what i understood and what i outlined above.

Revision history for this message
Alexander Sack (asac) wrote :

ffox 2 reaches EOL ... so no fix will go there for sure.

Changed in firefox:
importance: Undecided → Unknown
status: New → Unknown
Changed in firefox-3.0:
status: Confirmed → Triaged
Changed in firefox:
status: Confirmed → Won't Fix
Changed in firefox:
status: Unknown → Confirmed
Revision history for this message
André Pirard (a.pirard) wrote : Is this the patch?

The whole thing seems very clear to me.
But, of course, I didn't write the program.
Nor am I writing the documentation.
Or am I?
Is this the patch?

View|Character Encoding mentions the encoding that has been used to
display the page currently displayed. If you change that selection, that
page is redisplayed using the new encoding you select. Of course, the
selection you make applies to the current page only. The list of
encodings is made of user selected entries plus actually used encodings.

View|Character Encoding|More Encodings is used to reach encodings that
are not found in the list (and that, by being used, are temporarily
added to the list).

View|Character Encoding|Customize List... does that for the user
selected entries.

View|Character Encoding|Auto-Detect needs more explanation (and author
validation).
Normally, an HTML page indicates the encoding it uses.
By default (no indication) the encoding is ISO 8859-1.
But Bug #228988 <https://bugs.launchpad.net/bugs/228988>, reports that
Firefox sometimes fails to use the default.
So, why is an Auto-Detect needed if it's clear what encoding to use?
Well, because Web page authors sometimes forget to indicate the encoding.
In that case, Firefox may help you by auto-detecting the pages in your
language.
Auto-Detect is a global option that applies in advance to any page that
does not indicate an encoding : it will try to best guess if it should
display such pages with your language's encoding instead of the normal
default (ISO 8859-1).
But other users will continue to display those defective pages badly and
Auto-Detect may get yourself into problems by not displaying ISO 8859-1
correctly.
So, using this option must be done with caution and does not dispense with
asking the Web author to put his pages right. Auto-Detect should be
returned to off after a problem is solved. Otherwise, additional
problems will not be detected.
Auto-Detect Universal : best guess of the best guess is that this means
auto-detect any UTF code.
BTW, all the problems will be over when UTF-8 is used exclusively
(AP, 1992).

André Pirard (a.pirard)
description: updated
Changed in firefox:
importance: Unknown → Low
Revision history for this message
papukaija (papukaija) wrote :

Just changed the package from firefox-3.0 to the main firefox package in order that this bug appears in the search and doesn't get forgotten.

affects: firefox (Ubuntu) → obsolete-junk
affects: firefox-3.0 (Ubuntu) → firefox (Ubuntu)
Revision history for this message
In , Athira S (athirasnamby) wrote :

I am a beginner and I would like to work on this bug. Can someone please assign this bug to me?

Revision history for this message
In , Sunny (darkowlzz) wrote :

Hi Athira,

You need to change "(off)" in [1] to "Universal".

To see the change, you could just build toolkit/locales/ with `mach build toolkit/locales/` and `mach run` to see the changes.

Good Luck!

[1]: http://mxr.mozilla.org/mozilla-central/source/toolkit/locales/en-US/chrome/global/charsetMenu.dtd#10

Revision history for this message
In , Vyv03354 (vyv03354) wrote :

No, no, not at all. "Universal" is going to be killed.
I think this bug is obsolete due to recent massive changes to Character Encoding menu.

Revision history for this message
In , Sunny (darkowlzz) wrote :

(In reply to Masatoshi Kimura [:emk] from comment #14)
> No, no, not at all. "Universal" is going to be killed.
> I think this bug is obsolete due to recent massive changes to Character
> Encoding menu.

In that case, I am letting someone else take over this :)

Revision history for this message
In , Henri Sivonen (hsivonen) wrote :

Bug 805374 made both the Auto-Detect submenu and the Character Encoding menu in general less confusing.

I think we should either consider the menu adequately non-confusing as of Firefox 28 and mark this FIXED or concede that it's not going to become less confusing until/unless we get rid of the menu eventually some day: i.e. WONTFIX. (I think we have a pretty good chance of getting rid of the remaining Russian and Ukrainian options and then coupling Japanese autodetection with Shift_JIS fallback and getting rid of autodetection UI.)

Trying the FIXED interpretation.

Revision history for this message
In , André Pirard (a.pirard) wrote :

I think that the first thing for Character Encoding Autodetect to be less confusing is to say what it does.
Assuming that it means that any indication of a character set is ignored and that it is guessed by the contents...
Character Encoding Autodetect is normally not needed because a page MUST specify the encoding it uses.
Using it instead of reporting an error to a webmaster is causing the webmaster to continue to make the same errors.
Also, picking the character code from the HTTP request is an error because the contents of the page MUST specify the encoding, it knows better than an Apache server and the browser won't update the page when it's written to a file.
The only case where character encoding mangling is necessary is when, for example, displaying a text file of which the character set is specified nowhere and, of course, displaying the page correctly before reporting to the webmaster.

Revision history for this message
In , Henri Sivonen (hsivonen) wrote :

(In reply to André Pirard from comment #17)
> I think that the first thing for Character Encoding Autodetect to be less
> confusing is to say what it does.
> Assuming that it means that any indication of a character set is ignored and
> that it is guessed by the contents...

It means: If the type of the document is text/html or text/plain and there is no character encoding label on the HTTP layer or inside the document (in the text/html case) and there is no BOM at the start of the document, assume the language of the page is the one selected from the Auto-Detect menu and make a guess based on the contents of the file given that language assumption.

How would you make the menu "say" this?
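
The rule spelled out at the top of this reply can also be written down as a predicate. This is a minimal Python sketch under stated assumptions: `detector_runs`, its parameters and the list of BOMs are hypothetical names chosen for illustration, not Firefox internals, and the sketch only answers whether a guess is made, not how the language-specific guess itself works.

```python
import codecs

# Assumption: these byte order marks count as "a BOM at the start".
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def detector_runs(mime_type, http_charset, meta_charset, body):
    """Return True when the Auto-Detect language setting would be consulted."""
    # Only HTML and plain text documents are candidates.
    if mime_type not in ("text/html", "text/plain"):
        return False
    # A character encoding label on the HTTP layer settles the matter.
    if http_charset:
        return False
    # A label inside the document only exists in the text/html case.
    if mime_type == "text/html" and meta_charset:
        return False
    # A BOM at the start of the document also settles it.
    if body.startswith(BOMS):
        return False
    # Nothing declared the encoding: guess from the content.
    return True


# The page from this report: HTML, no HTTP label, no META, no BOM.
print(detector_runs("text/html", None, None, b"<HEAD>\n<TITLE>"))
```

Written this way, the detector is a last resort: any one of the three declarations is enough to keep it from running.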

Note: My current belief is that we don't actually need the Russian and Ukrainian autodetectors. Once the only autodetector we have is the Japanese one, we should probably not bother the user about its existence but couple it with choosing Japanese in Preferences: Content: Advanced: Fallback Character Encoding [or choosing "Default for Current Locale" in the Japanese localization]. Therefore, I think activity to get rid of the Russian and Ukrainian detectors (bug 845791) would be more productive than activity to polish the menu.

> Character Encoding Autodetect is normally not needed because a page MUST
> specify the encoding it uses.

Correct.

> Using it instead of reporting an error to a webmaster is causing the
> webmaster to continue to make the same errors.

Indeed.

> Also, picking the character code from the HTTP request is an error because
> the contents of the page MUST specify the encoding, it knows better than an
> Apache server

Indeed, Ruby's Postulate generally holds. Unfortunately, HTTP disagreed and it's too late to change that, because it would break pages that currently work due to Ruby's Postulate not being true for them.
http://www.intertwingly.net/slides/2004/devcon/69.html

And besides, all browsers now agree on the precedence of HTTP over <meta>, so it's not worthwhile to break interoperability.

> and the browser won't update the page when it's written to a
> file.

Firefox is supposed to if you choose the "complete" option in Save As...

> The only case where character encoding mangling is necessary is when, for
> example, displaying a text file of which the character set is specified
> nowhere

Or when displaying an HTML file whose character encoding is specified nowhere. :-(

Changed in firefox:
status: Confirmed → Fix Released
Revision history for this message
In , André Pirard (a.pirard) wrote : Re: [Bug 206884] Firefox Character encoding

On 2014-01-21 07:45, Henri Sivonen wrote :
> (In reply to André Pirard from comment #17)
>> I think that the first thing for Character Encoding Autodetect to be less
>> confusing is to say what it does.
>> Assuming that it means that any indication of a character set is ignored and
>> that it is guessed by the contents...
> It means: ...
>
> How would you make the menu "say" this?
Language to [auto]detect encoding for
Language to suit [auto-detection]
... something like that

The key hint is to understand that it's a language. I'm the reporter of
this bug and you made me discover this explanation after 5 years. I,
and everybody according to the bug title, had looked for it all over the
place in vain.
Pity there are no HTTP links on system menus. Graphic things (No doc.
Any questions?) badly need it.

Please note that "Universal" is not a language and that I still do not
understand what that means. My guess was that it meant utf-8 but what
would that mean? it's not a language either.
Also, (Off) could be "no encoding auto-detection" to make it very clear
what we're about.

Note: I did not report this problem at all. See Description and read below.
Alexander Sack changed the subject to mean a problem of his.
Strange doings. I opened another bug to be able to say what I meant.
I was accused of saying things that did not happen (but that some 6
other persons met).
I was even accused of tweaking the encoding identification by forcing
the encoding of the preceding page in a test.
As if the encoding of one page influenced the encoding of the next one.
I finally shuddered and turned away to something else.
>> Also, picking the character code from the HTTP request is an error because
>> the contents of the page MUST specify the encoding, it knows better than an
>> Apache server
> Indeed, Ruby's Postulate generally holds. Unfortunately, HTTP disagreed and it's too late to change that, because it would break pages that currently work due to Ruby's Postulate not being true for them.
> http://www.intertwingly.net/slides/2004/devcon/69.html
>
> And besides, all browsers now agree on the precedence of HTTP over
> <meta>, so it's not worthwhile to break interoperability.
That is wrong.
MIME was intended to describe the single character set of a file that
does not provide it.
HTML self-describes it and can contain many character sets that MIME is
unable to describe.
It's like saying "he speaks English" of someone who says "Je parle
français ik spreek vlaams и я говорю по русский"
>> and the browser won't update the page when it's written to a
>> file.
> Firefox is supposed to if you choose the "complete" option in Save As...
Right and you made me notice it.
But why is it correct with a "complete" page and surprisingly incorrect
with a "HTML" one?
In fact, I have met so many character handling bugs in my life that I no
longer care to report anything.
Like that craze of removing http:// from the Firefox URL bar. This caused
tons of bugs and I still have a stock of 12 or so. Why the hell do that
when it was going so well, everyone in the street knew what http:// was
and started asking why one removed it?

>> The only case where character encoding mangling is necessa...


Revision history for this message
In , Henri Sivonen (hsivonen) wrote :

(In reply to Launchpad from comment #19)
> André Pirard added the following comment to Launchpad bug report 206884:
> > How would you make the menu "say" this?
> Language to [auto]detect encoding for
> Language to suit [auto-detection]
> ... something like that

That's unusually wordy for a submenu label. :-(

> Please note that "Universal" is not a language and that I still do not
> understand what that means.

The people who understood what it *really* meant can probably be counted with the fingers of one hand. That's one of the reasons why I removed that item from the menu.

> Also, (Off) could be "no encoding auto-detection" to make it very clear
> what we're about.

I wouldn't be opposed to re-labeling "(Off)" to something like "No Detection".

Revision history for this message
In , André Pirard (a.pirard) wrote :

> That's unusually wordy for a submenu label.
Striving to use a single word instead of three is indeed a usual,
prominent reason why people don't understand what the software does. As
well as putting menu entries where the programmer thinks they must be
and not wondering where the user will look for them. Putting the same
menu entry in several places is not idiotic at all if the user is liable
to look for it in different places. Yet, I don't remember having ever
seen that.
A typical case is the Firefox Preferences menu which is under Tools in
Windows and Edit in Unix. Most of the time, support personnel know
absolutely nothing about that and you can guess that the discussions
can be very funny. I would put Preferences under both.
Unusual does not mean Inappropriate. It even often means Progress.
