Firefox crashes in flag_qsort during spellchecker initialization on x86 due to gcc bug

Bug #1322784 reported by L. David Baron on 2014-05-24
68
This bug affects 10 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Fix Released
Critical
firefox (Ubuntu)
Undecided
Unassigned
gcc-4.8 (Ubuntu)
Undecided
Unassigned

Bug Description

The most common Firefox crash on Linux in Mozilla's crash-stats system is crashes in the function flag_qsort.

These crashes occur:
 * only on x86 architecture
 * only on Ubuntu packages (and not on Mozilla's builds)
 * on precise and saucy and trusty (based on kernel versions reported with the crashes)
and appear to be due to a compiler bug in the compiler used to generate Ubuntu's builds. (It could be a common compiler bug triggered by different compiler options or a compiler bug specific to Ubuntu's gcc.)

The analysis that leads to the conclusion that this is a compiler bug is in https://bugzilla.mozilla.org/show_bug.cgi?id=983817 . In particular, the compiler is miscompiling an access to an element of an array of unsigned short as a 32-bit read, and when the unsigned short in question is the last one in the allocation and that allocation is aligned so that the byte following has a different 0x100000 bit, this can lead to crashes.

The most recent (whenever you follow the link) 7 days of crash reports are available at: https://crash-stats.mozilla.com/report/list?signature=flag_qsort&product=Firefox&query_type=contains&range_unit=weeks

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:27.0) Gecko/20100101 Firefox/27.0 (Beta/Release)
Build ID: 20140218140052

Steps to reproduce:

Clicked on a text field on github's web site and hit ctrl + v to paste a long public key into the text field, very quickly.

Actual results:

Browser stopped responding and crashed after a minute.

Expected results:

Text pasted; browser not crash.

I saw this happen; I'm wondering from the crash report if it may have been Firefox trying to spell-check the public key. Or, it may have something to do with text fields in a form and switching fields quickly.

I was directed to this report from this crash report [1] on my system.

On my system, firefox is crahsing regularly (several times a day!) when I click a text field, or when I change the spell-check language. This is not systematic though. I am using 2 different languages in FF text fields, and I switch between them several times a day (maybe 20 times?). About 10% of the time FF will crash. This is highly unreliable. As a consequence I systematically copy/paste the content of the text field before I switch spell-check language (as the crash lose the last edit). This is truly frustrating.

This bug appeared somewhere around FF26-FF27 about. It used to work fine before.

[1] https://crash-stats.mozilla.com/report/index/e043de51-0fc8-4ddb-93ae-ab7c82140331

*** Bug 995356 has been marked as a duplicate of this bug. ***

From the comments in other crashes, this may be an issue with focusing on a form's text field rather than a spelling check error. The crashes only happen on Linux and is still happening in builds for Firefox 29 from the end of April. There have been around 600 crashes with this signature in the last 7 days

More crash reports: https://crash-stats.mozilla.com/report/list?signature=flag_qsort&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-05-03+21%3A00%3A00&range_value=1#tab-sigsummary

This is a crash in the hunspell code we use to spell check.

To people who can reproduce: what dictionaries do you have installed? Does someone have steps to reproduce?

I have French, English (UK) and English (US)

Thanks! It would be nice if someone can try installing those dictionaries and get us some steps to reproduce.

Unfortunately I have no steps to reproduce. On my system it is pretty random, which makes the issue even more frustrating (and a real pain). "May I click this field? Will FF crash this time? Or should I first copy the field content in the clipboard?". See also my comment in bug 995356.

(In reply to comment #9)
> Unfortunately I have no steps to reproduce. On my system it is pretty random,
> which makes the issue even more frustrating (and a real pain). "May I click
> this field? Will FF crash this time? Or should I first copy the field content
> in the clipboard?". See also my comment in bug 995356.

One thing that you can try is to disable the English (UK) and French dictionaries one by one and see if disabling one of them will make the crash go away. I strongly suspect that this is due to a corrupted dictionary file.

How can I disable dictionaries?
(I can't remember how I got them here)

(In reply to comment #11)
> How can I disable dictionaries?
> (I can't remember how I got them here)

If you go to about:addons, do you see them listed either under Dictionaries or Extensions?

(In reply to comment #12)
> If you go to about:addons, do you see them listed either under Dictionaries
> or Extensions?

no.
There is one French dictionary listed, but it is disabled.

The fact that the crash addresses all end in five zeros is highly suspicious; is flag_qsort reading a word further than it ought to, and thus intermittently crossing a page boundary when the array it's sorting bumps up against the edge of that page?

So, flags_qsort expects begin to be the first index and end to be 1 greater than the last index. (It thus does more work than needed on one element arrays.) During the loop it maintains the invariants that all values in the range [begin + 1, l) are less than pivot and all values in [r, end) are greater than pivot. These ranges both might be empty. It exits the loop when l == r, which ensures that l is always a valid index; r might be equal to end and thus not a valid index.

Then, after the loop, l is set to one less than r. If begin + 1 == end, then l == begin and r == end, since the while loop was never entered.

So the code of flag_qsort itself looks ok to me, or at least if there's a problem, I haven't seen it. It seems somewhat unlikely for the compiler to misoptimize.

The caller that matters is HashMgr::load_tables, which uses an allocation made in decode_flags, which does the allocation and gives the caller both the pointer and he length, so it looks like it's passing the right size as well.

Integer overflow seems somewhat unlikely, although I suppose it's possible.

I really wish the crash reports had at least line number information.

The format of the .dic file is basically like this:

<N> # denoting the number of lines
... # followed by N lines

And hunspell doesn't perform any bounds checking on the contents of the file, and in the past I've seen at least one crash which was caused by a dictionary file which got this wrong.

Something like this would be my first guess as to what's happening here.

This is what I get from my hunspell dictionaries:

$ cd /usr/share/hunspell
$ ls *.dic
en_GB.dic en_US.dic fr.dic

$ head -1 en_GB.dic && wc -l en_GB.dic
56506
56507 en_GB.dic

$ head -1 en_US.dic && wc -l en_US.dic
62154
62155 en_US.dic

$ head -1 fr.dic && wc -l fr.dic
63062
63063 fr.dic

Can you please tar up the .dic and .aff files there and attach it to the bug, and perhaps include some links to pages where you experience this crash?

Created attachment 8419869
user-share-hunspell.tgz

Sure.

Here is a tarball of my /usr/shar/hunspell directory.

I remember that I removed manually the dictionaries that I did not wanted (like French local variants).

fr.* comes from ubuntu package hunspell-fr
en_US.* comes from ubuntu package hunspell-en-us
en_GB.* comes from ubuntu package myspell-en-gb

The pages where I experience crash are pretty random, as far as I recall. Of course frequently visited pages produce most crashes, like gmail or facebook just to name two.

(can you see the crashing page from my crash reports?)

Are all the people seeing this using Ubuntu's builds of Firefox (or some other distro?), or are any of you seeing this in Mozilla-generated builds? (The build IDs in crash stats don't match seem to match our builds.)

https://crash-stats.mozilla.com/report/index/f84c892b-3d01-46f5-aa98-e64d92140501 shows (using minidump_stackwalk):

Thread 0 (crashed)
 0 libxul.so + 0x15ecb6e
    eip = 0xb4bb5b6e esp = 0xbfbf43d0 ebp = 0x8e5ffffe ebx = 0xb6dcaef4
    esi = 0x00000002 edi = 0x82a14014 eax = 0x00000000 ecx = 0xb6dc532a
    edx = 0x00000001 efl = 0x00210282

(didn't get symbols set up)

This is consistent with the disassembly of
http://mirrors.kernel.org/ubuntu/pool/main/f/firefox/firefox_29.0+build1-0ubuntu0.13.10.3_i386.deb
(that's the package for saucy), but not consistent with the disassembly of the Mozilla official Firefox 29 build.

Created attachment 8419927
function disassembly from Mozilla's official Firefox 29 build

Created attachment 8419929
function disassembly from Ubuntu's saucy package

Created attachment 8419938
annotated disassembly from Ubuntu's saucy package

This shows where the bug is. At the time of the crash we're loading flags[l] with a 32-bit read in order to compare it to pivot.

Comparing with the registers in comment 23, we can see that:
  l == 1 (%edx)
  r == 2 (%esi)
  &flags[l] == 0x8e5fffe (%ebp)
  begin == 0 (%eax)
  pivot == 0x532a (%cx)

In any case, I think this is pretty clearly a compiler bug. Not sure who we bug about such things, given that it's in Ubuntu's builds. (From the kernel versions I saw, it looks like it's showing up for precise and saucy.)

Who's the right person to talk to if the most frequent Firefox crash on Linux is a bug in the compiler used to build the Ubuntu packages?

(Note that I've made no attempt so far to reduce a testcase for the compiler bug. It might or might not be fixed in trunk gcc, and I don't know what gcc options are needed. I'm not going to have the time to do so, either.)

https://crash-stats.mozilla.com/report/list?signature=flag_qsort&product=Firefox&query_type=contains&range_unit=weeks is a link to the (current, not fixed in time) most recent week of crashes with this signature

... oh, and there are definitely trusty kernel versions in that list as well.

(In reply to David Baron [:dbaron] (needinfo? me) (UTC-7) from comment #28)
> Who's the right person to talk to if the most frequent Firefox crash on
> Linux is a bug in the compiler used to build the Ubuntu packages?

maybe you could be in touch with the ubuntu firefox package maintainer

(I am using a mainstream kernel, 3.14.0-031400rc6-generic)

(In reply to pionchon.luc from comment #31)
> maybe you could be in touch with the ubuntu firefox package maintainer

I *think* that's who I made the needinfo request to in comment 28.

Changed in firefox:
importance: Unknown → Medium
status: Unknown → Confirmed

Hi,

I'll talk to our compiler guy about this. It probably won't be me who looks at it as I'm working on another project now.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in firefox (Ubuntu):
status: New → Confirmed
Changed in gcc-defaults (Ubuntu):
status: New → Confirmed
Chris Coulson (chrisccoulson) wrote :

Matthias, any chance you could look at this?

affects: gcc-defaults (Ubuntu) → gcc-4.8 (Ubuntu)

Could somebody running the Firefox that has this crash (i.e., 32-bit Ubuntu packages) attach the contents of about:buildconfig to this bug? (That is, just type "about:buildconfig" in the URL bar, save it to a file, and use the "Add an attachment" link above.)

Er, never mind, I can extract it from the package in comment 23.

We ought to be able to work around this, and probably should, given the lack of response.

If somebody has a setup that can reproduce the compiler bug, there might be a straightforward workaround such as inserting |volatile| somewhere or similar trivial rearrangement of code.

If not, we ought to be able to pad these arrays by 2 bytes, #ifdef linux and gcc.

Chris Coulson (chrisccoulson) wrote :

We're now hitting what's probably a crash during startup cache compilation in Firefox 32, specific to gcc-4.8 x86 builds: https://launchpadlibrarian.net/180577186/buildlog_ubuntu-trusty-i386.firefox_32.0~b1%2Bbuild1-0ubuntu0.14.04.1_FAILEDTOBUILD.txt.gz

Chris Coulson (chrisccoulson) wrote :

Actually, forget that - this bug isn't specific to gcc 4.8 anyway if it's occurring on precise. So, most definitely another issue. Sigh

Created attachment 8464679
about:buildconfig.html

Buldconfig info from my Firefox, which I think is affected by this bug.

Great to see progress.

#6 crash for Thunderbird 31, and for a linux crash to have such a high rank in all of Thunderbird is a pretty big deal. Typical crash comment is similar to prior comments in the bug - changing the spell-check language.

Some linux users with this crash signature also see signature g_list_sort_with_data but the stack is different. For example bp-c571441b-ee2b-4622-a4e5-ed3532140809 "Upon closing Thunderbird it crashed." Different bug?

Changed in firefox:
importance: Medium → Critical
Matthias Klose (doko) on 2014-08-15
Changed in gcc-4.8 (Ubuntu):
status: Confirmed → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gcc-4.8 (Ubuntu):
status: New → Confirmed
Matthias Klose (doko) wrote :

invalid according to comment #42

Changed in gcc-4.8 (Ubuntu):
status: Confirmed → Invalid
L. David Baron (dbaron) wrote :

All comment 42 says is that comment 41 is a separate issue from this bug. That doesn't make this bug invalid.

Changed in gcc-4.8 (Ubuntu):
status: Invalid → Confirmed
Matthias Klose (doko) wrote :

then please point out why this is a gcc-4.8 issue

Changed in gcc-4.8 (Ubuntu):
status: Confirmed → Incomplete
L. David Baron (dbaron) wrote :

It's an issue with something in the compilation toolchain that Ubuntu uses to compile the Firefox builds that Ubuntu ships. I don't know what part of that toolchain specifically (whether it's base gcc or Ubuntu's gcc modifications or wrappers). What's the right place to put such bugs? It's by far the most frequent crash affecting Firefox on Ubuntu.

Created attachment 8561105
Pad heap allocations passed to flag_qsort() on x86 Linux to work around gcc bug affecting Ubuntu packages

My biggest concern for review of this patch is whether the #ifdef will
correctly catch what Ubuntu is using to compile Firefox. Does anybody
know how to confirm that Ubuntu is compiling with gcc, and that these
#ifdefs are correct?

Comment on attachment 8561105
Pad heap allocations passed to flag_qsort() on x86 Linux to work around gcc bug affecting Ubuntu packages

Hmmm, given that ehsan's away for a bit, transferring review to froydnj.

(I'd hope to get this in to beta, although I really should have tried to do this many releases ago, although I was hoping somebody else would.)

Comment on attachment 8561105
Pad heap allocations passed to flag_qsort() on x86 Linux to work around gcc bug affecting Ubuntu packages

Review of attachment 8561105:
-----------------------------------------------------------------

Yuck. The #ifdef checks are correct, fwiw.

Changed in firefox:
status: Confirmed → In Progress

Comment on attachment 8561105
Pad heap allocations passed to flag_qsort() on x86 Linux to work around gcc bug affecting Ubuntu packages

Approval Request Comment
[Feature/regressing bug #]: not a regression in our codebase
[User impact if declined]: #3 topcrash on Linux, specific to 32-bit Ubuntu-distributed builds. Firefox will randomly crash on 32-bit Linux builds the first time the user uses a textarea or otherwise does something that initializes the spellchecker. (It only crashes a small percentage of the time, but it affects a large number of users.)
[Describe test coverage new/current, TreeHerder]: None. Just landed on mozilla-inbound. I don't know of any way to test that the fix works without shipping it on the release channel.
[Risks and why]: Low risk; it's padding a few allocations in the spellcheck code with 2 extra bytes on all 32-bit Linux builds.
[String/UUID change made/needed]: no

CCing the Hunspell maintainers to make sure this patch gets upstreamed as well.

Changed in firefox:
status: In Progress → Fix Released

magnus, it seems like we may want this for ESR. Do you agree?

#5 crash for https://crash-stats.mozilla.com/topcrasher/products/Thunderbird/versions/31.4.0

Comment on attachment 8561105
Pad heap allocations passed to flag_qsort() on x86 Linux to work around gcc bug affecting Ubuntu packages

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: top crasher.

See comment 43 for risk evaluation. (Low)

Comment on attachment 8561105
Pad heap allocations passed to flag_qsort() on x86 Linux to work around gcc bug affecting Ubuntu packages

Accepting for ESR 31 since it's a topcrash and affects many users.

I am not seeing this in esr TB31.6.0 v.fixed and thanks for the uplift

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.