SIGSEGV in nsViewManager::UpdateWidgetsForView

Bug #320459 reported by petski
4
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Fix Released
Critical
firefox-3.0 (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Binary package hint: firefox-3.0

I'm running 3.0.5+nobinonly-0ubuntu0.8.10.1 on intrepid. Just had a segmentation fault in firefox.

Reproduce:
- Go to http://pv.caiw.nl/pvfotos/fotos.php?ordner=30-10-2008%20-%20Kwartaalborrel%20USA
- Wait for page to be fully loaded
- Click on the first image
- Press right arrow on your keyboard (maybe a couple of times), to scroll through the images
- SIGSEGV

Attached you'll find the stacktrace.

Unfortunately, http://crash-stats.mozilla.com times out when I query, so I can't see if I'm alone :).

Revision history for this message
In , Martijn-martijn (martijn-martijn) wrote :

This seems to have become worksforme, in current trunk build.

Revision history for this message
In , Martijn-martijn (martijn-martijn) wrote :

Created attachment 331390
testcase2

I guess I was wrong, I can still see this crash occuring with this testcase (also uses enhanced privileges).
http://crash-stats.mozilla.com/report/index/3335720f-5ab3-11dd-90b2-001a4bd43ef6?p=1

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 331510
Stack: frame destruction during ScrollPointIntoView

The problem is that nsViewManager::Composite() flushes pending notifications
and thus destroys arbitrary layout objects.
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/generic/nsSelection.cpp&rev=3.311&root=/cvsroot&mark=5549,5553#5543
(in this case the whole shell, and thus all its frames and their views -
one of the views we're currently scrolling on the call stack)

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 331511
Patch A, rev. 1

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 331513
Stack: "GetPrimaryFrameFor() called while nsFrameManager is being destroyed!"

After fixing the crash we still get warning (printf):
GetPrimaryFrameFor() called while nsFrameManager is being destroyed!
from this point:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/base/nsPresShell.cpp&rev=3.1117&root=/cvsroot&mark=5941#5907
We have called Destroy() for the current (this) shell in this case
so we shouldn't call GetPrimaryFrameFor() (from GetCurrentEventFrame()).

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 331516
Patch B, rev. 1

Fixes the GetPrimaryFrameFor() warning.
BTW, should we make that an NS_ERROR/WARNING rather than a DEBUG printf?
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/base/nsFrameManager.cpp&rev=1.259&root=/cvsroot&mark=330#322
(let me know if you want to handle this issue in a separate bug)

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

Make it an NS_WARNING, yes please. This patch or a new bug is fine.

Should we always return null from GetCurrentEventFrame if mIsDestroying (even if mCurrentEventFrame is non-null)?

Revision history for this message
In , Jst (jst) wrote :

Mats, any updates here?

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Comment on attachment 331511
Patch A, rev. 1

The combined patch still crashes unfortunately...
I'm looking in to it.

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 346898
Stack: deleting nsView while in use by UpdateWidgetsForView()

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 346899
Stack: nsAutoScrollTimer::Notify() using destroyed pres context

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 346900
Stack: nsScrollPortView::ScrollTo() using destroyed nsView

Probably as a result of "deleting nsView while in use by UpdateWidgetsForView()"
(got this one on MacOSX, the other one on Linux)

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 346918
Patch rev. 2
[Checkin: Comment 18 & 19]

This patch includes the earlier two patches, with roc's nit in comment 7.

Additionally,
1. add "StopAutoScrollTimer()" in DisconnectFromPresShell(), this will
   Cancel an outstanding timer and is necessary since it uses a raw
   pres context pointer (and other pointers) which could be stale.
2. use nsWeakView in UpdateWidgetsForView() to detect view destruction
3. keep strong refs on the pres shell and view manager over the
   viewManager->Composite() call, and give up if presShell->IsDestroying()
   after it.

I have high hopes for the UpdateWidgetsForView() fix - I think that's
a bug I've been hunting for years...

Tested on Linux, MacOSX and XP. There are still a few assertions for
the first testcase, but nothing too serious, they can be fixed separately.

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

Comment on attachment 346918
Patch rev. 2
[Checkin: Comment 18 & 19]

It's possible for notifications to *move* views, which could make UpdateWidgetsForView behave strangely, but since it can only move views once I don't think it's possible to get into an infinite loop.

Revision history for this message
In , Bsterne (bsterne) wrote :

There's a reviewed patch here. Can we get this landed ASAP as this is on our "Top Security Bugs" list?

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 351805
crashtest.diff (needs privileges though, so can't be used yet)

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 351825
mochitest.diff

Mochitest version of crash tests, it just loads them into an iframe and let
it run for 2 seconds - a bit lame but better than nothing I guess...

Revision history for this message
In , Mats Palmgren (matspal) wrote :

http://hg.mozilla.org/mozilla-central/rev/6efa9c970d64
http://hg.mozilla.org/mozilla-central/rev/f7d4bb87cda0

I also landed the mochitests briefly:
http://hg.mozilla.org/mozilla-central/rev/7effc03f2f45
but it caused Tinderbox orange "Unable to restore focus, expect failures and timeouts." so I backed it out:
http://hg.mozilla.org/mozilla-central/rev/ae945d36888c

I filed bug 468390 for the remaining assertion for the first testcase.
I don't get any significant assertions for the 2nd testcase (on Linux).

-> FIXED

Revision history for this message
In , Mats Palmgren (matspal) wrote :
Revision history for this message
In , Sgautherie-bz (sgautherie-bz) wrote :
Revision history for this message
In , Sgautherie-bz (sgautherie-bz) wrote :

(In reply to comment #20)

It seems we both updated the bug at the same time ;->

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Comment on attachment 346918
Patch rev. 2
[Checkin: Comment 18 & 19]

Low risk. No performance impact.

Revision history for this message
In , Dveditz (dveditz) wrote :

Unless we know of a non-privileged way to trigger this crash it's more sg:moderate than potentially sg:critical. I guess maybe you could get the user to play a mouseclicking/dragging game and then toggle the iframe during that?

Revision history for this message
In , Dveditz (dveditz) wrote :

Comment on attachment 346918
Patch rev. 2
[Checkin: Comment 18 & 19]

Approved for 1.9.0.6, a=dveditz for release-drivers.

Revision history for this message
In , Dveditz (dveditz) wrote :

I guess worst-case-scenario thinking: we don't know that we _can't_ trigger this without privs, so it _is_ potentially critical (thus the '?').

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 354486
Patch for 1.9.0

The fix requires nsIPresShell::IsDestroying() which isn't
present in 1.9.0 (it was added as part of bug 455984).
This patch adds it and changes the UUID for nsIPresShell
(no other changes).

What's the policy for UUID changes on the 1.9.0 branch?
Should I add a nsIPresShell_MOZILLA_1_9_0_BRANCH subclass
like we did in the 1.8 branch?

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

You should just not change the UUID here. The new method is not virtual and will not break any users of the interface, so I think we can get away with just not changing the UUID. It's a bit of a cheat but the only alternative is to create nsIPresShell_MOZILLA_1_9_0_BRANCH which sucks.

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Ok, I think that will work too.
An alternative is to make mIsDestroying public and access it directly.

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Created attachment 354527
Patch for 1.9.0 (no UUID change)

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Comment on attachment 354527
Patch for 1.9.0 (no UUID change)

See comments above regarding the UUID cheat.

Revision history for this message
In , Dveditz (dveditz) wrote :

Comment on attachment 354527
Patch for 1.9.0 (no UUID change)

Approved for 1.9.0.6, a=dveditz for release-drivers.

Revision history for this message
In , Marcia-mozilla (marcia-mozilla) wrote :

verified fixed on the 1.9.1 branch using Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1b3pre) Gecko/20090106 Shiretoko/3.1b3pre. No crash with the testcase.

Revision history for this message
In , Mats Palmgren (matspal) wrote :

Comment on attachment 354527
Patch for 1.9.0 (no UUID change)

Checked in on CVS trunk for 1.9.0.6:
mozilla/layout/base/nsPresShell.cpp 3.1121
mozilla/layout/base/nsIPresShell.h 3.223
mozilla/layout/generic/nsFrameSelection.h 1.29
mozilla/layout/generic/nsSelection.cpp 3.312
mozilla/view/src/nsViewManager.cpp 3.486

Revision history for this message
In , Abillings (abillings) wrote :

Verified for 1.9.0.6 using testcase and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6pre) Gecko/2009011304 GranParadiso/3.0.6pre.

Revision history for this message
In , Tchung-mozilla (tchung-mozilla) wrote :

sorry for the inconvenience. i meant to detect any bugs before the branch merge that were still RESO fixed, but had a verified1.9.1 keyword next to them.

Revision history for this message
In , Tchung-mozilla (tchung-mozilla) wrote :

Verified on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090115 Minefield/3.2a1pre

Revision history for this message
petski (petski) wrote :

Binary package hint: firefox-3.0

I'm running 3.0.5+nobinonly-0ubuntu0.8.10.1 on intrepid. Just had a segmentation fault in firefox.

Reproduce:
- Go to http://pv.caiw.nl/pvfotos/fotos.php?ordner=30-10-2008%20-%20Kwartaalborrel%20USA
- Wait for page to be fully loaded
- Click on the first image
- Press right arrow on your keyboard (maybe a couple of times), to scroll through the images
- SIGSEGV

Attached you'll find the stacktrace.

Unfortunately, http://crash-stats.mozilla.com times out when I query, so I can't see if I'm alone :).

Revision history for this message
petski (petski) wrote :
Revision history for this message
Peter Veerman (pveerman) wrote :

I confirm this issue and I am running I'm running 3.0.5+nobinonly-0ubuntu0.8.10.1 on intrepid too.

Changed in firefox-3.0:
status: New → Confirmed
Revision history for this message
In , Alexander Sack (asac) wrote :

we don't see any crash on 1.8. Is this a 1.9 regression?

Revision history for this message
petski (petski) wrote :

I've found a patch for this issue in https://bugzilla.mozilla.org/show_bug.cgi?id=421839

I've used Launchpad's PPA to rebuild xulrunner-1.9 (debdiff attached). Indeed, no more SIGSEGV! There is a slightly different behavior for the site in the description of this bug though: the page doesn't update anymore after hitting a keystroke. Don't know if this is caused by firefox or by buggy javascript (I guess the latter)

Instructions to use my PPA:

$ sudo sh -c 'echo deb http://ppa.launchpad.net/petski/ubuntu intrepid main > /etc/apt/sources.list.d/lp320459.list'
$ sudo sh -c 'echo deb-src http://ppa.launchpad.net/petski/ubuntu intrepid main >> /etc/apt/sources.list.d/lp320459.list'
$ gpg --recv-keys ADC679189D1D1B6701310DDDDAA9427DCBF475C3
$ gpg --export --armor ADC679189D1D1B6701310DDDDAA9427DCBF475C3 | sudo apt-key add -
$ gpg --batch --delete-key ADC679189D1D1B6701310DDDDAA9427DCBF475C3
$ sudo apt-get update
$ sudo apt-get upgrade

Changed in firefox-3.0:
status: Confirmed → Fix Committed
Changed in firefox:
status: Unknown → Fix Released
Revision history for this message
In , Alexander Sack (asac) wrote :

for now marking as not wanted on 1.8 branches. if you think its a real issue there please flip back the wanted flags. Thanks!

Revision history for this message
petski (petski) wrote :

A fix for bug 421839 is included in FF3.0.6. Haven't got the time to test it though. Once tested, I will let you guys 'n girls know the outcome.

Revision history for this message
petski (petski) wrote :

As expected, this bug is fixed in FF3.0.6

Changed in firefox-3.0:
status: Fix Committed → Fix Released
Revision history for this message
In , Bclary (bclary) wrote :
Revision history for this message
In , Bclary (bclary) wrote :

(In reply to comment #39)
> crash test added

actually was a mochitest sorry.

Revision history for this message
In , Bclary (bclary) wrote :

disable due to timeout and dupe call to SimpleTest.finish.
http://hg.mozilla.org/mozilla-central/rev/abc59dbb6e46

I'll fix and reenable later.

Changed in firefox:
importance: Unknown → Critical
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.