Ubuntu

MASTER Firefox crashes on instant X server shutdown

Reported by kfinity on 2006-11-27
222
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Confirmed
Medium
firefox-3.0 (Ubuntu)
Medium
Unassigned
firefox (Ubuntu)
Medium
Unassigned

Bug Description

Firefox crashes when X server is forcefully torn down (e.g. by pressing ctrl-alt-backspace) and a crash report gets generated on next login.

(Original Report:
I've reproduced it once on my machine with the following steps. With, oh, about 5 tabs open, I just pressed ctrl-alt-backspace, logged back in when the X server restarted, and FF crashed with a bug report when Gnome tried to restore the session.

It's not terribly important, I don't think anyone does this very often, but maybe it'll be helpful.
)

I don't know a lot about unix signal handling, but don't unix signals need to be handled synchronously? Or can we watch for SIGTERM, call nsIAppStartup.quit(), and return from the signal handler immediately, allowing the eventloop and stack to unwind gracefully later?

Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved work/multiple tabs), or eForceQuit?

I believe SIGTERM can be synchronous or asynchronous. It's a signal that can by hanled by the target process, with the basic intent of: "Please terminate yourself, whatever that may entail". This is in contrast to SIGKILL, which cannot be handled by the target process: "Hey kernel, immediately terminate that process over there".

For example, IIRC, in the case of SIGTERM Apache returns from the signal, and then goes about shutting down it's child processes.

If the app is left running when the machine is shut down then shutdown actually hangs. In practice the OS steps in and asks the user what to do about the app which is not responding (Firefox 2).

I've seen this problem with 10.4.8 and 10.3.9

I've reproduced it once on my machine with the following steps. With, oh, about 5 tabs open, I just pressed ctrl-alt-backspace, logged back in when the X server restarted, and FF crashed with a bug report when Gnome tried to restore the session.

It's not terribly important, I don't think anyone does this very often, but maybe it'll be helpful.

FWIW: That's the same on Linux.
People are closing their session often w/o closing every single application.

(In reply to comment #1)
> I don't know a lot about unix signal handling, but don't unix signals need to
> be handled synchronously? Or can we watch for SIGTERM, call
> nsIAppStartup.quit(), and return from the signal handler immediately, allowing
> the eventloop and stack to unwind gracefully later?

You can watch for SIGTERM since that signal should do the same as a normal exit. So you can catch the signal, start the shutdown process and return from the signal handler.

> Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved
> work/multiple tabs), or eForceQuit?

That's a harder question and I think there is no rule for that. I tend to vote for eForceQuit since if the application gets a SIGTERM the user may not be in front of the application (or display) at all.

I also just restarted X with Ctrl-Alt-backspace, and when I logged back in, there was a crash report for Firefox. However, unlike the original submitter, Firefox isn't automatically restored by my gnome session, so it is probably the Firefox that was open before I restarted X that crashed.

Firefox restarted fine, and restored my session, so no biggy really.

Can not reproduce. KDE 3.5.6 on Herd 2. Fx 2.0

Changed in firefox:
status: Unconfirmed → Needs Info
David Farning (dfarning) on 2007-01-31
Changed in firefox:
importance: Undecided → Medium

Martin is right, this is most likely a report generated for the instance running when X server is shut down. Retitled and updated description accordingly.

However, its really minor since forcing X meld down is rather an exceptional operation.

description: updated
John Vivirito (gnomefreak) wrote :

Taking for retrace.

Changed in firefox:
assignee: nobody → gnomefreak
John Vivirito (gnomefreak) wrote :

Finished retrace. Let me know if you need more.

Changed in firefox:
assignee: gnomefreak → mozillateam
David Farning (dfarning) on 2007-02-24
Changed in firefox:
assignee: mozillateam → mozilla-bugs
Axel (axelbrink) wrote :

Same here. I upload my crash report.

There is a bug report in ubuntu related to this bug. I'm just providing a link to it here:
https://bugs.launchpad.net/firefox/+bug/73536

Alexander Sack (asac) wrote :

this is known upstream. Someone should dig the right bug number out of bugzilla.

Changed in firefox:
status: Needs Info → Confirmed
Changed in firefox:
status: Unknown → Confirmed
Alexander Sack (asac) wrote :

thank you for upstream bug, TrioTorus.

Changed in firefox:
status: Confirmed → In Progress

*** Bug 381828 has been marked as a duplicate of this bug. ***

(In reply to comment #4)
> > Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved
> > work/multiple tabs), or eForceQuit?
>
> That's a harder question and I think there is no rule for that. I tend to vote
> for eForceQuit since if the application gets a SIGTERM the user may not be in
> front of the application (or display) at all.

I also think eForceQuit is better, although it would be fine to have at least some SIGTERM handling at all. I often just start ./firefox from the commandline when developing my extension and ctrl-c it when done with the result of getting a "do you want to restore tabs" message at next start :(

(In reply to comment #7)
> (In reply to comment #4)
> > > Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved
> > > work/multiple tabs), or eForceQuit?
> >
> > That's a harder question and I think there is no rule for that. I tend to vote
> > for eForceQuit since if the application gets a SIGTERM the user may not be in
> > front of the application (or display) at all.
>
> I also think eForceQuit is better, although it would be fine to have at least
> some SIGTERM handling at all. I often just start ./firefox from the commandline
> when developing my extension and ctrl-c it when done with the result of getting
> a "do you want to restore tabs" message at next start :(
>

There is sent SIGINT on Ctrl+C instead of SIGTERM.

(In reply to comment #8)
> There is sent SIGINT on Ctrl+C instead of SIGTERM.

Right, my mistake. But usually it is handled the same as SIGTERM in most UNIX
programs, and so should Firefox to also "feel" like a true UNIX program which
can also be run/terminated from the command line.

Is this bug really restricted to Mac OS?

Running Debian GNU/Linux, I find it annoying that when I shutdown my computer via my window manager, Firefox thinks it crashed and asks me if I want to restore tabs the next time I boot up my computer. So I suppose I think that eAttemptQuit sounds better -- when you shut down your computer normally, you don't expect your programs to think that they crashed, rather you want them to exit normally. When you shut down your computer normally, it's okay for applications to ask you if you want to save your work.

My vote: SIGTERM should force quit. I just used "kill" because Firefox was doing its spinlocking thing and taking 5min to get to each window to ask if I really wanted to close it. (I'll file a separate bug if that doesn't exist... quit should ask ONCE, not once for every window.) Anyway, leaving aside the CPU usage problem, "kill" bypasses all that. It should definitely force a quit.

However: when it receives a SIGTERM it should NOT ask me on next start that it quit "unexpectedly". No, I killed you on purpose. Don't make me kill you again.

@comment 11: I think 'kill' should do the same as just clicking the "X" in most window managers. For a really forceful quit when firefox is hanging in an infinite loop we still have 'kill -9'

*** Bug 459282 has been marked as a duplicate of this bug. ***

Alexander Sack (asac) on 2008-10-31
Changed in firefox:
assignee: mozilla-bugs → nobody
status: In Progress → Triaged
Changed in firefox-3.0:
importance: Undecided → Medium
status: New → Triaged
mmware (massimo-mmware) wrote :

well, if you are talking about kde/firefox integration, this is a work-around :

http://monda.hu/blog/2007/07/25/how-to-make-firefox-restore-your-session-under-linux/

Micah Gersten (micahg) wrote :

No more fixes for Firefox 2.

Changed in firefox (Ubuntu):
status: Triaged → Won't Fix
Changed in firefox:
importance: Unknown → Medium
meven (meven29) wrote :

This bug still occurs as it has for years.
This bug doesn't deserve to finish in the bug's graveyard, forgotten and denied by every one.

The "Won't Fix" status is not encouraging.

Six years later... Firefox 10 has the same bug Firefox 2 had. It is ridiculous that Firefox shows the "This is embarrassing" message when it receives a normal SIGTERM. People are hacking around the problem by disabling session restore completely:

http://www.lifehacker.com.au/2009/09/getting-rid-of-firefoxs-well-this-is-embarrassing-message/

Hasn't there been enough embarrassment? Let's fix this.

ok, fix it! I'd happily review a patch!

Created attachment 607268
patch

Here's a patch which appears to DTRT on Linux. I don't have a Mac machine and I even less familiar with the OS X event loop than I am with the Linux one, so that bit would have to come from someone else.

Comment on attachment 607268
patch

Actually, this does the right thing, but it would drastically inflate the version of GTK we require; g_unix_signal_add is said in the docs to be available from 2.30 onwards (though this version isn't actually available from the website...?), and we require 2.14. So I guess the better way to do it is to roll our own pipes and such.

Created attachment 608074
patch

Here's a better, somewhat more verbose patch that doesn't require anything beyond what we already use. When receiving SIG{INT,TERM} and restarting, we now display about:home with an option to restore your last session instead of the "this is embarrassing" page.

As with any modular app using signals, I am unsure of how this interacts with code that already tries to deal with signals:

http://mxr.mozilla.org/mozilla-central/source/profile/dirserviceprovider/src/nsProfileLock.cpp#440

nsAppShell appears to install its handlers after this.

There's also some code in breakpad, which I assume doesn't come into play here.

Breakpad doesn't handle SIGTERM, so you should be okay there. You will probably need to make sure that your code interacts properly with the profile locking code, since the purpose of that signal handler is to clean up the profile lock correctly upon exit.

(In reply to Ted Mielczarek [:ted] from comment #19)
> Breakpad doesn't handle SIGTERM, so you should be okay there. You will
> probably need to make sure that your code interacts properly with the
> profile locking code, since the purpose of that signal handler is to clean
> up the profile lock correctly upon exit.

Doing a clean exit (nsAppShell::Exit) from handling the signal won't cause the profile unlocking code to be run? I do still see a .parentlock in my profile.

This is going to be tricky: we can't run the old signal handler before nsAppShell::Exit, because nsProfileLock's signal handlers run _exit or the SIG_DFL handler, which would exit. But we can't run the old signal handler after nsAppShell::Exit because we've exited.

I suppose the right thing to do here then is either:
a) Drop SIGTERM support from nsProfileLock, make your new code call into the nsProfileLock code to unlock the profile before exiting.
or
b) Forget about your new code, just tweak nsProfileLock to special-case SIGTERM and do the handling you're doing.

Two things:

1. You should probably use nsAppStartup::Quit because it will do the Right Things. This includes sending the proper shutdown notifications to code that needs to clean up.
2. Should probably make this generic enough to work on all unixish systems instead of just GTK2. Sounds like ted's second suggestion does that.

(In reply to Michael Wu [:mwu] from comment #22)
> 1. You should probably use nsAppStartup::Quit because it will do the Right
> Things. This includes sending the proper shutdown notifications to code that
> needs to clean up.

Mmm, good point.

> 2. Should probably make this generic enough to work on all unixish systems
> instead of just GTK2. Sounds like ted's second suggestion does that.

It does, but it runs rather a lot of code from inside the signal handler, which seems like a Bad Idea. To make sure things run from outside the signal handler, we need integration with the event loop. I guess we could have nsAppShell::PostSignalEvent(int signum) with a default do-nothing implementation?

Created attachment 609331
patch

New patch using nsIAppStartup; also fewer printfs.

I am not the right person to review this, having never written anything useful with signal handlers. If I understand this correctly, the signal handler itself is signaling a pipe which wakes up the event loop. Is this because it is not safe to make any GTK calls from within the signal handler? I do think that eForceQuit is correct for SIGTERM, but I wonder if we shouldn't be using eAttemptQuit for SIGQUIT, so that it can properly prompt for unsaved work.

(In reply to Benjamin Smedberg [:bsmedberg] from comment #15)
> ok, fix it! I'd happily review a patch!

(In reply to Benjamin Smedberg [:bsmedberg] from comment #25)
> I am not the right person to review this, having never written anything
> useful with signal handlers.

I am confused. :)

> If I understand this correctly, the signal
> handler itself is signaling a pipe which wakes up the event loop. Is this
> because it is not safe to make any GTK calls from within the signal handler?

Doing almost anything non-trivial from a signal handler is a Bad Idea, unless you write your code very very carefully. I do not think GTK code follows the necessary rules for being callable from a signal handler.

> I do think that eForceQuit is correct for SIGTERM, but I wonder if we
> shouldn't be using eAttemptQuit for SIGQUIT, so that it can properly prompt
> for unsaved work.

Wikipedia suggests that SIGQUIT is sent "when the user requests that the process perform a core dump." The glibc info pages agree, but also suggest that "certain kinds of cleanups are best omitted in handling SIGQUIT." (e.g. you might want to examine temporary files along with the core dump you just received.) Given this information, I think eForceQuit is acceptable for both signals...and I doubt people SIGQUIT'ing applications is all that common anyway.

Oh, I confused SIGINT with SIGQUIT. SIGINT should presumably do normal prompting, I guess SIGQUIT doesn't need to as you said.

Somebody who knows signal handlers and GTK well should be the primary code reviewer for this patch (which isn't me). I suggest karlt.

Created attachment 618671
patch

Redirecting f? to karlt as bsmedberg suggests. Updated patch with SIGQUIT handling as well.

I left SIGINT doing a eForceQuit; from comments in this bug, it sounds like people start firefox from the command line and then C-c it. For those sorts of usecases, it would be annoying to click a dialog. (Firefox should just save the data always, without asking, but that's a separate discussion.)

Comment on attachment 618671
patch

Communicating via pipe is the only reasonable async-signal-safe way that I
know to signal the event loop.
eForceQuit on SIGINT sounds sensible to me.
I wouldn't implement this for SIGQUIT for the reasons you list in comment 26.

It should be possible to share the single existing pipe by sending the
appropriate tokens down the same mPipeFDs.

Is there a reason you chose to use SA_RESTART? Not sure it makes much
difference in practice, but I would have expected an interrupt signal to try
to abort ASAP rather than restarting a system call.

These signal handlers should by removed once one is called. Once they've sent
the eForceQuit, sending that again will not achieve much. When they remove
themselves, they should restore the previous signal handler. See how this is
done in nsProfileLock.cpp, and nsProfileLock is really the reason why the
previous handler needs to be restored.

I assume embedders have their own event loop and so this code won't be used.
That is good as this could interfere with their signal handlers.

The app is meant to exit by reraising the signal. The glibc page says "should
end by specifying the default action for the signal that happened and then
reraising it", but I think it is actually more correct use the previous action
to chain up to other signal handlers, which will eventually raise with the
default action. If there is no appropriate place to raise the right signal,
then this may be more trouble than it is worth. If we don't do this, then
that is another good reason not to handle SIGQUIT.

Think also about what happens when a signal is received after the nsAppShell is deleted.

(In reply to Karl Tomlinson (:karlt, offline til July 2) from comment #29)
> Is there a reason you chose to use SA_RESTART? Not sure it makes much
> difference in practice, but I would have expected an interrupt signal to try
> to abort ASAP rather than restarting a system call.

I would expect that too, but since we're not actually quitting immediately, enabling automatic restart of anything still in-progress seems reasonable, giving those calls a chance to complete, etc. I'd be really surprised if all the code in Gecko &co. was prepared to handle interrupted system calls anyway.

> The app is meant to exit by reraising the signal. The glibc page says
> "should
> end by specifying the default action for the signal that happened and then
> reraising it", but I think it is actually more correct use the previous
> action
> to chain up to other signal handlers, which will eventually raise with the
> default action. If there is no appropriate place to raise the right signal,
> then this may be more trouble than it is worth. If we don't do this, then
> that is another good reason not to handle SIGQUIT.

I assume you mean re-raising once we've handled application shutdown? Oof, that's quite some complexity. I appreciate the philosophical point behind trying to exit with the proper codes, but I'm not sure it makes much pragmatic difference here. And for things like nsProfileLock, it seems like a proper invocation of the shutdown process, as triggered by this patch, makes much more sense than trying to invoke nsProfileLock's signal handlers.

(In reply to Nathan Froyd (:froydnj) from comment #31)
> I assume you mean re-raising once we've handled application shutdown?

Yes.

> Oof, that's quite some complexity.

Yes, it could well be too much trouble. It can be attacked separately if/when that turns out to be important for anything.

> And for things like nsProfileLock, it seems like
> a proper invocation of the shutdown process, as triggered by this patch,
> makes much more sense than trying to invoke nsProfileLock's signal handlers.

Yes, a proper shutdown is definitely the appropriate response to the first signal.
If the lock has been removed, then subsequently running nsProfileLock's handlers should be harmless.

Getting this fixed should fix some issues with coverage measurement and external applications using the browser. My current problem is that an external testing application uses SIGTERM to signal the browser to terminate but GCOV data is not written out because no regular exit is performed.

Are there any remaining problems with the attached patch?

I have the same problem, essentially with JS Test Driver trying to close Firefox (I assume using SIGTERM). Sometimes when JS Test Driver tries to restart the browser, the This is embarrassing dialog pops up and JS Test Driver is unable to capture the browser to run unit tests. I see this problem on my Mac and we think we see this problem when we try to run headlessly in a Solaris environment using XVFB. Any word on when this might be fixed? It sounds like there is already a potential patch for it.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.