Oneric: On boot up Firefox always displays the “Well, This Is Embarrassing” screen.

Bug #867424 reported by philinux on 2011-10-04
152
This bug affects 37 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Fix Released
Medium
firefox (Ubuntu)
High
Unassigned
Precise
High
Unassigned
thunderbird (Ubuntu)
High
Unassigned

Bug Description

This happens on every restart or boot. Firefox fails to load previous tabs.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: firefox 7.0.1+build1+nobinonly-0ubuntu1
ProcVersionSignature: Ubuntu 3.0.0-12.19-generic 3.0.4
Uname: Linux 3.0.0-12-generic x86_64
NonfreeKernelModules: nvidia
AddonCompatCheckDisabled: False
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices: aplay: device_list:240: no soundcards found...
ApportVersion: 1.23-0ubuntu2
Architecture: amd64
ArecordDevices: arecord: device_list:240: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
BuildID: 20110929022028
CRDA: Error: [Errno 2] No such file or directory
Channel: release
Date: Tue Oct 4 13:15:01 2011
ForcedLayersAccel: False
IfupdownConfig:
 auto lo
 iface lo inet loopback
IncompatibleExtensions: NoScript - ID={73a6fe31-595d-460b-a920-fcc0f8843232}, Version=2.1.4rc2, minVersion=1.9a2, maxVersion=1.9.6, Location=app-profile, Type=extension, Active=Yes
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Beta amd64 (20110910)
IpRoute:
 default via 192.168.1.1 dev eth0 proto static
 169.254.0.0/16 dev eth0 scope link metric 1000
 192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.4 metric 1
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
ProcEnviron:
 LANGUAGE=en_GB:en
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
Profiles: Profile0 (Default) - LastVersion=7.0.1/20110929022028 (Running)
RfKill:

RunningIncompatibleAddons: True
SourcePackage: firefox
UpgradeStatus: No upgrade log present (probably fresh install)
UserJS: general.useragent.vendor - Firefox
dmi.bios.date: 01/23/2008
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0304
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M3A78-EH
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0304:bd01/23/2008:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM3A78-EH:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

philinux (philcb) wrote :
gluxon (gluxon) wrote :

Firefox displays the "Well, this is embarrassing" screen when the app hutdown improperly. How are you turning off your computer?

philinux (philcb) wrote :

I think this maybe a glitch in the profile now.

I copied it from Natty. I'll give it a whirl with a new profile.

philinux (philcb) wrote :

Yep that was it. I'll mark as invalid.

Changed in firefox (Ubuntu):
status: New → Invalid
Chris Coulson (chrisccoulson) wrote :

No, this is a real bug. This is because Firefox still depends on libgnome to talk to the session manager, without declaring this in the packaging. libgnome got removed from the CD due to other cleaning this cycle, but there's not really any replacement to libgnome (specifically GnomeClient) for cross-desktop apps like Firefox. The solutions are:

1) Do what other gnome applications do and use the gnome-session dbus API directly, but this breaks integration on all non-GNOME desktops
2) Embed a copy of smclient from http://git.gnome.org/browse/libegg/
3) Use libsm directly rather than relying on GnomeClient or EggSMClient as a higher level wrapper, which I think is what Metacity does

This dropped off my radar because I have other things pulling in libgnome on my system, and nobody else reported it as an issue until now :(

I wonder if we should SRU this for now to pull in libgnome for now, although that's not a good long term solution. Martin, what do you think?

Changed in firefox (Ubuntu):
importance: Undecided → High
status: Invalid → Triaged
philinux (philcb) wrote :

I also realised that this happens on a bad shutdown where REISUB had to be used. On a clean shutdown it does not happen.

Martin Pitt (pitti) wrote :

SRUing this with a new dependency to libgnome sounds OK to me as an unintrusive workaround for oneiric. For precise onwards this should have a better solution, though, like talking to gnome-session through D-Bus perhaps?

Changed in firefox (Ubuntu Oneiric):
importance: Undecided → High
status: New → Triaged
Changed in firefox (Ubuntu):
importance: High → Medium
Changed in thunderbird (Ubuntu Oneiric):
importance: Undecided → High
Changed in thunderbird (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Changed in thunderbird (Ubuntu Oneiric):
status: New → Triaged

Firefox still depends on libgnome and libgnomeui (specifically, GnomeClient - see toolkit/xre/nsNativeAppSupportUnix.cpp) for talking to the session manager. The problem with this is:

1) Those are both deprecated, and have been for a long time
2) They have quite heavy dependencies on a lot of other deprecated libraries (orbit, bonobo, gnomevfs, gconf, glade, gnomecanvas)

We dropped these entirely from the default install in Ubuntu, and this now results in Firefox not shutting down cleanly when users log out of their session (leading to the "Well, this is embarassing" screen when they next start Firefox.

Unfortunately, there isn't really a good replacement for GnomeClient, but the options are:

1) Use the gnome-session DBus interface directly. However, this won't work outside of gnome.
2) Use libsm directly to talk to the session managers xsmp interface, rather than relying on a higher level library. This is more work, but lighter dependencies and would work outside of gnome too.
3) Embed a copy of EggSMClient from http://git.gnome.org/browse/libegg/tree/libegg/ , which has a xsmp backend and is light on dependencies (just libsm AFAICT)

Gnome applications seem to do a combination of 1 and 3, but I think option 1 is a non-starter for us, as the dbus interface is completely gnome-specific and doesn't have any API stability guarantees. The only software I know which does 2 is metacity.

What do other people think?

Changed in firefox:
importance: Unknown → Medium
status: Unknown → Confirmed

Considering first option: I understand that using dbus is suitable only for Gnome but libgnome(ui) is also Gnome-only, right? So far only Gnome platform was supported.

I also like third option but I'm unsure what's Mozilla's attitude to embedding foreign sources.

What is for sure that we need to get rid of libgnome(ui). Sooner is better.

(In reply to jhorak from comment #1)
> Considering first option: I understand that using dbus is suitable only for
> Gnome but libgnome(ui) is also Gnome-only, right? So far only Gnome platform
> was supported.

No: libgnome/libgnomeui are using xsmp, so this (supposedly) works on all desktops.

Indeed, libgnome/libgnomeui does work on all desktops. The fact that it pulls in half of gnome to work is unfortunate, but it does work

I like the third option too. If we can get some agreement that that is the best way to go, then I will start work on this

Hmm, EggSMClient is LGPL only, right?

Not sure but I think that this is not compatible with Mozilla's tri-license when embedded codewise.

Wolfgang: ah, good point. We can ask the code to be relicensed if needed, I guess. There shouldn't be tons of contributors to it (a quick count gives me 13 people, most of them being well-known and easy to reach).

Changed in firefox (Ubuntu Precise):
importance: Medium → High
Changed in thunderbird (Ubuntu Precise):
importance: Medium → High
Changed in firefox (Ubuntu Precise):
assignee: nobody → Chris Coulson (chrisccoulson)

Any news on this subject? I may try to create some patch for this one.

Ah, I've already got one for it here. Will attach what I have in a moment, but I'm just doing something else first

Created attachment 608321
Stop using libgnome and libgnomeui on Linux

This is what I've got so far, although it's not ready for review just yet. I haven't made any corresponding changes in configure.in or any other part of the build system yet.

Any feedback is welcome on the approach.

This makes session manager integration work without embedding eggsmclient, depending on the GNOME-specific dbus interface, adding any other desktop-specific dependencies or depending on Gtk 3.4 (which has session management in GtkApplication)

It should work outside of GNOME too, although I didn't try this yet.

I've not added a runtime dependency on libSM or libICE, as there wasn't one before. There is a build-time dependency on these though, although the headers for these are already installed on the build machines.

I made a couple of other changes to the behaviour too:
- It now only sends the "session-save" notification when the save style is SmSaveLocal or SmSaveBoth. Saving internal state with a save style of SmSaveGlobal is actually incorrect. This means that Firefox now distinguishes between a normal session exit and a session exit with session saving enabled.
- As "quit-application-requested" might pop up a dialog, it only does this if the interact style is not SmInteractStyleNone
- "quit-application-requested" is only sent after sending SmcInteractRequest and receiving an interact message. If we were using eggsmclient, this would actually happen transparently anyway (using the EggSMClient::quit_requested signal)

I have to say, libSM and libICE are probably the most horrible API's I've ever had to work with :)

tags: added: rls-mgr-p-tracking

I'm getting "IO error occured doing Protocol Setup on connection" when calling SmcOpenConnection.
Libs used:
libSM-1.2.1-2.fc18.x86_64
libICE-1.0.8-2.fc18.x86_64
it dies somewhere in _IceRead function.

Adolfo Jayme (fitojb) on 2013-02-26
no longer affects: firefox (Ubuntu Oneiric)
no longer affects: thunderbird (Ubuntu Oneiric)

Ignore last comment, problem was with Fedora's Gnome shell.

Basically it looks like it's working fine. Problem is that Firefox quit, ie. nsIAppStartup::Quit takes some time and session manager is sometimes impatient and kills Firefox sooner that it actually quits which leads to another 'Well, this is embarrassing' page.

There seems to be session saving code missing yet. Session saving should be supported in eggsmc.

I think problem is that nsIAppStartup::Quit have some async calls and returns sooner than application is actually quit. The session manager is expecting that Firefox quit is finished when it returns from nsNativeAppSupportUnix::DieCB method.

We need to wait in nsNativeAppSupportUnix::DieCB until application actually finish quitting. Is there something for this in current code?

(In reply to jhorak from comment #12)
> I think problem is that nsIAppStartup::Quit have some async calls and
> returns sooner than application is actually quit. The session manager is
> expecting that Firefox quit is finished when it returns from
> nsNativeAppSupportUnix::DieCB method.
>
> We need to wait in nsNativeAppSupportUnix::DieCB until application actually
> finish quitting. Is there something for this in current code?

We don't need to wait in DieCB() until the app has done most of the quit, but I expect SmcCloseConnection() should not be called until later in the shutdown sequence.

http://lesstif.sourceforge.net/doc/super-ux/g1ae04e/chap12.html#12.4.1.2

Perhaps there is some later message that can be used to close the SM connection.
Perhaps nsINativeAppSupport::Stop() is appropriate. I'm not clear from the docs that Stop() will always be called.

Perhaps reordering the operations in DieCB() to close the SM connection after nsIAppStartup::Quit may be sufficient.

I wonder whether GnomeClient was explicitly closing the connection, or whether is just got closed with the Display connection.

The SmcCloseConnection called after nsIAppStartup::Quit improved situation a bit, but it's not 100%. The best so far was not to call SmcCloseConnection from DieCB at all. Destructor of nsNativeAppSupportUnix calls DisconnectFromSM and according to my testing it seems to be working just fine.

so what is the status of this bug now? i'm sorta confused based on the last comment. Also, should it be a blocker for wayland support?

There is really no good reason for this code to exist anymore. I ended up with libgnome installed as a dependency for another old program and my Firefox install started misbehaving as a (nearly untracable) sideeffect of that. It took me quite a while to figure out that a Firefox bug could be triggered as a result of installing an unrelated program.

The fact that I didn't have libgnome installed before, and everything was working just fine, is as much of an indication as any that this code needs to die.

(In reply to desrt from comment #16)
> There is really no good reason for this code to exist anymore. I ended up
> with libgnome installed as a dependency for another old program and my
> Firefox install started misbehaving as a (nearly untracable) sideeffect of
> that. It took me quite a while to figure out that a Firefox bug could be
> triggered as a result of installing an unrelated program.
>
> The fact that I didn't have libgnome installed before, and everything was
> working just fine, is as much of an indication as any that this code needs
> to die.
I think this is explained by bug #557601 comment #8 - firefox blocks shutdown for important asynchronous events when you ask it to quit, but not when the session manager asks. I don't think attachment 608321 addresses this any better than the current code does, so a fix is needed for bug #557601 in any case.

I hope someone with more knowledge of asynchronous shutdown can tell whether my analysis is correct and can point to the best way to fix this.

Download full text (4.3 KiB)

Actually, comment #14 is accurate.Calling SmcCloseConnection from ~nsNativeAppSupportUnix should always be safe, as all shutdown phases will have completed by this point (in the current code)

I managed to understand the shutdown sequence a little better by attaching gdb and sprinkling some breakpoints around then doing a normal quit. I couldn't get much direct insight into quit-on-logout as just attaching a debugger seemed to slow things down enough that firefox would always SIGHUP, but from my newly-aqcquired understanding, the code paths *are* the same.

Breakpoint 7, nsObserverService::NotifyObservers (this=0x7f029b4d0330, aSubject=0x0, aTopic=0x7f02b12745a1 "profile-before-change", aSomeData=0x7f02b1592000 <nsXREDirProvider::DoShutdown()::kShutdownPersist> u"shutdown-persist") at /usr/src/debug/firefox-37.0.1/mozilla-release/xpcom/ds/nsObserverService.cpp:319
319 {
[Current thread is 1 (Thread 0x7f02b3d5f780 (LWP 8791))]
#0 nsObserverService::NotifyObservers (this=0x7f029b4d0330, aSubject=0x0, aTopic=0x7f02b12745a1 "profile-before-change", aSomeData=0x7f02b1592000 <nsXREDirProvider::DoShutdown()::kShutdownPersist> u"shutdown-persist") at /usr/src/debug/firefox-37.0.1/mozilla-release/xpcom/ds/nsObserverService.cpp:319
#1 0x00007f02b0ab1fe0 in nsXREDirProvider::DoShutdown (this=0x7fffeed02f88) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsXREDirProvider.cpp:905
#2 0x00007f02b0aaa102 in ScopedXPCOMStartup::~ScopedXPCOMStartup (this=0x7f02b29ac7c0, __in_chrg=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsAppRunner.cpp:1324
#3 0x00007f02b0aafc63 in XREMain::XRE_main (this=0x7fffeed02f48, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsAppRunner.cpp:4312
#4 0x00007f02b0aafeac in XRE_main (argc=1, argv=0x7fffeed04468, aAppData=0x7fffeed03158, aFlags=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsAppRunner.cpp:4505
#5 0x00000000004040fd in do_main (argc=argc@entry=1, argv=argv@entry=0x7fffeed04468, xreDirectory=0x7f02b2949780) at /usr/src/debug/firefox-37.0.1/mozilla-release/browser/app/nsBrowserApp.cpp:292
#6 0x000000000040389f in main (argc=1, argv=0x7fffeed04468) at /usr/src/debug/firefox-37.0.1/mozilla-release/browser/app/nsBrowserApp.cpp:661

i.e. profile-before-change (and web-workers-shutdown which happens just after) happens when toolkit/xre/nsAppRunner.cpp:XREMain::XRE_main destroys the ScopedXPCOMStartup object, just after exiting the event loop.

Breakpoint 3, nsNativeAppSupportUnix::~nsNativeAppSupportUnix (this=0x7f02b29e5460, __in_chrg=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsNativeAppSupportUnix.cpp:126
126 class nsNativeAppSupportUnix : public nsNativeAppSupportBase
[Current thread is 1 (Thread 0x7f02b3d5f780 (LWP 8791))]
#0 nsNativeAppSupportUnix::~nsNativeAppSupportUnix (this=0x7f02b29e5460, __in_chrg=<optimized out>) at /usr/src/debug/firefox-37.0.1/mozilla-release/toolkit/xre/nsNativeAppSupportUnix.cpp:126
#1 0x00007f02b0ab0e3e in nsNativeAppSupportBase::Release (this=0x7f02b29e5460) at /usr/src/debug/firefox-37.0.1/mozi...

Read more...

As a minor aside, it would be desirable to do as much work as possible between SaveYourselfCB() and SmcSaveYourselfDone(), and as little as possible between DieCB() and SmcCloseConnection().

At least in KDE, the time-out for the first phase is slightly longer (15 seconds and configurable versus 10 seconds and not configurable IIRC)

But probably the convolutions required to accomplish this aren't worth it.

I'm guessing libsm/libice won't work on wayland?

Looks like we might be able to get shutdown (but not logout) support from systemd. -https://wiki.freedesktop.org/www/Software/systemd/inhibit/

(In reply to Bryan Quigley from comment #20)
> I'm guessing libsm/libice won't work on wayland?
>
> Looks like we might be able to get shutdown (but not logout) support from
> systemd. -https://wiki.freedesktop.org/www/Software/systemd/inhibit/

Well, the ICE protocol was deliberately independent of the X11 protocol so I guess classical session managers could still work with wayland. In practice, I don't know.

A systemd-style logout inhibit would let us delay logout until firefox has saved everything it needs to. But it doesn't provide a way to save restore the position and desktop of each window, as libsm/xsmp does. Nor does it restore applications on the next login or let word-processor style apps restore with the currently-opened file.

Created attachment 8605868
Stop using libgnome and libgnomeui on Linux v2

I've updated the patch above, and have had some success with it, but it still needs some work in order to disconnect from the SM at the right time. It's been compiled and tested a little on top of 37.0.1, it just needed a couple of trivial conflicts resolved to rebase it to master, but hasn't been tested there.

Changes from v1
---------------

* Rebase to 37.0.1 and resolve conflicts
  - Drop MOZ_WIDGET_GTK == 2 checks
  - Unbitrot to match "Bug 777292 part 2 - Change all nsnull to
nullptr" (Note that this was actually Bug 626472)
  - Unbitrot to match "Bug 773151: Convert nsCAutoString->nsAutoCString
CLOSED TREE r=bsmedberg"
  - Unbitrot to match "Bug 784739 - Switch from NULL to nullptr in
toolkit/;r=ehsan"
  - Avoid "Wwrite-strings" warning for static string conversion
* Further changes to the functionality of the patch
  - Don't DisconnectFromSM() until destructor.
  - Add comments explaining the rationale for disconnect timing wrt
shutdown phases
  - Expand some comments with more detail on the standard or
implementation issues
  - Don't IceSetErrorHandler, and add comment about the motivation for
IceSetIOErrorHandler
  - Rename client_id to prev_client_id; Rename new_client_id to
client_id
  - Set SM_CLIENT_ID on client leader window as per ICCM specified

Questions
---------

* Is the #define gtk_function gtk_function_, #undef gtk_function dance
in the gtk compat defines the right approach?
* are gio watch callbacks always guaranteed to be on the main thread?
* what should I use to log instead of printf?

Remaining issues
----------------
* Doesn't solve bug #557601 - DisconnectFromSM is called at the right
time when quitting firefox, but is too early on shutdown. I don't
understand why yet.
* The interaction call/response with the session manager needs some work
* Doesn't set a couple of SM properties that gnome-client did
* Doesn't use grabs to prevent interaction at the wrong time like
gnome-client did

To expand on it not yet solving bug #557601 - on normal application quit I see New state = DISCONNECTED
 after phase "web-workers-shutdown"; on SM-triggered shutdown/quit I've seen it DISCONNECTED just after phase "profile-change-teardown"..

Created attachment 8737263
Stop using libgnome and libgnomeui on Linux v3

Updated this patch again, and tested on master and on top of 45.0.1. It works
with both gtk2 and gtk3 builds but I'd still like feedback about whether I'm
doing the right thing with all the gtk plumbing.

Solved Issues
-------------

"Doesn't solve bug #557601 - DisconnectFromSM is called at the right time when
quitting firefox, but is too early on shutdown. I don't understand why yet."

This was just PEBKAC. Turns out that if you run your test firefox build from a
terminal it will reliably be killed when the terminal quits - who knew? Running
with nohup or from e.g. krunner works and firefox survives until it's finished
cleaning up. Firefox started by session restore also behaves as expected.

Changes from v2
---------------

* Rebase to master
  - Unbitrot to match "Bug 1207245 - part 6 - rename nsRefPtr<T> to RefPtr<T>"
  - Unbitrot: add missing nsThreadUtils.h include
* Add keyword for SetClientState logging
* Add symbol to mozgtk so that gtk3 builds link
* Written a new commit message

Questions
---------

* Is the #define gtk_function gtk_function_, #undef gtk_function dance in the
  gtk compat defines the right approach?
* Is the mozgtk stub symbol definition sufficient for gtk3 builds?
* are gio watch callbacks always guaranteed to be on the main thread?
* what should I use to log instead of printf? What log facilities are still
 available in late shutdown?

Remaining issues
----------------

* The interaction call/response with the session manager may need some work
  - in fact it needs to be tested at all.
* Doesn't set a couple of SM properties that gnome-client did
* Doesn't use grabs to prevent interaction at the wrong time like gnome-client
  did. But the libegg SM implementation doesn't do anything to prevent
  interaction, as far as I can tell. So I'm not sure what the correct behaviour
  is.

(In reply to Oliver Henshaw from comment #18)
> I managed to understand the shutdown sequence a little better by attaching
> gdb and sprinkling some breakpoints around then doing a normal quit.

https://wiki.mozilla.org/XPCOM_Shutdown may be useful, but it sounds like
you've worked things out anyway.

(In reply to Oliver Henshaw from comment #24)
> Questions
> ---------
>
> * Is the #define gtk_function gtk_function_, #undef gtk_function dance in the
> gtk compat defines the right approach?

It's probably possible to use macros to save some of the repetition, if you
like. See, for example,
https://dxr.mozilla.org/mozilla-central/rev/68c0b7d6f16ce5bb023e08050102b5f2fe4aacd8/toolkit/system/gnome/nsPackageKitService.cpp#48
https://dxr.mozilla.org/mozilla-central/rev/68c0b7d6f16ce5bb023e08050102b5f2fe4aacd8/dom/gamepad/linux/udev.h#74

But what's there already looks fine, except that I don't know why the #undefs
are required.

> * Is the mozgtk stub symbol definition sufficient for gtk3 builds?

Yes, good.

> * are gio watch callbacks always guaranteed to be on the main thread?

Yes, if set up on the main thread.

> * what should I use to log instead of printf? What log facilities are still
> available in late shutdown?

MOZ_LOG and LazyLogModule from
https://dxr.mozilla.org/mozilla-central/source/xpcom/base/Logging.h
should work fine, I think.
That's the preferred system if it works, and I don't see it being shut down
at any stage.

> Remaining issues
> ----------------
>
> * The interaction call/response with the session manager may need some work
> - in fact it needs to be tested at all.

I haven't looked at that yet. Will make some time to look over this patch,
thanks.

> * Doesn't set a couple of SM properties that gnome-client did

Which ones are those?

> * Doesn't use grabs to prevent interaction at the wrong time like
> gnome-client did. But the libegg SM implementation doesn't do anything to
> prevent interaction, as far as I can tell. So I'm not sure what the
> correct behaviour is.

I don't know of a good reason why we need to care about that.

This fails to compile in latest trunk with undefined reference to NS_NewRunnableMethod.
Clang thinks I meant to type NewRunnableMethod.
Was the function renamed?

Created attachment 8754795
(1/2) - Change modelines to those recommended by coding style

Created attachment 8754796
(2/2) - Stop using libgnome and libgnomeui on Linux v4

Changes from v3
---------------

* Rebase to master
  - Unbitrot to match "Bug 1268313: Part 7 - Move NS_NewRunnableMethod and
    friends to mozilla::NewRunnableMethod"
* Set client state to disconnected before closing connection, so that the log
  message appears before the actual disconnection.
* Use mozilla::Logging in SetClientState
  - The first possible MOZ_LOG user is very early in startup so I need to
    add a LogModule::Init() in nsNativeAppSupportUnix::Start as nothing else
    has initialised the logging yet.
* Rework setting of SM Properties
  - Add helper functions setSMValue, setSMProperty.
  - Set all the SM Properties required by the spec
  - Try to set a reasonable-ish fallback value if can't determine a correct value
    (NB: XREMain::XRE_mainInit will exit early if AppData->name is not set)
  - Warn if setting a fallback value
  - Use appropriate name for each SMProp and SMPropValue[]

Solved Issues
-------------

Tested interaction during shutdown (i.e. "quit-application-requested") by
having multiple tabs open and set browser.showQuitWarning to true, and
patching with:

diff --git a/browser/components/nsBrowserGlue.js b/browser/components/nsBrowserGlue.js
index eec2e82..ee0d186 100644
--- a/browser/components/nsBrowserGlue.js
+++ b/browser/components/nsBrowserGlue.js
@@ -312,7 +312,7 @@ BrowserGlue.prototype = {
       case "session-save":
         this._setPrefToSaveSession(true);
         subject.QueryInterface(Ci.nsISupportsPRBool);
- subject.data = true;
+ subject.data = false;
         break;
       case "places-init-complete":
         if (!this._migrationImportsDefaultBookmarks)
@@ -1377,8 +1377,8 @@ BrowserGlue.prototype = {
     // browser.showQuitWarning specifically covers quitting
     // browser.tabs.warnOnClose is the global "warn when closing multiple tabs" pref

- var sessionWillBeRestored = Services.prefs.getIntPref("browser.startup.page") == 3 ||
- Services.prefs.getBoolPref("browser.sessionstore.resume_session_once");
+ var sessionWillBeRestored = false; //Services.prefs.getIntPref("browser.startup.page") == 3 ||
+ // Services.prefs.getBoolPref("browser.sessionstore.resume_session_once");
     if (sessionWillBeRestored || !Services.prefs.getBoolPref("browser.warnOnQuit"))
       return;

Comment on attachment 8754795
(1/2) - Change modelines to those recommended by coding style

Thanks!

Comment on attachment 8754796
(2/2) - Stop using libgnome and libgnomeui on Linux v4

Andrew might be able to look over this before I get a chance. Still there's quite a bit going on here, so it may take some time. Thanks, Andrew!

As indicated in bug 557601, this is a regression with the port to GTK3, for those that still had the libgnome libraries installed.

Comment on attachment 8754796
(2/2) - Stop using libgnome and libgnomeui on Linux v4

Review of attachment 8754796:
-----------------------------------------------------------------

Looks good, thanks! There are a few style issues, but the file's style is highly inconsistent anyway. While I'm not that familiar with X11 session management, things seem quite sane according to https://www.x.org/releases/X11R7.6/doc/libSM/SMlib.html.

::: toolkit/xre/nsNativeAppSupportUnix.cpp
@@ +429,5 @@
> +
> + --gArgc;
> +}
> +
> +static void setSMValue(SmPropValue& val, const nsCString& data)

Please capitalize function names and add a newline after the return type declaration in implementations.

Also, this block of code isn't guarded by MOZ_X11. Systems without X11 won't be able to resolve SmPropValue.

@@ +436,5 @@
> + val.length = data.Length();
> +}
> +
> +static void setSMProperty(SmProp& prop, const char* name, const char* type,
> + int numVals, SmPropValue vals[])

Style and guard as described above.

@@ +491,5 @@
> + char *arg = *curarg;
> + if (arg[0] == '-' && arg[1] == '-') {
> + arg += 2;
> + if (!strcmp(arg, "sm-disable")) {
> + RemoveArg(curarg);

Is there any reason we can't just increment argv / decrement argc here instead of shifting the char* pointers in RemoveArg?

@@ +576,5 @@
> + char errbuf[256];
> + mSessionConnection = SmcOpenConnection(nullptr, this, SmProtoMajor,
> + SmProtoMinor, mask, &callbacks,
> + prev_client_id.get(), &client_id,
> + 256, errbuf);

Replace 256 with sizeof(errbuf).

Created attachment 8757425
(2/2) - Stop using libgnome and libgnomeui on Linux v5.

Changes from v4
---------------

Made myself the author, since I deserve the blame even though I don't
deserve the credit. Added a note that Chris Coulson wrote the original
patch. If there's a better way to handle this, perhaps someone with
mercurial-fu can do the magic?

Addressed review comments:

> ::: toolkit/xre/nsNativeAppSupportUnix.cpp
> @@ +429,5 @@
> > +
> > + --gArgc;
> > +}
> > +
> > +static void setSMValue(SmPropValue& val, const nsCString& data)
>
> Please capitalize function names and add a newline after the return type
> declaration in implementations.
>
> Also, this block of code isn't guarded by MOZ_X11. Systems without X11 won't
> be able to resolve SmPropValue.

Done.

>
> @@ +436,5 @@
> > + val.length = data.Length();
> > +}
> > +
> > +static void setSMProperty(SmProp& prop, const char* name, const char* type,
> > + int numVals, SmPropValue vals[])
>
> Style and guard as described above.
and done.

> @@ +491,5 @@
> > + char *arg = *curarg;
> > + if (arg[0] == '-' && arg[1] == '-') {
> > + arg += 2;
> > + if (!strcmp(arg, "sm-disable")) {
> > + RemoveArg(curarg);
>
> Is there any reason we can't just increment argv / decrement argc here
> instead of shifting the char* pointers in RemoveArg?
The argument to remove might be at an arbitrary position in argv, so I
don't think there's a better implementation. Actually, it's copied from
toolkit/xre/nsAppRunner.cpp so might it be worth a followup
de-duplicating it?

>
> @@ +576,5 @@
> > + char errbuf[256];
> > + mSessionConnection = SmcOpenConnection(nullptr, this, SmProtoMajor,
> > + SmProtoMinor, mask, &callbacks,
> > + prev_client_id.get(), &client_id,
> > + 256, errbuf);
>
> Replace 256 with sizeof(errbuf).

Done

Note
----

An easier way to test interaction during shutdown (i.e.
"quit-application-requested") by having multiple tabs open and set
browser.showQuitWarning to true, then configuring the session manager to
not save state (so it only asks for SmSaveGlobal)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=24c95faf3d3e
is a compile-only try run. Not sure what tests, if any, are suitable.

toolkit/xre/nsAppRunner.cpp:SaveToEnv() intentionally leaks memory because it ends up calling putenv(3) which 'leaks' by design and by specification - this was suppressed for valgrind in bug #793534. But I don't understand why this doesn't trigger the leak sanitizer in the existing code - also there's nothing in these patches that touches nsAppRunner.cpp.

I can't reproduce this locally but I made a couple of wild guesses as to what change could trigger the lsan warning (https://hg.mozilla.org/try/rev/9770e1ff4923 and https://hg.mozilla.org/try/rev/69de1e6095ae ) but the try job still fails - https://treeherder.mozilla.org/#/jobs?repo=try&revision=59094121359b&selectedJob=22442721

If there was an nspr function that used setenv(3) instead of putenv(3) then the strings would be copied inside libc and the existing lsan suppression for libc would cover this.

needinfo'ing mccr8 in case he has any insight into this.

I suggest continuing to reduce the patch to find the minimal change that triggers the leak.

Still, it might be helpful to know how PR_SetEnv() leaks are usually suppressed for the leak sanitizer.

I have no idea why this wouldn't show up before. I'm not an expert on PR_SetEnv, but all of the callers of SaveToEnv are passing in string constants, so it doesn't look to me that you actually need to do a strdup. The header for SetEnv just says that it has to be a "constant, persistent string", and passing in a string literal should be fine. A few call sites do that already. So maybe just inline SaveToEnv and remove the strdup.

nsAppRunner is compiled into libxul.so. If it leaves static strings in the environment then that can cause shutdown crashes after libxul.so is unloaded - bug 555894 and bug 473629.

My best guess here is that something is or was changing the environment, in a way that changed whether lsan thought the strings were still in use. The fix may be just to add suppressions for these leaks, but it would be helpful to know why lsan thinks these are no longer in the environment.

LSan treats things reachable from global variables as not being leaks, which is different than Valgrind.

A fairly reduced change is https://hg.mozilla.org/try/rev/def1122a86ba4a17de9da4ed6fa04323c819b753 (with the no-op https://hg.mozilla.org/try/rev/487f69ba0d14 as parent) - see https://treeherder.mozilla.org/#/jobs?repo=try&revision=959751bdb468. I think setenv and putenv share some code in glibc - so perhaps there's some interaction there, i.e. setenv calls prevent putenv leaks but removing the calls reveals them?

If the putenv leaks are something that is fixed in later glibc versions (than the try builder uses) that may explain why I can't reproduce this locally.

(In reply to Oliver Henshaw from comment #43)
> A fairly reduced change is
> https://hg.mozilla.org/try/rev/def1122a86ba4a17de9da4ed6fa04323c819b753

Interesting, thanks. So either setenv or unsetenv changes something about the other environment variables.

I think these leak reports should just be suppressed.
I guess the way to do that is to add to
http://searchfox.org/mozilla-central/source/build/sanitizers/lsan_suppressions.txt

Created attachment 8763601
(1/4) - Change modelines to those recommended by coding style.

Created attachment 8763602
(2/4) - Annotate deliberate leak in SaveToEnv

glandium suggested using MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT directly in
the code.

Created attachment 8763603
(3/4) - Stop using libgnome and libgnomeui on Linux v6.

Minor change that I spotted while trying to track down the setenv
problem - drop unneeded xlib.h include.

Created attachment 8763607
(4/4) - Drop unused Xatom.h include

Another un-needed include, this one was left over after the removal of
the N900 code.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=76ba509b5927 shows the lsan annotation works.

Is there any need to re-run the normal try tests from comment #34?

Comment on attachment 8763602
(2/4) - Annotate deliberate leak in SaveToEnv

(In reply to Oliver Henshaw from comment #46)
> glandium suggested using MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT directly in
> the code.

Yes, that's much better than my suggestion, thanks.

LGTM, but to comply with documentation, I'll leave review to Andrew.

(In reply to Oliver Henshaw from comment #49)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=76ba509b5927 shows
> the lsan annotation works.
>
> Is there any need to re-run the normal try tests from comment #34?

No, those ASAN tests should be enough, thanks.

Comment on attachment 8763602
(2/4) - Annotate deliberate leak in SaveToEnv

Review of attachment 8763602:
-----------------------------------------------------------------

Be sure to #include "mozilla/MemoryChecking.h" in this file please.

(In reply to Andrew McCreight (PTO-ish) [:mccr8] from comment #52)
> Comment on attachment 8763602
> (2/4) - Annotate deliberate leak in SaveToEnv
>
> Review of attachment 8763602:
> -----------------------------------------------------------------
>
> Be sure to #include "mozilla/MemoryChecking.h" in this file please.

It's already there. Which is good, because I can pretend I checked.

Pushed by <email address hidden>:
https://hg.mozilla.org/integration/fx-team/rev/10617f9fdc57
(1/4) - Change modelines to those recommended by coding style. r=karlt
https://hg.mozilla.org/integration/fx-team/rev/8996273af30f
(2/4) - Annotate deliberate leak in SaveToEnv. r=karlt
https://hg.mozilla.org/integration/fx-team/rev/c37c930d089f
(3/4) - Stop using libgnome and libgnomeui on Linux. r=acomminos
https://hg.mozilla.org/integration/fx-team/rev/ecd3562339dc
(4/4) - Drop unused Xatom.h include. r=karlt

Changed in firefox:
status: Confirmed → Fix Released

Thank you for your thoroughness and persistence, Oliver!

Fabrizio Gennari (fabrizio-ge) wrote :

This is marked as fixed upstream. So why does is still occur on Artful and Firefox Quantum 59.0.2?

Fabrizio Gennari (fabrizio-ge) wrote :

It also affects bionic and Firefox Quantum 60.0

Fabrizio Gennari (fabrizio-ge) wrote :

In the most recent version (example: Firefox 63.0.3 on Cosmic) the behaviour is different. Firefox actually blocks the restart/shutdown process by showing a popup "X tabs will be closed". But the whole OS also creates a modal dialog box, asking whether to cancel, reboot or turn off, and the modal dialog boxes actually prevents the user from closing Firefox properly using the Firefox pop-up window. If "reboot" or "turn off" are chosen, Firefox will show the dreaded "Well, This Is Embarrassing" screen.

This is an improvement from the previous versions. However, a proper shutdown sequence should allow the user to close applications properly right after initiating the reboot/shutdown procedure.

Sebastien Bacher (seb128) wrote :

Upstream closed that ticket and there has been no report in years, is anyone still seeing the issue on recent Ubuntu versions?

Changed in firefox (Ubuntu):
assignee: Chris Coulson (chrisccoulson) → nobody
Changed in firefox (Ubuntu Precise):
assignee: Chris Coulson (chrisccoulson) → nobody
Changed in firefox (Ubuntu):
status: Triaged → Incomplete
Changed in thunderbird (Ubuntu):
status: Triaged → Incomplete
Changed in firefox (Ubuntu Precise):
status: Triaged → Won't Fix
no longer affects: thunderbird (Ubuntu Precise)
Changed in firefox (Ubuntu Precise):
status: Won't Fix → Incomplete
Paul White (paulw2u) wrote :

Further to comment #68 there has been no reply after 4 weeks

Bug report won't expire due to bug watch
Upstream issue was closed "RESOLVED FIXED" on 2016-06-22
Reporter doesn't seem to use Launchpad any more

Anyone still seeing a "Well, This Is Embarrassing" message should
create a new report of their own so that their own debug information
is attached to the report and any issue can be investigated while
using an up-to-date Ubuntu/Firefox/Thunderbird release.

FWIW, not an issue here when using 18.04, 20.04 and the interim releases.

Closing as "Fix Released" to match status of upstream report and to
reflect the lack of confirmation that the issue still exists.

Changed in firefox (Ubuntu):
status: Incomplete → Fix Released
Changed in thunderbird (Ubuntu):
status: Incomplete → Fix Released
Changed in firefox (Ubuntu Precise):
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.