upstart job keeps restarting a dying gdm

Bug #441638 reported by hatchetman82 on 2009-10-03
This bug affects 17 people
Affects         Status   Importance   Assigned to      Milestone
gdm (Ubuntu)             High         Steve Langasek
Karmic                   High         Steve Langasek

Bug Description

After completing the Karmic beta install wizard, a dialog box appears instructing the user to reboot into the new installation.
After confirming the reboot, gdm started dying and respawning in an endless loop (or at least for a few minutes, until I shut down the machine).
This does not happen when booting into the installed OS and shutting it down or rebooting, only when rebooting after installing from the CD.

This causes the screen to flicker horribly, and the machine stays stuck like that until it is shut down.

This is what it looks like:
http://www.youtube.com/watch?v=VpWpY-0A8Jo

the text flashing on the screen is an endless loop of:

init: gdm main process ([pid]) terminated with status 1
init: gdm main process ended, respawning
init: ubiquity main process ([pid]) terminated with status 1

Paul Tagliamonte (paultag) wrote :

Also affects me in a virtual box.

Changed in ubuntu:
status: New → Confirmed
Anthony_Lanese (master-bratac) wrote :

Also affects me in VirtualBox

Affects me in VirtualBox as well.

Stewart Johnston (stooj) wrote :

This is happening to me as well.

I am installing Ubuntu on the bare metal - not a VM

Has happened with my macbook and my desktop.

Luke Faraone (lfaraone) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Please run the command 'apport-collect BUGNUMBER' which will attach necessary information for debugging this. Thanks in advance.

affects: ubuntu → gdm (Ubuntu)
Changed in gdm (Ubuntu):
status: Confirmed → Incomplete
Maxim Levitsky (maximlevitsky) wrote :

The fact that gdm is dying for some reason is one thing; it is probably missing X drivers (nvidia) or something like that.
But the fact that upstart doesn't have a limit on the number of times it respawns a process is really bad, because it makes it impossible to debug unless you ssh into the system.

DaveMachine (davide.bertolotto) wrote :

Affects me in VirtualBox as well.

Architecture: i386
DistroRelease: Ubuntu 9.10
Package: gdm 2.28.0-0ubuntu8
PackageArchitecture: i386
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-11.36-generic
Uname: Linux 2.6.31-11-generic i686
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare

Changed in gdm (Ubuntu):
status: Incomplete → New
tags: added: apport-collected

The apport-collect above was run from a Dell Studio XPS on which I experienced the bug exactly as in the bug description.

Architecture: i386
DistroRelease: Ubuntu 9.10
Package: gdm 2.28.1-0ubuntu1
PackageArchitecture: i386
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
Uname: Linux 2.6.31-14-generic i686
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
XsessionErrors:
 (gnome-settings-daemon:1336): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (gnome-settings-daemon:1336): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (nautilus:1401): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
 (polkit-gnome-authentication-agent-1:1418): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed

Luke Faraone (lfaraone) on 2009-10-24
Changed in gdm (Ubuntu):
status: New → Confirmed

Please disregard my previous post. I inadvertently sent the apport-collect data from the installed VirtualBox image instead of the live CD. Sorry.

Bryce Harrington (bryce) on 2009-10-27
Changed in gdm (Ubuntu):
importance: Undecided → High
milestone: none → karmic-updates
status: Confirmed → Triaged
Bryce Harrington (bryce) wrote :

It appears with the new gdm, it no longer triggers FailsafeServer after N failed tries. Instead, gdm simply exits after 5 tries. Unfortunately, at that point upstart notices gdm exited and restarts it.

This is fairly easy to reproduce by doing 'echo "foo" >> /etc/X11/xorg.conf' and rebooting.
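The reproduction step above can be sketched a little more defensively, keeping a backup so the machine can be recovered afterwards. This is only a sketch: the `XORG_CONF` and `BACKUP` variables are illustrative names, and the path defaults to a scratch file so the snippet is safe to run as written. On a disposable test machine it would be pointed at /etc/X11/xorg.conf (as root, and assuming that file exists).

```shell
#!/bin/sh
# Corrupt an xorg.conf the way the comment above describes, but keep a backup.
# XORG_CONF defaults to a scratch file; set it to /etc/X11/xorg.conf on a
# throwaway test system to actually trigger the failure on the next boot.
XORG_CONF="${XORG_CONF:-$(mktemp)}"
BACKUP="$XORG_CONF.bak"

cp -p "$XORG_CONF" "$BACKUP"     # save the current configuration
echo "foo" >> "$XORG_CONF"       # append the junk line from the comment above

echo "modified $XORG_CONF; restore with: mv $BACKUP $XORG_CONF"
```

Restoring the backup over ssh or from a recovery shell should undo the change.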

Changed in gdm (Ubuntu Karmic):
assignee: nobody → Sebastien Bacher (seb128)
Bryce Harrington (bryce) wrote :

I'd concur with comment #7. That X fails to start is unfortunate, but that's a situation that can happen for a variety of reasons (a simple typo in xorg.conf can prevent it starting for instance); but typically gdm would dump you into some sort of failsafe mode so you could debug the problem. Now gdm simply exits after a few tries, and restarts; the user experience is a lot of flickering and no way to break in to do debugging.

Ideally, gdm should fire up the failsafe stuff after failing. If that can't be done, then at least it should dump out to the console login, rather than getting stuck in a loop.

I think this latter option could be achieved by simply making the gdm upstart job not restart gdm on failure, or by having it try a few times but give up if gdm keeps exiting in a short period of time.
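For the "try a few times, then give up" variant, upstart's job file format has a `respawn limit COUNT INTERVAL` stanza. The sketch below writes a hypothetical job file to a scratch directory so it can be inspected; the file name, the description, the `exec` line and the values 3 and 60 are all assumptions for illustration, not the contents of the packaged gdm job.

```shell
#!/bin/sh
# Write an illustrative gdm job file to a scratch directory (not /etc/init).
JOB_DIR="$(mktemp -d)"
cat > "$JOB_DIR/gdm.conf" <<'EOF'
description "GNOME Display Manager (illustrative sketch)"

# Restart gdm if it dies, but stop the job once it has been respawned
# more than 3 times within 60 seconds. Both numbers are assumptions.
respawn
respawn limit 3 60

# Placeholder command line; the real job may differ.
exec gdm-binary
EOF

cat "$JOB_DIR/gdm.conf"
```

Whether a limit like this is preferable to not respawning at all depends on how long gdm takes to give up on X each time, since the interval has to be long enough to catch repeated failures.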

Martin Pitt (pitti) wrote :

Scott, is there a particular reason why you made the gdm upstart job respawn? It already has its own logic to respawn on a crashing X server (or if an X session terminates normally). If gdm itself is crashing, then restarting it over and over does not make much sense either.

Any objections to making the upstart job not respawn? Does this require anything other than just removing "respawn"?

Thank you!

summary: - gdm main process keeps dying and respawning on reboot after karmic beta
- install
+ upstart job keeps restarting a dying gdm
Changed in gdm (Ubuntu Karmic):
assignee: Sebastien Bacher (seb128) → Martin Pitt (pitti)
dns_server (dns-server) wrote :

gdm is not starting X; we have xsplash now.
Does xsplash have the same logic that gdm did, dumping back to a failsafe?
It seems that xsplash is crashing and upstart is respawning it as it crashes.

On Tue, Oct 27, 2009 at 10:12:24PM -0000, dns_server wrote:
> gdm is not starting X; we have xsplash now.

Incorrect. gdm is still the service that starts X, xsplash is launched from
there.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden>                              <email address hidden>

On Tue, Oct 27, 2009 at 06:55:43PM -0000, Bryce Harrington wrote:
> Ideally, gdm should fire up the failsafe stuff after failing. If that
> can't be done, then at least it should dump out to the console login,
> rather than getting stuck in a loop.

If it's possible to have gdm exit with a particular exit code when it's
failed to start X, then see /etc/init/mountall-reboot.conf for an example of
how to use this to only start the failsafe job on this exit code.
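
The exit-code approach described above might look roughly like the following. This is a sketch under assumptions, not the contents of mountall-reboot.conf: the job name `gdm-failsafe`, the exit status 1 (taken from the "terminated with status 1" lines in the bug description) and the placeholder `exec` command are all hypothetical. The job file is written to a scratch directory rather than /etc/init.

```shell
#!/bin/sh
# Write a hypothetical failsafe job file to a scratch directory.
JOB_DIR="$(mktemp -d)"
cat > "$JOB_DIR/gdm-failsafe.conf" <<'EOF'
description "Hypothetical failsafe job (illustrative sketch)"

# Start only when the gdm job has stopped with this particular exit status.
# The value 1 is an assumption based on the log lines in the report.
start on stopped gdm EXIT_STATUS=1

# Placeholder; a real job would launch the failsafe X handler here.
exec /bin/echo "gdm failed to start X"
EOF

cat "$JOB_DIR/gdm-failsafe.conf"
```

This only helps if gdm reserves a distinct exit code for "could not start X", so that other failures do not trigger the failsafe job.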


Steve Langasek (vorlon) wrote :

And yes, dropping 'respawn' from the job is enough to stop the job from being respawned by upstart.

I think the only justification for setting respawn is that, even though it's not the upstart default, it's a sensible default policy for services. So if it's not working for any reason, by all means it should be changed.
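
Dropping the stanza could be done as sketched below. The snippet works on a scratch copy with made-up contents, so `JOB_FILE` and the three sample lines are illustrative assumptions; on a real system the file to edit would be the gdm job under /etc/init.

```shell
#!/bin/sh
# Build a small stand-in job file, then remove its bare "respawn" line.
JOB_FILE="$(mktemp)"
printf '%s\n' \
    'description "gdm (illustrative)"' \
    'respawn' \
    'exec gdm-binary' > "$JOB_FILE"

# Keep every line except the one that is exactly "respawn".
grep -v '^respawn$' "$JOB_FILE" > "$JOB_FILE.new"
mv "$JOB_FILE.new" "$JOB_FILE"

cat "$JOB_FILE"
```

With the stanza gone, upstart leaves the job stopped after gdm exits instead of starting it again.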

Martin Pitt (pitti) on 2009-10-28
Changed in gdm (Ubuntu Karmic):
status: Triaged → In Progress

Architecture: i386
DistroRelease: Ubuntu 9.10
Package: gdm 2.28.1-0ubuntu1
PackageArchitecture: i386
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, no user)
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic-pae
Uname: Linux 2.6.31-14-generic-pae i686
UserGroups: vboxusers

Ok guys, here's the deal:

Not only is GDM not starting, but TTY1-TTY7 are also being terminated, and that happens before any attempt is made to start GDM.
Of course GDM cannot start if there is no TTY available.
Furthermore, this problem occurs only with kernel 2.6.31-14;
kernel 2.6.28-15 boots perfectly!

Btw, this problem does not occur on our 8 Fujitsu Siemens workstations, only on the HP workstations...

Have phun debugging this! *g*

Yours, firebug

Ladies and gentlemen, here comes the solution:

I realized, by connecting to the machine via ssh, that everything is actually running (GDM, X, Pulse, whatever); you just don't get to see anything, which is due to a completely out-of-range resolution of horizontal 268 kHz, vertical 281 Hz.
I have now completely emptied /etc/X11/xorg.conf and voila, it works!

Just as if nothing had happened: all TTYs are back online, graphics are just fine, and there are no more "init: gdm main process ended, respawning" error messages!

Yours, firebug
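
The workaround described above (emptying xorg.conf over ssh) can be sketched so that the old configuration is kept rather than lost, which is useful if it is later needed for a bug report. The variable names are illustrative, and the path defaults to a scratch file so the snippet is harmless as written; on the affected machine it would be /etc/X11/xorg.conf, edited as root.

```shell
#!/bin/sh
# Set the old configuration aside, then truncate the live file.
XORG_CONF="${XORG_CONF:-$(mktemp)}"
SAVED="$XORG_CONF.broken"

cp -p "$XORG_CONF" "$SAVED"    # keep the previous contents for later analysis
: > "$XORG_CONF"               # empty the file; X then falls back to autodetection

echo "emptied $XORG_CONF; previous contents saved in $SAVED"
```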

Martin Pitt (pitti) on 2009-11-03
Changed in gdm (Ubuntu):
milestone: karmic-updates → none
Bryce Harrington (bryce) wrote :

> I realized, by connecting to the machine via ssh, that everything is actually running (GDM, X, Pulse, whatever); you just don't get to see anything, which is due to a completely out-of-range resolution of horizontal 268 kHz, vertical 281 Hz.
> I have now completely emptied /etc/X11/xorg.conf and voila, it works!

Please note that people experiencing this bug have two bugs, first something which causes X to fail, and second an issue with gdm where it keeps trying to restart X and causes flickering. This bug report focuses only on the second issue, the gdm blinking. For the other bug, please file a separate bug report (hint: use 'ubuntu-bug xorg' to do it).

firebird, in your case I would be interested in seeing the xorg.conf before and after your edits in addition to the stuff ubuntu-bug collects, so we can see what exactly needed to be changed to make it work. However, what you describe sounds similar to many other bug reports we already have, so perhaps it is just a dupe of a known issue. Hard to say without seeing a complete report.

Steve Langasek (vorlon) on 2009-11-04
Changed in gdm (Ubuntu Karmic):
assignee: Martin Pitt (pitti) → Steve Langasek (vorlon)
Changed in gdm (Ubuntu):
assignee: Martin Pitt (pitti) → Steve Langasek (vorlon)
dns_server (dns-server) wrote :

In my case the nvidia kernel module was not built correctly (it was resolved a day later), causing this problem.

It is simple to reproduce: just create a bad xorg.conf. This can be done by making a spelling mistake, or by naming the wrong driver, e.g. having the nvidia driver set in xorg.conf when that driver is not installed.

Steve Langasek (vorlon) wrote :

Yes, we know how to reproduce this bug, that's why it's marked as "in progress".

Colin Watson (cjwatson) wrote :

Accepted into karmic-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in gdm (Ubuntu Karmic):
status: In Progress → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gdm - 2.28.1-0ubuntu2

---------------
gdm (2.28.1-0ubuntu2) karmic-proposed; urgency=low

  * Don't respawn gdm on failure; this lets us capture X failures instead and
    trigger the bulletproof X handler here. LP: #441638.
  * re-export any XORGCONFIG value passed to the upstart job, needed to
    complete integration with bulletproof X. LP: #474806.

 -- Steve Langasek <email address hidden> Wed, 04 Nov 2009 18:15:37 -0800

Changed in gdm (Ubuntu Karmic):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) wrote :

copied to lucid.

Changed in gdm (Ubuntu Karmic):
status: Fix Released → Fix Committed
Changed in gdm (Ubuntu):
status: In Progress → Fix Released

Hi guys!

I guess what Bryce Harrington required from me is then obsolete?

Yippie!!!

Alan Johnson (nilgiri) wrote :

Yep, the blinking is fixed for me on a fresh install of Karmic server (I wanted RAID6 support at install) with ubuntu-desktop installed. The gdm package gets my OK for moving to updates. 'cause, you know, I know you were waiting for /my/ OK, right? =)

Martin Pitt (pitti) on 2009-11-08
tags: added: verification-done
removed: verification-needed
Bryce Harrington (bryce) wrote :

A later bug was found in getting the failsafe session up: since dexconf no longer generates xorg.conf files, it also can't be used to generate the failsafe mode's xorg.conf. I've uploaded a change which just generates a standard xorg.conf (this is pretty much all dexconf was doing last release anyway). So I recommend that the fix for bug 477149 go in along with this one.

Sebastien Bacher (seb128) wrote :

bug #476874 is a user stating that the upgrade broke things

Steve Langasek (vorlon) wrote :

That bug appears to be a problem with failsafe X not working correctly on the user's system, but there's no information there about why gdm is failing in the first place. I think this needs to be triaged further before concluding that we shouldn't proceed with this SRU (OTOH, we also shouldn't publish this SRU until it's resolved).

firebird wrote:

>I have now completely emptied /etc/X11/xorg.conf and voila, it works!

This fixed it for me.

tags: added: iso-testing
Bryce Harrington (bryce) wrote :

It would be great to see this pushed out, this issue is making it difficult for users to troubleshoot underlying X issues...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gdm - 2.28.1-0ubuntu2

---------------
gdm (2.28.1-0ubuntu2) karmic-proposed; urgency=low

  * Don't respawn gdm on failure; this lets us capture X failures instead and
    trigger the bulletproof X handler here. LP: #441638.
  * re-export any XORGCONFIG value passed to the upstart job, needed to
    complete integration with bulletproof X. LP: #474806.

 -- Steve Langasek <email address hidden> Wed, 04 Nov 2009 18:15:37 -0800

Changed in gdm (Ubuntu Karmic):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

I'm not sure what to do with the xorg proposed update now. Bug 475259 wasn't introduced by the xorg/gdm changes, but I don't understand how it can suddenly be exposed on a system which previously had a correct xorg.conf/drivers.

Some insight as to whether this should block the -updates migration would be appreciated.

Steve Langasek (vorlon) wrote :

Bug #475259 can't be exposed on a system whose X server is working, no; and I don't think that bug should block the -updates migration - but I was intimately involved in preparing the SRU, so I was giving you all the information I had in order to let you make your own judgement call here.

Argyle (kruegejj) wrote :

I still have this problem. Is it supposed to be fixed? What do I do?

Argyle: first of all it may help if you read the thread, as I have described the solution further up!;)

Gil Cara (jahzeelgface) wrote :

W: A error occurred during the signature verification. The repository is not updated and the previous index files will be used.GPG error: http://ph.archive.ubuntu.com hardy-updates Release: The following signatures were invalid: BADSIG 40976EAF437D05B5 Ubuntu Archive Automatic Signing Key <email address hidden>

W: Failed to fetch http://ph.archive.ubuntu.com/ubuntu/dists/hardy-updates/Release

W: Some index files failed to download, they have been ignored, or old ones used instead.
Is this a bug? I can't upgrade my Ubuntu. What shall I do?

Gil Cara [2009-12-19 1:23 -0000]:
> W: Some index files failed to download, they have been ignored, or old ones used instead.
> Is this a bug? I can't upgrade my Ubuntu. What shall I do?

Just try again a bit later, you might just have hit a bad time on your
mirror.
