upstart turns machines into a scene from 'Dead Rising'

Bug #141034 reported by James Troup on 2007-09-19
Affects            Importance   Assigned to
upstart (Ubuntu)   Undecided    Unassigned
Edgy               Critical     Scott James Remnant (Canonical)

Bug Description

Binary package hint: upstart

upstart from edgy will every so often turn machines into something that reminds me of 'Dead Rising', with hundreds of zombie children and init segfaulting in a tight loop, i.e.:

Sep 19 17:35:13 zirconium init: Caught segmentation fault, core dumped
Sep 19 17:35:44 zirconium last message repeated 42299 times
Sep 19 17:36:45 zirconium last message repeated 85301 times

This is apparently a known bug, and fixed in feisty. That'd be fine since it previously only affected one of our edgy machines, but it's now started to happen on two others in quick succession. Since this cripples any box it affects, I'd really like to see it fixed in an SRU, please?

--
James

Confirmed as a bug in the edgy Upstart

Changed in upstart:
assignee: nobody → keybuk
importance: Undecided → Critical
status: New → Confirmed

Already fixed in feisty

Changed in upstart:
status: New → Fix Released

The initial crash is caused by an attempt to dereference a NULL jobs list before entering the main loop.

The jobs list is NULL because Upstart failed to read its configuration.

Upstart failed to read its configuration because inotify was not available. This is a known issue with the edgy version of Upstart and is fixed in the feisty version; the fix is part of a significant overhaul of the config code, so not easy to backport.

Inotify was not available because there were no inotify descriptors available.

There were no inotify descriptors available because the limit for the root user had been reached.

The limit had been reached because Upstart doesn't close its inotify descriptor, so each re-exec consumes another. This is also a known issue with the edgy version and is fixed in feisty; the fix wasn't backported because we didn't believe it to have a serious effect.

Now that we know there's a serious consequence, there's a simple one-line fix.

We would like to nominate this for fixing in an SRU.
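
The leak described above can in principle be observed from userspace. The following is a minimal sketch, not part of Upstart: the function names are hypothetical, and it assumes a Linux /proc in which each inotify descriptor shows up under /proc/<pid>/fd as a symlink whose target contains "inotify", and in which the per-user instance limit is exposed at /proc/sys/fs/inotify/max_user_instances.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Upstart.

# Count the entries in an fd directory (e.g. /proc/1/fd) whose symlink
# target mentions inotify. Assumes the kernel labels inotify descriptors
# with a target containing the string "inotify".
count_inotify_fds() {
    fd_dir=$1
    count=0
    for fd in "$fd_dir"/*; do
        # readlink fails for non-symlinks and for an unmatched glob; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *inotify*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# Print the per-user inotify instance limit, or nothing if it is unavailable.
inotify_instance_limit() {
    cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null
}
```

Run as root, `count_inotify_fds /proc/1/fd` could be compared before and after a re-exec of init; on an affected version the count would be expected to grow by one each time until it reaches the value printed by `inotify_instance_limit`.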

Martin Pitt (pitti) wrote :

Thanks for the detailed explanation and reasoning! Patch looks fine, please go ahead and upload to -proposed.

Is there a way to forcefully trigger this behaviour, so that we have a way to verify the fix? (Verification recipe is required by the SRU policy).

Changed in upstart:
status: Confirmed → In Progress

Yes, as noted in the IRC log:

  for a in $(seq 1 128); do kill -TERM 1; sleep 1; done

Uploaded to -proposed

Martin Pitt (pitti) wrote :

Accepted into edgy-proposed. Please test.

Changed in upstart:
status: In Progress → Fix Committed

Martin Pitt <email address hidden> writes:

> Accepted into edgy-proposed. Please test.

Successfully tested as follows:

| root@molybdenum:~# dpkg -l upstart | grep ^ii
| ii upstart 0.2.7-7.1 event-based init daemon
| root@molybdenum:~# for a in `seq 1 250`; do echo -n $a"..."; kill -TERM 1; sleep 5; echo " done."; done

after which init was still happy. Thanks again for the quick fix,
Scott.

--
James

Brian Murray (brian-murray) wrote :

I recreated the bug using upstart version 0.2.7-7 and the script provided. I then updated upstart to version 0.2.7-7.1 and did not experience the bug when running that script.

Martin Pitt (pitti) wrote :

Copied to edgy-updates. Thanks for testing.

Changed in upstart:
status: Fix Committed → Fix Released