upgrade to upstart 1.3-0ubuntu11 causes unclean shutdown

Bug #886439 reported by Alan on 2011-11-05
This bug affects 1 person
Affects                     Status         Importance   Assigned to
sysvinit (Ubuntu)           Fix Released   High         Clint Byrum
sysvinit (Ubuntu Oneiric)   Fix Released   High         Clint Byrum
upstart (Ubuntu)            Invalid        High         Clint Byrum
upstart (Ubuntu Oneiric)    Invalid        Undecided    Unassigned

Bug Description

== SRU JUSTIFICATION ==

IMPACT: It is possible, though not all that likely, that files and metadata will be out of sync when systems are shut down or rebooted after upgrading upstart or any of the libraries it depends on (namely, libc6). The potential impact is therefore data corruption.

TEST CASE:

Steps to reproduce:
1. Start an Ubuntu 11.10 Oneiric instance on EC2.
   Example: ec2run ami-bbf539d2 -t t1.micro -k key
2. Save the output of sudo lsof for reference.
3. apt-cache show upstart
   The stock version of upstart installed is 1.3-0ubuntu10.
4. sudo apt-get update; sudo apt-get install upstart
   upstart is upgraded to 1.3-0ubuntu11.
5. Verify that /var/run/init.upgraded is in fact touched by the install.
6. Save the output of sudo lsof again for reference.
7. sudo shutdown -h now
   While unmounting, the console displays:
   mount: / is busy
   This error will not show when the bug is fixed.
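
Step 5 can be checked from a shell. This is a minimal sketch only: flag_present is a hypothetical helper name, and the directory argument defaults to /var/run as in the steps above.

```shell
# Hypothetical helper (not part of any package): report whether the
# init.upgraded flag file exists in the given directory.
# Defaults to /var/run, the location named in step 5.
flag_present() {
    dir="${1:-/var/run}"
    if [ -e "$dir/init.upgraded" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Usage: flag_present            (checks /var/run)
#        flag_present /some/dir  (checks another directory)
```

Running it right after step 4 and before step 7 matches the point at which the test case expects the flag to exist.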

DEV FIX: The problem is that /var/run is cleared out for the transition to /run before the 'init.upgraded' flag file is checked for, so the check never finds it. The fix changes the order of operations so that the check runs first.

REGRESSION POTENTIAL: The code path that is changed is very straightforward and tight; the fix just swaps the order of two basic sets of operations. I would therefore consider this to have a low regression potential.

=====

This causes filesystem recovery errors on next boot.

The post-upgrade lsof is attached.

Scott Moser (smoser) wrote :

verified per original reporter's instructions:

$ apt-cache policy upstart
upstart:
  Installed: 1.3-0ubuntu10
  Candidate: 1.3-0ubuntu11
  Version table:
     1.3-0ubuntu11 0
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ oneiric-updates/main amd64 Packages
 *** 1.3-0ubuntu10 0
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ oneiric/main amd64 Packages
        100 /var/lib/dpkg/status
$ sudo lsof > lsof.out.pre
$ sudo apt-get install upstart
...
Setting up upstart (1.3-0ubuntu11) ...
Installing new version of config file /etc/init/failsafe.conf ...
$ sudo lsof > lsof.out.post
$ sudo reboot

I'll attach lsof.out.post, lsof.out.pre, and a console log showing the 'mount: / is busy' and subsequent check on reboot.
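
One way to inspect the two snapshots is to list the entries that refer to deleted files. This is a sketch under stated assumptions: deleted_entries is a hypothetical helper, and the patterns rely on lsof on Linux marking deleted mapped files with DEL in the FD column and appending (deleted) to the names of open files that have been removed. That these entries are the ones relevant to this bug is an assumption, not something stated in the report.

```shell
# Hypothetical helper: print the lines of a saved lsof snapshot that
# refer to deleted files (assumption: "DEL" in the FD column, or a
# trailing "(deleted)" in the NAME column).
deleted_entries() {
    grep -E '[[:space:]]DEL[[:space:]]|\(deleted\)' "$1"
}

# Usage, with the file names saved in the transcript above:
#   deleted_entries lsof.out.pre
#   deleted_entries lsof.out.post
```

Entries that appear only in the post-upgrade snapshot are the ones worth a closer look.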

Changed in upstart (Ubuntu):
importance: Undecided → High
status: New → Confirmed
tags: added: regression-update
Changed in upstart (Ubuntu):
status: Confirmed → Triaged
assignee: nobody → Clint Byrum (clint-fewbar)
Clint Byrum (clint-fewbar) wrote :

The offending code is actually in sysvinit, in /etc/init.d/umountroot

It checks for /var/run/init.upgraded *AFTER* completely removing the contents of /var/run

We simply need to do this check before the /var/run mangling.
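
The ordering problem can be illustrated with a short sketch. This is not the actual umountroot code: handle_shutdown, clear_run_dir, and on_init_upgraded are hypothetical names, the directory is passed in as an argument rather than hard-coded to /var/run, and on_init_upgraded is a placeholder for whatever the real script does when it finds the flag.

```shell
# Placeholder for the action taken when the flag file is found
# (hypothetical; the real script's action is not shown in this report).
on_init_upgraded() {
    echo "init was upgraded"
}

# Remove everything inside the directory, including dotfiles.
clear_run_dir() {
    find "$1" -mindepth 1 -delete
}

# Corrected order: look for the flag first, then clear the directory.
# With the two steps swapped, the flag is always gone by the time the
# check runs, which is the bug described above.
handle_shutdown() {
    run_dir="$1"
    if [ -f "$run_dir/init.upgraded" ]; then
        on_init_upgraded
    fi
    clear_run_dir "$run_dir"
}

# Usage: handle_shutdown /path/to/scratch/dir
```

Trying it against a scratch directory that contains an init.upgraded file shows the placeholder message and leaves the directory empty.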

Changed in sysvinit (Ubuntu):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Clint Byrum (clint-fewbar)
Clint Byrum (clint-fewbar) wrote :

Testing confirms that upstart is doing the right thing; this is just a bug in sysvinit, so I am closing the upstart task. The fix for the sysvinit issue is in the merge proposal and should be uploaded to precise soon, after a bit more testing.

Changed in upstart (Ubuntu):
status: Triaged → Invalid
Changed in upstart (Ubuntu Oneiric):
status: New → Invalid
Changed in sysvinit (Ubuntu Oneiric):
status: New → In Progress
Changed in sysvinit (Ubuntu):
status: Triaged → In Progress
Changed in sysvinit (Ubuntu Oneiric):
importance: Undecided → High
assignee: nobody → Clint Byrum (clint-fewbar)
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.88dsf-13.10ubuntu8

---------------
sysvinit (2.88dsf-13.10ubuntu8) precise; urgency=low

  * d/src/initscripts/etc/init.d/sendsigs: wait up to 300 extra
    seconds for upstart jobs that have been killed. They will be sent
    SIGKILL by upstart when their 'kill timeout' has been reached, so
    we should trust the job's author to give the service a reasonable
    amount of time to shut down. (LP: #688541)
  * also omit pids of stop/killed upstart jobs since we know they've
    been killed already.
  * d/src/initscripts/etc/init.d/umountroot: Check for init.upgraded
    file in /var/run before clearing out /var/run. (LP: #886439)
 -- Clint Byrum <email address hidden> Mon, 12 Dec 2011 16:16:37 -0800

Changed in sysvinit (Ubuntu):
status: In Progress → Fix Released
Clint Byrum (clint-fewbar) wrote :

Oneiric fix is waiting in the oneiric-proposed queue.

Changed in sysvinit (Ubuntu Oneiric):
status: In Progress → Fix Committed

Hello Alan, or anyone else affected,

Accepted sysvinit into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Alan (8libra) wrote :

The patch works great. Upgraded initscripts from oneiric-proposed, then upgraded upstart. Shutdown is clean. I see the difference between the two copies of umountroot. Great work!

Clint Byrum (clint-fewbar) wrote :

Thanks for testing, Alan!

Because this came bundled with bug #688541, that one will have to be verified as well before this can move to oneiric-updates. It would be fantastic if you could verify that one too (usually we ask that somebody other than the developer verify fixes). The test case is a bit long, but quite simple, and would be tested in a similar fashion by checking the console output. Otherwise we'll have to wait until somebody on the QA team gets around to verifying it.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.88dsf-13.10ubuntu4.1

---------------
sysvinit (2.88dsf-13.10ubuntu4.1) oneiric-proposed; urgency=low

  * d/src/initscripts/etc/init.d/sendsigs: wait up to 300 extra
    seconds for upstart jobs that have been killed. They will be sent
    SIGKILL by upstart when their 'kill timeout' has been reached, so
    we should trust the job's author to give the service a reasonable
    amount of time to shut down. (LP: #688541)
  * also omit pids of stop/killed upstart jobs since we know they've
    been killed already.
  * d/src/initscripts/etc/init.d/umountroot: Check for init.upgraded
    file in /var/run before clearing out /var/run. (LP: #886439)
 -- Clint Byrum <email address hidden> Mon, 12 Dec 2011 16:08:10 -0800

Changed in sysvinit (Ubuntu Oneiric):
status: Fix Committed → Fix Released