[SRU] ubuntu-touch livefs builds kill upstart in host

Bug #1430403 reported by Colin Watson on 2015-03-10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
upstart (Ubuntu)
James Hunt

Bug Description

= Summary =

The version of Upstart in precise is affected by a bug in the way that ".override" [1] file handling is performed.

If a job has an override file ("/etc/init/*.override") and that override file is deleted before the corresponding job configuration file ("/etc/init/*.conf"), there is a possibility of a crash.

== Explanation ==

When a "/etc/init/*.override" file is deleted, Upstart will automatically detect this and reload the corresponding "/etc/init/*.conf" file.

However, if the ".conf" file (which the ".override" file corresponded to) is deleted at the same time Upstart attempts to read the ".conf" file, an assertion failure could result.

= Code Specifics =

The erroneous function is "conf_delete_handler()" which is called whenever a file Upstart is watching gets deleted. The end of this function calls conf_reload_path() but although it logs an error message, it does not consume the error object that gets raised when conf_reload_path() fails.

= Affected Releases =

This bug is only present in precise:

- Upstart override handling was introduced in Upstart v1.3:
- Precise currently uses Upstart 1.5-0ubuntu7.2 (and hence is affected).
- Lucid currently uses Upstart 0.6.5-8 (hence, not affected).
- Trusty and Vivid use much newer versions of the Upstart which no
  longer contain the problematic code.

= Fix =

The fix is simply to have conf_delete_handler() consume the error object (by freeing it) when conf_reload_path() fails.

= Test Case =

A reliable test case is unfortunately not possible to create, since the problem comes down to Upstart racing with the deletion of the ".conf"

However, the patch is small and it can be seen that every other failing call to conf_reload_path() free's the resulting error object.

= Workarounds =

The problem is only manifested if the ".conf" and the ".override" file get deleted one after another, with the ".override" file being deleted first. This implies the following work-arounds to avoid the problem if you wish to delete both files "at the same time":

1) Ensure the ".conf" file is deleted first.

2) Delete the ".override" file first, and then wait for a small period of time before deleting the corresponding ".conf" file.

3) Delete the ".override" file first, then call "sudo initctl reload-configuration" and then delete the corresponding ".conf" file.

= Regression Potential =

None expected. The problem is difficult to trigger anwyay and the patch can be seen to correct (what is now) an obvious coding error.

[1] - http://upstart.ubuntu.com/cookbook/#override-file

= Original Description =

ubuntu-touch livefs builds have started killing upstart in the host system (in this case, precise, although a similar bug appears to be present in current versions). The livefs build completes, but the host dies shortly after launchpad-buildd starts trying to remove the build chroot. The kernel log looks like this:

Mar 10 13:46:55 allspice kernel: [3743880.621603] init: /home/buildd/build-LIVEFSBUILD-22254/chroot-autobuild/build/chroot/etc/init/tty1.conf: Unable to reload configuration after override deletion
Mar 10 13:46:55 allspice kernel: [3743880.642455] init: file.c:110: Unhandled error from nih_file_read: No such file or directory
Mar 10 13:46:55 allspice kernel: [3743880.754281] init: Caught abort, core dumped
Mar 10 13:46:55 allspice kernel: [3743880.754375] init: file.c:110: Unhandled error from nih_file_read: No such file or directory
Mar 10 13:46:55 allspice kernel: [3743880.757830] init: Caught abort, core dumped

This appears to be because a couple of functions call conf_reload_path, which may leave an nih_error in place if nih_file_read fails, but then do not dispose of the nih_error. The pattern near the end of conf_file_visitor (in precise) is probably appropriate.

We're working around this to some extent in livecd-rootfs by removing the .override files first, but it should never be possible for a chroot to crash the host's init.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu):
status: New → Confirmed
James Hunt (jamesodhunt) wrote :

conf_reload_path() calls nih_file_read(). Although the return value of that call is checked, the code is wrong. It looks likely that this bug crept in since, unfortunately, the documentation for nih_file_read() is incorrect. It states:

    Returns: newly allocated string or NULL if insufficient memory.

... whereas it *should* state:

    Returns: newly allocated string or NULL on raised error.

As such, if the call to nih_file_read() fails, the error object should be destroyed before continuing to avoid the crash seen above.

Changed in upstart (Ubuntu):
importance: Undecided → High
assignee: nobody → James Hunt (jamesodhunt)
James Hunt (jamesodhunt) wrote :

The crash seems to originate from the conf_delete_handler() where although an error message gets logged, the actual error itself is not cleared. Note that this bug is present in 1.5-0ubuntu7.2 (precise), but is not present in trusty (1.12.1-0ubuntu4.2) or vivid (1.13.2-0ubuntu9).

James Hunt (jamesodhunt) wrote :

Attached is a minimal fix for the bug.

Note that since this issue does not affect newer releases, we cannot follow the usual process of testing in the dev release. However, as can be seen, the fix is small and quite clear. Note that the change to main.c is required to make upstart build on precise (looks like there was a bzr / source package disconnect somewhere).

James Hunt (jamesodhunt) wrote :

It is relatively simple to trigger the bug: run a script that loops creating a .conf file and an .override file, then deletes the .conf file before deleting the override file.

James Hunt (jamesodhunt) wrote :

lp keeps timing out on me when I attempt to attach a file. Meantime...

I've managed to force a crash after ~150 iterations of bug-1430403.sh using http://people.canonical.com/~jhunt/upstart/bug-1430403.tgz.

To recreate:

$ sudo tar xvfz bug-1430403.tgz
$ sudo bash bug-1430403.sh

James Hunt (jamesodhunt) wrote :

Note that we cannot add new unit tests for this fix without making it a much bigger change since the problematic function is static.

tags: added: patch
James Hunt (jamesodhunt) on 2015-03-11
description: updated
James Hunt (jamesodhunt) on 2015-03-11
summary: - ubuntu-touch livefs builds kill upstart in host
+ [SRU] ubuntu-touch livefs builds kill upstart in host
Changed in upstart (Ubuntu):
status: Confirmed → In Progress
James Hunt (jamesodhunt) on 2015-03-11
description: updated
description: updated

Hello Colin, or anyone else affected,

Accepted upstart into precise-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/upstart/1.5-0ubuntu7.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in upstart (Ubuntu Precise):
status: New → Fix Committed
tags: added: verification-needed
Adam Conrad (adconrad) wrote :

Tested in production, and verified that it resolves the crash.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.5-0ubuntu7.3

upstart (1.5-0ubuntu7.3) precise-proposed; urgency=medium

  * init/conf.c: conf_delete_handler(): Clear error if conf_reload_path()
    fails (LP: #1430403).
 -- James Hunt <email address hidden> Wed, 11 Mar 2015 14:00:42 +0000

Changed in upstart (Ubuntu Precise):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for upstart has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public Security information  Edit
Everyone can see this security related information.

Duplicates of this bug

Other bug subscribers