[SRU] ubuntu-touch livefs builds kill upstart in host
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| upstart (Ubuntu) | | High | James Hunt | |
| Precise | | Undecided | Unassigned | |
Bug Description
= Summary =
The version of Upstart in precise has a bug in its handling of ".override" files [1].
If a job has an override file ("/etc/
== Explanation ==
When a "/etc/init/
However, if the ".conf" file that the ".override" file corresponds to is deleted at the same moment Upstart attempts to read it, an assertion failure can result.
= Code Specifics =
The erroneous function is "conf_delete_
= Affected Releases =
This bug is only present in precise:
- Upstart override handling was introduced in Upstart v1.3:
- Precise currently uses Upstart 1.5-0ubuntu7.2 (and hence is affected).
- Lucid currently uses Upstart 0.6.5-8 (hence, not affected).
- Trusty and Vivid use much newer versions of Upstart, which no longer contain the problematic code.
= Fix =
The fix is simply to have conf_delete_
= Test Case =
A reliable test case is unfortunately not possible to create, since the problem comes down to Upstart racing with the deletion of the ".conf" file.
However, the patch is small, and it can be seen that every other failing call to conf_reload_path() frees the resulting error object.
= Workarounds =
The problem only manifests if the ".conf" and the ".override" file get deleted one after another, with the ".override" file being deleted first. This implies the following workarounds if you wish to delete both files "at the same time":
1) Ensure the ".conf" file is deleted first.
2) Delete the ".override" file first, and then wait for a small period of time before deleting the corresponding ".conf" file.
3) Delete the ".override" file first, then call "sudo initctl reload-
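Workaround 1 can be sketched as a small C helper. This is only an illustration: the function name remove_job_files() and its parameters are hypothetical, not part of Upstart or livecd-rootfs, and it assumes both files live in the same directory and share the job name.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: remove "<dir>/<job>.conf" and then
 * "<dir>/<job>.override", in that order (workaround 1).
 * A file that is already missing is not treated as an error.
 * Returns 0 on success, -1 on failure. */
int remove_job_files(const char *dir, const char *job)
{
    char conf[4096];
    char override[4096];
    int  n;

    n = snprintf(conf, sizeof conf, "%s/%s.conf", dir, job);
    if (n < 0 || (size_t)n >= sizeof conf)
        return -1;

    n = snprintf(override, sizeof override, "%s/%s.override", dir, job);
    if (n < 0 || (size_t)n >= sizeof override)
        return -1;

    /* The ".conf" file goes first. */
    if (unlink(conf) != 0 && errno != ENOENT)
        return -1;

    /* Only then remove the ".override" file. */
    if (unlink(override) != 0 && errno != ENOENT)
        return -1;

    return 0;
}
```

A caller would use something like remove_job_files("/etc/init", "myjob"), where "myjob" is a placeholder job name.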
= Regression Potential =
None expected. The problem is difficult to trigger anyway, and the patch can be seen to correct (what is now) an obvious coding error.
[1] - http://
= Original Description =
ubuntu-touch livefs builds have started killing upstart in the host system (in this case, precise, although a similar bug appears to be present in current versions). The livefs build completes, but the host dies shortly after launchpad-buildd starts trying to remove the build chroot. The kernel log looks like this:
Mar 10 13:46:55 allspice kernel: [3743880.621603] init: /home/buildd/
Mar 10 13:46:55 allspice kernel: [3743880.642455] init: file.c:110: Unhandled error from nih_file_read: No such file or directory
Mar 10 13:46:55 allspice kernel: [3743880.754281] init: Caught abort, core dumped
Mar 10 13:46:55 allspice kernel: [3743880.754375] init: file.c:110: Unhandled error from nih_file_read: No such file or directory
Mar 10 13:46:55 allspice kernel: [3743880.757830] init: Caught abort, core dumped
This appears to be because a couple of functions call conf_reload_path, which may leave an nih_error in place if nih_file_read fails, but then do not dispose of the nih_error. The pattern near the end of conf_file_visitor (in precise) is probably appropriate.
We're working around this to some extent in livecd-rootfs by removing the .override files first, but it should never be possible for a chroot to crash the host's init.
| James Hunt (jamesodhunt) wrote : | #2 |
conf_reload_path() calls nih_file_read(). Although the return value of that call is checked, the failure is not handled correctly. The bug most likely crept in because the documentation for nih_file_read() is, unfortunately, incorrect. It states:
Returns: newly allocated string or NULL if insufficient memory.
... whereas it *should* state:
Returns: newly allocated string or NULL on raised error.
As such, if the call to nih_file_read() fails, the error object should be destroyed before continuing to avoid the crash seen above.
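To illustrate the pattern, here is a simplified, self-contained model of "NULL on raised error" semantics. It is not libnih and not the Upstart source: RaisedError, error_raise(), error_get(), file_read(), reload_path() and the two handle_delete_*() callers are all hypothetical names made up for this sketch. The point is only that a caller which sees the failure must also consume the error object, otherwise it stays pending.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a raised-error slot (NOT libnih). */
typedef struct {
    int  number;
    char message[128];
} RaisedError;

RaisedError *pending_error = NULL;

/* Non-zero while a raised error has not yet been consumed. */
int error_pending(void)
{
    return pending_error != NULL;
}

/* Raise an error; it stays pending until error_get() is called. */
void error_raise(int number, const char *message)
{
    assert(pending_error == NULL);  /* a leftover error is a caller bug */
    pending_error = malloc(sizeof *pending_error);
    if (pending_error == NULL)
        abort();
    pending_error->number = number;
    snprintf(pending_error->message, sizeof pending_error->message,
             "%s", message);
}

/* Take ownership of the pending error; the caller must free() it. */
RaisedError *error_get(void)
{
    RaisedError *err = pending_error;
    pending_error = NULL;
    return err;
}

/* Returns a newly allocated string, or NULL on raised error.
 * Only the first 255 bytes are read, which is enough for the sketch. */
char *file_read(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        int saved = errno;
        error_raise(saved, strerror(saved));
        return NULL;
    }

    char   buf[256];
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    char *copy = malloc(n + 1);
    if (copy == NULL) {
        error_raise(ENOMEM, strerror(ENOMEM));
        return NULL;
    }
    memcpy(copy, buf, n + 1);
    return copy;
}

/* Propagates failure: returns -1 with the error still raised. */
int reload_path(const char *path)
{
    char *contents = file_read(path);
    if (contents == NULL)
        return -1;
    free(contents);
    return 0;
}

/* Buggy caller: notices the failure but leaves the error pending. */
int handle_delete_buggy(const char *conf_path)
{
    if (reload_path(conf_path) < 0)
        return -1;
    return 0;
}

/* Fixed caller: consumes and frees the error before returning. */
int handle_delete_fixed(const char *conf_path)
{
    if (reload_path(conf_path) < 0) {
        RaisedError *err = error_get();
        free(err);
        return -1;
    }
    return 0;
}
```

In this model, calling handle_delete_buggy() on a path that has just been deleted leaves error_pending() true, whereas handle_delete_fixed() leaves nothing behind.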
| Changed in upstart (Ubuntu): | |
| importance: | Undecided → High |
| assignee: | nobody → James Hunt (jamesodhunt) |
| James Hunt (jamesodhunt) wrote : | #3 |
The crash seems to originate from the conf_delete_
| James Hunt (jamesodhunt) wrote : | #4 |
Attached is a minimal fix for the bug.
Note that since this issue does not affect newer releases, we cannot follow the usual process of testing in the dev release. However, as can be seen, the fix is small and quite clear. Note that the change to main.c is required to make upstart build on precise (looks like there was a bzr / source package disconnect somewhere).
| James Hunt (jamesodhunt) wrote : | #5 |
It is relatively simple to trigger the bug: run a script that loops creating a .conf file and an .override file, then deletes the .conf file before deleting the override file.
| James Hunt (jamesodhunt) wrote : | #6 |
lp keeps timing out on me when I attempt to attach a file. Meantime...
I've managed to force a crash after ~150 iterations of bug-1430403.sh using http://
To recreate:
$ sudo tar xvfz bug-1430403.tgz
$ sudo bash bug-1430403.sh
| James Hunt (jamesodhunt) wrote : | #7 |
Note that we cannot add new unit tests for this fix without making it a much bigger change since the problematic function is static.
| tags: | added: patch |
| description: | updated |
| summary: |
- ubuntu-touch livefs builds kill upstart in host
+ [SRU] ubuntu-touch livefs builds kill upstart in host |
| Changed in upstart (Ubuntu): | |
| status: | Confirmed → In Progress |
| description: | updated |
| description: | updated |
Hello Colin, or anyone else affected,
Accepted upstart into precise-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in upstart (Ubuntu Precise): | |
| status: | New → Fix Committed |
| tags: | added: verification-needed |
| Adam Conrad (adconrad) wrote : | #9 |
Tested in production, and verified that it resolves the crash.
| tags: |
added: verification-done removed: verification-needed |
| Launchpad Janitor (janitor) wrote : | #10 |
This bug was fixed in the package upstart - 1.5-0ubuntu7.3
---------------
upstart (1.5-0ubuntu7.3) precise-proposed; urgency=medium
* init/conf.c: conf_delete_
fails (LP: #1430403).
-- James Hunt <email address hidden> Wed, 11 Mar 2015 14:00:42 +0000
| Changed in upstart (Ubuntu Precise): | |
| status: | Fix Committed → Fix Released |
| Adam Conrad (adconrad) wrote : Update Released | #11 |
The verification of the Stable Release Update for upstart has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.


Status changed to 'Confirmed' because the bug affects multiple users.