init crashed with SIGSEGV

Bug #1269731 reported by Kai Kasurinen on 2014-01-16
This bug affects 20 people
Affects              Importance  Assigned to
upstart              Critical    James Hunt
alsa-utils (Ubuntu)  Critical    Luke Yelavich
upstart (Ubuntu)     Critical    James Hunt

Bug Description

$ telinit U

ProblemType: Crash
DistroRelease: Ubuntu 14.04
Package: upstart 1.11-0ubuntu1
ProcVersionSignature: Ubuntu 3.12.0-8.16-generic 3.12.6
Uname: Linux 3.12.0-8-generic x86_64
ApportVersion: 2.13.1-0ubuntu1
Architecture: amd64
Date: Thu Jan 16 08:03:58 2014
ExecutablePath: /sbin/init
ExecutableTimestamp: 1384521990
InstallationDate: Installed on 2012-12-22 (389 days ago)
InstallationMedia: Ubuntu 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120823.1)
ProcCmdline: /sbin/init
ProcCwd: /
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
ProcKernelCmdline: BOOT_IMAGE=/boot/vmlinuz-3.12.0-8-generic root=UUID=8905185c-9d82-498c-970c-6fdb9ee07c45 ro quiet splash vt.handoff=7
SegvAnalysis:
 Segfault happened at: 0x7f1191eb98d9: mov 0x20(%rax),%rsi
 PC (0x7f1191eb98d9) ok
 source "0x20(%rax)" (0x00000020) not located in a known VMA region (needed readable region)!
 destination "%rsi" ok
SegvReason: reading NULL VMA
Signal: 11
SourcePackage: upstart
StacktraceTop:
 ?? ()
 ?? ()
 ?? ()
 ?? ()
 ?? ()
Title: init crashed with SIGSEGV
UpgradeStatus: Upgraded to trusty on 2013-06-19 (210 days ago)
UpstartBugCategory: System
UpstartRunningSystemVersion: init (upstart 1.11)
UserGroups:

_LogindSession: /
_MarkForUpload: True
modified.conffile..etc.NetworkManager.NetworkManager.conf: [modified]
modified.conffile..etc.default.whoopsie: [modified]
modified.conffile..etc.libvirt.qemu.networks.default.xml: [modified]
mtime.conffile..etc.NetworkManager.NetworkManager.conf: 2013-12-11T09:58:09.227262
mtime.conffile..etc.default.whoopsie: 2013-12-19T16:30:43.528594
mtime.conffile..etc.libvirt.qemu.networks.default.xml: 2013-06-29T21:43:59.656854

Kai Kasurinen (kai-kasurinen) wrote :

StacktraceTop:
 conf_file_serialise (file=file@entry=0x7f11929f2440) at conf.c:1633
 conf_source_serialise (source=source@entry=0x7f11929ef7f0) at conf.c:1377
 conf_source_serialise_all () at conf.c:1416
 state_to_string (json_string=json_string@entry=0x7fff6f682e20, len=len@entry=0x7fff6f682e28) at state.c:388
 stateful_reexec () at state.c:1986

Changed in upstart (Ubuntu):
importance: Undecided → Medium
tags: removed: need-amd64-retrace
Kai Kasurinen (kai-kasurinen) wrote :

related to Bug #1269669

information type: Private → Public
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu):
status: New → Confirmed
James Hunt (jamesodhunt) wrote :

Hi Kai - thank you for reporting this bug. Please see below for simple instructions on how to work around the issue whilst we investigate it:

https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/1269405/comments/30

Changed in upstart (Ubuntu):
importance: Medium → Critical
assignee: nobody → James Hunt (jamesodhunt)
Harry (harry33) wrote :

This system freeze (kernel panic) occurs every time the two newest versions of eglibc (2.18-0ubuntu5 and 2.18-0ubuntu6) are installed, either with synaptic (GUI) or directly from terminal or from console (tty1).
Downgrading to the eglibc version 2.18-0ubuntu4 does not cause the system freeze.

Harry (harry33) wrote :

I have recovered my setup by booting a live USB and chrooting,
then running the command:
sudo dpkg --configure -a

James Hunt (jamesodhunt) wrote :

Hi Harry - please can you confirm that you are running upstart version 1.11-0ubuntu1 and that running 'sudo telinit u' with:

- eglibc version 2.18-0ubuntu4 does not result in a crash.
- eglibc version 2.18-0ubuntu5 *does* result in a crash.

Thanks very much!

Michael Marley (mamarley) wrote :

I can't speak for him, but I can say that I have tried 2.18-0ubuntu5 and 2.18-0ubuntu2 and I get a kernel panic (KP) with both of them when I run "sudo telinit u".

James Hunt (jamesodhunt) wrote :

If someone would be prepared to boot their system and force a crash with Upstart running in debug mode that would be very helpful:

- Temporarily modify grub config in grub menu to:
    - remove "quiet"
    - remove "splash"
    - add "--debug"
- Boot system
- ctrl-alt-f1 and login
- sudo telinit u
- attach a photo of the screen to this bug.

Thank you!

Michael Marley (mamarley) wrote :

I took a picture of my output too, though it doesn't look any different from the one without --debug.

Harry (harry33) wrote :

I can confirm I have upstart (1.11-0ubuntu1) installed.
And yes eglibc (2.18-0ubuntu5) does result in crash, while v. 2.18-0ubuntu4 does not.

However, this crash happens in my setup, which has an Nvidia graphics card (GTX 285) and proprietary drivers (331) installed.
But, my other installation with ATI graphics card (HD2600) does not crash at all.

James Hunt (jamesodhunt) wrote :

Hi Harry - thanks for testing this. The results you are seeing are very confusing since the changes between 2.18-0ubuntu4 and 2.18-0ubuntu5 are nominally only relevant for arm and powerpc systems. What could be different is the version of gcc used to build each of the eglibc and upstart versions since gcc has changed recently.

What would be useful to know is what happens if you down-graded upstart to version 1.10-0ubuntu9 and tried 'telinit u' with the 2 different versions of eglibc you mention. This might help us identify if the issue is caused by the compiler.

Michael Marley (mamarley) wrote :

Here are all the initscripts from my system, as requested by TJ in #ubuntu+1.

TJ (tj) wrote :

Michael attached the conf files at my request, based on my analysis of the SEGFAULT trace results:

Segfault happened at: 0x7f1191eb98d9: mov 0x20(%rax),%rsi

## Registers.txt shows:
rax 0x0 0
rbx 0x7f1192f18ed0 139713456475856
rcx 0xe 14
rdx 0x7f1192f18fc0 139713456476096
rsi 0x10 16
rdi 0xffffffffffffffff -1
rbp 0x7f11929f2440 0x7f11929f2440
rsp 0x7fff6f682d60 0x7fff6f682d60

## So %rax is 0 (NULL)

## Disassembly shows:

=> 0x7f1191eb98d9: mov 0x20(%rax),%rsi
   0x7f1191eb98dd: mov 0x10(%rax),%rdi
   0x7f1191eb98e1: callq 0x7f1191ea28a0

## Which represents job_class_get_registered (file->job->name, file->job->session)

#0 conf_file_serialise (file=file@entry=0x7f11929f2440) at conf.c:1633
  1628: * re-exec. This may change though immediately after re-exec
  1629: * when conf_reload() gets called.
  1630: *
  1631: * See job_class_serialise_all() for further details.
  1632: */
  1633: registered = job_class_get_registered (file->job->name,
  1634: file->job->session);
  1638:

## 0x20(%rax) and 0x10(%rax) are offsets into struct JobClass (from init/job_class.h):
typedef struct job_class {
  NihList entry;

## 0x10(%rax)
  char *name;
  char *path;
## 0x20(%rax)
  Session * session;

## On amd64 pointers are 8 bytes wide so 'name' and 'session' will be 0x10 bytes apart.
## %rax will be the pointer "file->job"
## which means it is unexpectedly NULL
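The offset arithmetic above can be checked with a small stand-alone sketch. It does not use upstart's or libnih's headers: `MockList` and `MockJobClass` are hypothetical stand-ins, assuming that NihList consists of two pointers and that the first JobClass members are laid out as quoted above. On an LP64 system such as amd64 the offsets come out as 0x10 and 0x20, so a NULL base pointer yields the faulting address 0x20 seen in the SegvAnalysis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for libnih's NihList (assumed: two pointers). */
typedef struct mock_list {
    struct mock_list *prev;
    struct mock_list *next;
} MockList;

/* Hypothetical stand-in for the leading members of upstart's JobClass. */
typedef struct mock_job_class {
    MockList  entry;    /* offset 0x00 on LP64 */
    char     *name;     /* offset 0x10 on LP64 */
    char     *path;     /* offset 0x18 on LP64 */
    void     *session;  /* offset 0x20 on LP64 */
} MockJobClass;

/* Offset of 'name' within the mock structure. */
size_t name_offset(void)
{
    return offsetof(MockJobClass, name);
}

/* Offset of 'session' within the mock structure. */
size_t session_offset(void)
{
    return offsetof(MockJobClass, session);
}

/* Address a load of "offset(base)" touches; base 0 models file->job == NULL. */
uintptr_t member_address(uintptr_t base, size_t offset)
{
    return base + offset;
}
```

With a base of 0, `member_address(0, session_offset())` is simply the member offset, which is why the crash address is a small number rather than a plausible heap pointer.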

## Looking for opportunities for that to happen leads to init/conf.c::conf_reload_path()

static int
conf_reload_path (ConfSource *source,
      const char *path,
      const char *override_path)
{
  ConfFile *file = NULL;
...
  /* Create a new ConfFile structure (if no @override_path specified) */
  file = (ConfFile *)nih_hash_lookup (source->files, path);
  if (! file)
    file = NIH_MUST (conf_file_new (source, path));

### at this point file->job should be NULL

 switch (source->type) {
...
  case CONF_JOB_DIR:

    name = conf_to_job_name (source->path, path);

    /* Create a new job item and parse the buffer to produce
     * the job definition.
     */
...
## the call to init/job_parse.c::parse_job() can return NULL

    file->job = parse_job (NULL, source->session, file->job,
        name, buf, len, &pos, &lineno);

    /* Allow the original ConfFile which has now been replaced to be
     * destroyed which will also cause the original JobClass to be
     * freed.
     */
    if (file->job) {
      job_class_consider (file->job);
    } else {
      err = nih_error_get ();
    }

## init/job_parse.c::parse_job() can return NULL:

JobClass *
parse_job (const void *parent,
     Session *session,
     JobClass *update,

## update is the local reference to file->job

     const char *name,
     const char *file,
     size_t len,
     size_t *pos,
     size_t *lineno)
{
  JobClass *class;

  nih_assert (name != NULL);
  nih_assert (file != NULL);
  nih_assert (pos != NULL);

  if (update) {
    class = update;
    nih_debug ("Reusing JobClass %s (%s)",
        class->name, class->path);
  } else {

## should be in this path ...


TJ (tj) wrote :

Michael's "/etc/init/" showed 4 upstart conf files updated recently; three on the 14th when this issue began being reported:

$ ls -latr
...
-rw-r--r-- 1 tj tj 480 Jan 9 11:34 nvidia-persistenced.conf
-rw-r--r-- 1 tj tj 483 Jan 14 04:28 alsa-store.conf
-rw-r--r-- 1 tj tj 609 Jan 14 04:28 alsa-state.conf
-rw-r--r-- 1 tj tj 494 Jan 14 04:28 alsa-restore.conf

I asked Michael to remove them and run the 'telinit u' tests again. With these files removed the "get_state.sh" test did not provoke a fault.

Adding back the three alsa files caused the fault so he tested the alsa conf files one by one.

alsa-restore.conf == SEGFAULT
alsa-state.conf == SEGFAULT
alsa-store.conf == SEGFAULT

TJ (tj) wrote :

The three alsa config files incorrectly use shell constructions for testing variables:

WRONG:

if ! test -d $ALSACTLHOME ; then

CORRECT:

if [ ! -d $ALSACTLHOME ]; then

Changed in alsa-utils (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
TJ (tj) wrote :

I'm in the process of rustling up a patch for alsa-utils, but this highlights that upstart really needs guard conditions for file->job == NULL before trying to dereference it.
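A minimal sketch of the kind of guard meant here, using hypothetical trimmed-down types rather than upstart's real ConfFile and JobClass. This is not the fix that was eventually committed to upstart; it only illustrates checking for a missing job before dereferencing it. All names (`MockConfFile`, `conf_file_job_name`, `count_serialisable`) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for upstart's types. */
typedef struct mock_job_class {
    const char *name;
    void       *session;
} MockJobClass;

typedef struct mock_conf_file {
    const char   *path;
    MockJobClass *job;   /* NULL when the job file failed to parse */
} MockConfFile;

/* Return the job name to serialise, or NULL if the file has no valid job. */
const char *conf_file_job_name(const MockConfFile *file)
{
    if (file == NULL || file->job == NULL)
        return NULL;     /* guard: skip instead of dereferencing NULL */
    return file->job->name;
}

/* Count the files in an array that have a job and so can be serialised. */
size_t count_serialisable(const MockConfFile *files, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (conf_file_job_name(&files[i]) != NULL)
            count++;
    }
    return count;
}
```

A caller that walks all conf files would then skip any entry whose job failed to parse, instead of crashing PID 1 during the re-exec.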

Changed in alsa-utils (Ubuntu):
assignee: nobody → TJ (tj)
status: Triaged → In Progress
TJ (tj) wrote :

Attaching a tar.gz archive that can be extracted to /etc/init/ which replaces the faulty alsa-utils upstart conf files:

$ tar -tf lp1269731-upstart-alsa-utls.tar.gz
etc/
etc/init/
etc/init/alsa-state.conf
etc/init/alsa-restore.conf
etc/init/alsa-store.conf

To install, download the attachment and then do:

sudo tar -xzf lp1269731-upstart-alsa-utls.tar.gz -C /

What did you do to reproduce this segfault?

Secondly, the original syntax used with the test command is perfectly valid in POSIX shell. It's even present in some of the system's important init scripts. I even just tested this in a script myself.

TJ (tj) on 2014-01-16
Changed in alsa-utils (Ubuntu):
status: In Progress → Triaged
assignee: TJ (tj) → nobody

However, on further examination, I did forget to add an env declaration for the ALSACTLHOME environment variable. It's likely that is what is choking things. But a segfault from init still points to a bug in upstart, I think.

Alan (alanjas) wrote :

My computer boots perfectly. My problem occurs when I try to install the new version:

libc6_2.18-0ubuntu5_i386.deb

It causes a kernel panic during the "configuration" step of the package.

I reported this bug as #1269669

Alan (alanjas) wrote :

This is my KP.

Luke Yelavich (themuso) wrote :

Sorry, I was replying by email, and wasn't aware that this was a larger bug at the time.

Changed in alsa-utils (Ubuntu):
assignee: nobody → Luke Yelavich (themuso)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package alsa-utils - 1.0.27.2-1ubuntu2

---------------
alsa-utils (1.0.27.2-1ubuntu2) trusty; urgency=medium

  * Forgot to add an env declaration for an environment variable to the
    upstart files. (LP: #1269731)
  * debian/patches/work_around_ncurses_weirdness.patch:
    - Work around some ncurses pkg-config changes for now, not sure if ncurses
      upstream intended the include dir changes, or whether its a bug. Fixes
      FTBFs.
 -- Luke Yelavich <email address hidden> Fri, 17 Jan 2014 15:08:13 +1100

Changed in alsa-utils (Ubuntu):
status: Triaged → Fix Released
James Hunt (jamesodhunt) wrote :

@TJ - thanks for your analysis, which mirrors our findings. We now have a fix to Upstart and new tests to ensure this problem stays fixed.
@Luke - thanks too for fixing alsa-utils! :)

Harry (harry33) wrote :

For me, the freeze and this issue were fixed by installing the new alsa-utils (1.0.27.2-1ubuntu2).

James Hunt (jamesodhunt) on 2014-01-18
Changed in upstart (Ubuntu):
status: Confirmed → In Progress
James Hunt (jamesodhunt) on 2014-01-21
Changed in upstart (Ubuntu):
status: In Progress → Fix Committed
Changed in upstart:
assignee: nobody → James Hunt (jamesodhunt)
status: New → Fix Committed
importance: Undecided → Critical
James Hunt (jamesodhunt) wrote :

This bug is fixed in upstart 1.11-0ubuntu2 which will be pushed out once the alpha 2 freeze is over (Thursday).

If you are still affected by this problem (in other words you have not managed to upgrade to alsa-utils 1.0.27.2-1ubuntu2), you will need to do *one* of the following:

1) Disable upstart re-exec whilst running the upgrade:

    $ sudo su -
    # mkdir /root/bin
    # ln -s /bin/true /root/bin/telinit
    # chmod 755 /root/bin/telinit
    # export PATH=/root/bin:$PATH
    # dpkg --configure -a && sudo apt-get update && sudo apt-get upgrade

2) Temporarily move the invalid alsa job files aside whilst performing the upgrade

    $ sudo mkdir /root/tmp/
    $ sudo mv /etc/init/alsa-*.conf /root/tmp/
    $ sudo dpkg --configure -a && sudo apt-get update && sudo apt-get upgrade
    $ sudo sh -c 'mv /root/tmp/alsa-*.conf /etc/init/'

I'd recommend doing (1) as it does not require you to "undo" anything (as required in the final 'mv' operation in (2) above).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.11-0ubuntu2

---------------
upstart (1.11-0ubuntu2) trusty; urgency=low

  [ Steve Langasek ]
  * Merge Debian packaging changes from 1.10-2.

  [ James Hunt ]
  * debian/manpages/upstart-events.7: Remove Job States and Job Lifecycle
    sections since they have been added to init.8 upstream (closes: bug#732125).
  * debian/manpages/upstart-events.7:
    - Added missing dbus and dconf events.
    - Added all inline events to SEE ALSO section for quick reference.
      (when coupled with latest upstream doc changes, closes: bug#732128).
  * debian/control: add Build-Depends on libtool for test_conf_preload.sh.
  * Cherry-pick fix for handling re-exec if job is invalid (LP: #1269731).
 -- James Hunt <email address hidden> Tue, 21 Jan 2014 11:50:09 +0000

Changed in upstart (Ubuntu):
status: Fix Committed → Fix Released
medoasus (ah-shapan) on 2014-02-04
Changed in upstart:
status: Fix Committed → Fix Released
David Oftedal (rounin) wrote :

This still happens on my system when trying to upgrade libc6 during the upgrade to 10.04.
Adding a fake telinit binary to the path was necessary to work around it.

David Oftedal (rounin) wrote :

And when I "apt-get show upstart", I get "Version: 1.12.1-0ubuntu1".
That means that the bug either wasn't fixed in version 1.11-0ubuntu2, or there's been a regression after version 1.11-0ubuntu2.

On Sat, Apr 05, 2014 at 07:09:56PM -0000, David Oftedal wrote:
> And when I "apt-get show upstart", I get "Version: 1.12.1-0ubuntu1".
> That means that the bug either wasn't fixed in version 1.11-0ubuntu2, or
> there's been a regression after version 1.11-0ubuntu2.

No, it doesn't mean either of those things; it only means that the
currently-available version of upstart is 1.12.1-0ubuntu1. But that is not
the version you would have been running *during* the upgrade, so that is not
the version that segfaulted.

You said you ran into this "during the upgrade to 10.04", but that's clearly
not the release you were upgrading to if you now have the upstart from
14.04 in your apt list. You will need to provide more information about
what version of Ubuntu you were upgrading *from* when this happened. You
probably should also be filing a separate bug report, since this bug is
about a crash only in a 14.04 pre-release version of init: while you may
have experienced a crash of init when upgrading from a previous Ubuntu
release, it would almost certainly not be *this* bug.

David Oftedal (rounin) wrote :

I stand corrected! Now that you mention it, upstart would have been upgraded at some point during the upgrade, and that could very well have been after the upgrade of libc which caused the kernel panic.

The upgrade was from the previous version, i.e. something like 13.10, with updates installed.

Either way, after reading your post, I ran "telinit u" to verify, and this time it runs without any incident.

So it looks like it has been fixed as stated above. Apologies for the false alarm, and thank you for the explanation!

David Oftedal (rounin) wrote :

Ah! One thing, though:

Perhaps an updated version of upstart should actually be released for version 13.10?
Seeing as this bug might affect a lot of people trying to upgrade.

Steve Langasek (vorlon) wrote :

On Sat, Apr 05, 2014 at 09:04:47PM -0000, David Oftedal wrote:
> Ah! One thing, though:

> Perhaps an updated version of upstart should actually be released for
> version 13.10? Seeing as this bug might affect a lot of people trying to
> upgrade.

The point is that the bug report you're following up to is for a bug that
was only introduced *after* the 13.10 release. So any crash you saw
upgrading from 13.10 is a different issue, which is why we need a different
bug report.

What made you think that this bug was the same crash you encountered?

A crash of upstart on Ubuntu should leave a file in /var/crash/ named
_sbin_init.0.crash. Do you have such a file on your system?

David Oftedal (rounin) wrote :

It's identical in every way to the bugs that are marked as duplicates of this one:
- The way it's triggered (telinit)
- The context in which it's triggered (upgrade from 13.10 to 14.04)
- The symptoms (kernel panic because of trying to kill init)
Which is another way of saying it's the same bug.

The notion that my bug is in 13.10 is an assumption on your part, which may very well be incorrect. What we're dealing with is a situation in which packages are being upgraded from one version to the other, meaning some parts of the system are from one version and some from the other.

If the bug was introduced after 13.10, then upstart was upgraded before libc6, and upgrading upstart and then installing the libc6 deb triggers the bug. That also means that it may in fact not be fixed, but could still be triggered by installing or upgrading some or all versions of libc6.

If upstart was not upgraded before libc6, then the bug was also there before it's thought to have been introduced.

David Oftedal (rounin) wrote :

As for the crash dumps, I don't have them. There seem to be many attachments posted in the duplicate bugs, but no crash dumps there either.

Anyway, don't take my word for it. The problem is solved in my particular case. If the bug is still there when people start upgrading for real, then they're going to let you know.

Steve Langasek (vorlon) wrote :

> It's identical in every way to the bugs that are marked as duplicates of this one:
> - The way it's triggered (telinit)
> - The context in which it's triggered (upgrade from 13.10 to 14.04)

That's incorrect. This bug did not involve upgrades from 13.10 to 14.04.

> - The symptoms (kernel panic because of trying to kill init)
> Which is another way of saying it's the same bug.

Otherwise, what you've described here is "every possible crasher bug in upstart when re-execing init".

David Oftedal (rounin) wrote :

> That's incorrect. This bug did not involve upgrades from 13.10 to 14.04.

Then let's put it this way: bug #1269405, #1269483, #1269500, #1269669 all mention that the bug is triggered when updating libc6 with dpkg, which is indeed the case with the bug I experienced two days ago.

David Oftedal (rounin) wrote :

Considering what you said in bug #1303139, I'm reassigning the "duplicate" bugs as duplicates of that bug, seeing as I have your assurance that they're a different bug from this one.

Steve Langasek (vorlon) wrote :

That was not what I said at all. I said that *your* bug was not a duplicate of this one. Please leave it to the developers to determine what bugs are or are not duplicates.

David Oftedal (rounin) wrote :

Absolutely.

As I've mentioned, I believe bug #1269405, bug #1269483, bug #1269500 are identical to bug #1303139. Bug #1269669 affects a different package, so it seems like a related problem.

However, if you have reason to believe otherwise, then I'll leave it to you to mark them in the way you think is correct.
