init assert failure: alloc.c:633: Assertion failed in nih_unref: ref != NULL

Bug #1222705 reported by James Hunt on 2013-09-09
This bug affects 6 people
Affects           Importance  Assigned to
upstart           High        James Hunt
upstart (Ubuntu)  High        James Hunt

Bug Description

Occurred immediately after bug 1222702 happened.

ProblemType: Crash
DistroRelease: Ubuntu 13.10
Package: upstart 1.10-0ubuntu1 [modified: usr/share/upstart/sessions/upstart-file-bridge.conf]
ProcVersionSignature: Ubuntu 3.11.0-4.9-generic 3.11.0-rc7
Uname: Linux 3.11.0-4-generic i686
NonfreeKernelModules: nvidia
ApportVersion: 2.12.1-0ubuntu3
Architecture: i386
AssertionMessage: alloc.c:633: Assertion failed in nih_unref: ref != NULL
Date: Mon Sep 9 09:44:34 2013
ExecutablePath: /sbin/init
InstallationDate: Installed on 2010-10-21 (1054 days ago)
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release i386 (20101007)
MarkForUpload: True
ProcCmdline: init --user
ProcEnviron:
 LANGUAGE=en_GB
 PATH=(custom, user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
Signal: 6
SourcePackage: upstart
StacktraceTop:
 nih_unref () from /lib/i386-linux-gnu/libnih.so.1
 ?? ()
 ?? ()
 ?? ()
 nih_child_poll () from /lib/i386-linux-gnu/libnih.so.1
Title: init assert failure: alloc.c:633: Assertion failed in nih_unref: ref != NULL
UpgradeStatus: Upgraded to saucy on 2013-06-23 (77 days ago)
UpstartBugCategory: Session
UpstartRunningSessionCount: 1
UpstartRunningSessionVersion: init (upstart 1.10)
UpstartRunningSystemVersion: init (upstart 1.10)
UserGroups: adm admin audio cdrom dialout kvm libvirtd lpadmin plugdev sambashare sbuild vboxusers
upstart.shutdown.override: manual

James Hunt (jamesodhunt) wrote :

StacktraceTop:
 nih_unref (ptr=0xb83458a0, parent=0xb8303bf8) at alloc.c:629
 job_change_state (job=0xb8303bf8, state=<optimized out>) at job.c:381
 job_process_terminated (job=job@entry=0xb8303bf8, process=PROCESS_MAIN, status=0) at job_process.c:1819
 job_process_handler (data=0x0, pid=20284, event=<optimized out>, status=<optimized out>) at job_process.c:1471
 nih_child_poll () at child.c:217

Changed in upstart (Ubuntu):
importance: Undecided → Medium
tags: removed: need-i386-retrace
James Hunt (jamesodhunt) on 2013-09-09
description: updated
information type: Private → Public
James Hunt (jamesodhunt) wrote :

This looks odd. In the retraced stacktrace, ref is clearly shown as non-NULL (0xbfe870c8).

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu):
status: New → Confirmed
James Hunt (jamesodhunt) wrote :

We need a proper stack trace to aid debugging this issue. So, if you have seen this bug please run the following:

$ apport-collect 1222705

.home.phablet..config.upstart.mediascanner.override: manual
.home.phablet..config.upstart.mtp.server.override: manual
ApportVersion: 2.13.1-0ubuntu2
Architecture: armhf
DistroRelease: Ubuntu 14.04
InstallationDate: Installed on 2014-01-28 (0 days ago)
InstallationMedia: Ubuntu Trusty Tahr (development branch) - armhf (20140128)
Package: upstart 1.11-0ubuntu2
PackageArchitecture: armhf
Tags: trusty
Uname: Linux 3.4.0-3-mako armv7l
UpgradeStatus: No upgrade log present (probably fresh install)
UpstartBugCategory: Session
UpstartRunningSystemVersion: init (upstart 1.11)
UserGroups: adm autopilot cdrom dialout dip nopasswdlogin plugdev sudo tty video
_MarkForUpload: True
upstart.tty1.override: manual
upstart.tty2.override: manual
upstart.tty3.override: manual
upstart.tty4.override: manual
upstart.tty5.override: manual
upstart.tty6.override: manual

tags: added: apport-collected trusty

apport information

Michał Sawicz (saviq) wrote :

I got this again, it only retraced to:

--- stack trace ---
#0 0x407798e6 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
No symbol table info available.
#1 0x40787ffe in raise () from /lib/arm-linux-gnueabihf/libc.so.6
No symbol table info available.
#2 0x4078a898 in abort () from /lib/arm-linux-gnueabihf/libc.so.6
No symbol table info available.
#3 0x40226378 in nih_unref () from /lib/arm-linux-gnueabihf/libnih.so.1
No symbol table info available.
#4 0x4004fc2e in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
--- source code stack trace ---
#0 0x407798e6 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0x40787ffe in raise () from /lib/arm-linux-gnueabihf/libc.so.6
#2 0x4078a898 in abort () from /lib/arm-linux-gnueabihf/libc.so.6
#3 0x40226378 in nih_unref () from /lib/arm-linux-gnueabihf/libnih.so.1
#4 0x4004fc2e in ?? ()

Anywhere I can get more info? Like the actual abort message?

James Hunt (jamesodhunt) wrote :

Can you attach ~/.xsession-errors? Is there a /var/crash/*init*.crash? If you can recreate this running upstart with '--debug' that could also give us something.

Steve Langasek (vorlon) wrote :

I have just reproduced this bug here with Ubuntu Touch running on an emulator. When it happened, I was interacting with the upstart session bus from the command line. I ran 'initctl list | grep app' to get the pid for an instance of the application-click job, so that I could kill it manually instead of having to construct the 'stop' command with all of the variables etc. I then ran 'kill $pid', saw that the process didn't die immediately, and sent 'kill -9 $pid'.

Then I checked 'initctl list' output again and saw that the job was in the post-stop script. Then I checked 'initctl list' again, and found that the connection was refused on the dbus socket because the upstart session init had died.

My backtrace matches that of Saviq in comment #12 (i.e., nothing interesting before the nih_unref() call).

James Hunt (jamesodhunt) on 2014-01-31
Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
James Hunt (jamesodhunt) on 2014-01-31
Changed in upstart:
status: New → In Progress
assignee: nobody → James Hunt (jamesodhunt)
importance: Undecided → Medium
James Hunt (jamesodhunt) on 2014-03-06
Changed in upstart:
status: In Progress → Fix Committed
Changed in upstart (Ubuntu):
status: Confirmed → Fix Released
Changed in upstart:
status: Fix Committed → Fix Released
James Hunt (jamesodhunt) wrote :

Original problem has not been fixed.

Changed in upstart:
status: Fix Released → Confirmed
Changed in upstart (Ubuntu):
status: Fix Released → Confirmed
Michał Sawicz (saviq) wrote :

Can we have the severity of this bumped? It often causes session restarts when a lot of upstart activity is happening (for example during unity8 autopilot tests, where unity8 is started with autopilot).

James Hunt (jamesodhunt) on 2014-07-14
Changed in upstart (Ubuntu):
importance: Medium → High
James Hunt (jamesodhunt) wrote :

Michał - if you can trigger this issue frequently, please could you attach a .tgz of ~/.cache/upstart/ and ~/.xsession-errors or better yet provide steps to allow me to recreate this issue. Also, if you could provide details of how these tests run, that would be useful. Can you tell me if multiple Session Init sessions are run by the tests one after another? If so, how is the old session stopped?
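
One way to bundle the requested files is sketched below. The helper name collect_upstart_logs and the output path in the example are hypothetical; the two input paths are the ones named above, and whichever of them is missing is skipped.

```shell
# collect_upstart_logs HOME_DIR OUTPUT.tgz
# Archive HOME_DIR/.cache/upstart and HOME_DIR/.xsession-errors,
# skipping whichever of the two does not exist.
# (Hypothetical helper, not part of upstart or apport.)
collect_upstart_logs() {
    home_dir=$1
    out=$2
    # Reuse the positional parameters as the list of paths to archive.
    set --
    for p in .cache/upstart .xsession-errors; do
        if [ -e "$home_dir/$p" ]; then
            set -- "$@" "$p"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no upstart logs found under $home_dir" >&2
        return 1
    fi
    # -C makes the archived paths relative to the home directory.
    tar -czf "$out" -C "$home_dir" "$@"
}

# Example (use an absolute output path):
#   collect_upstart_logs "$HOME" /tmp/upstart-logs.tgz
```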

Michał Sawicz (saviq) wrote :

The easiest way to reproduce this seems to be running "restart unity8" as the phablet user on touch.

I'll try and see if I can reproduce the same under the emulator.

Please see attached the requested logs from when this happened just now (I've cleared the logs first).

Michał Sawicz (saviq) wrote :

Another set of logs with --debug.

Michał Sawicz (saviq) wrote :

And a symbolized .crash.

Michał Sawicz (saviq) wrote :

Confirmed, same happened in an emulator, example to reproduce:

$ sudo ubuntu-emulator create --channel=ubuntu-touch/devel-proposed --arch=i386 devel-proposed
# several minutes later
$ ubuntu-emulator run devel-proposed
# a couple minutes later you should see the wizard, go through it and wait for unity8 to appear
$ adb devices
$ phablet-shell -s emulator-5554
phablet $> restart unity8
restart: Connection was disconnected before a reply was received

Unfortunately apport doesn't seem to collect the .crash.

James Hunt (jamesodhunt) wrote :

Thanks Michał - your steps to recreate the issue helped me track down the cause :-)

Essentially, any job containing 'initctl unset-env' or 'initctl unset-env --global' in any process stanza (post-stop, pre-start, etc) will trigger the issue when that job is restarted.

Note that the problem is not seen if the job is stopped and then started.
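
A minimal, untested sketch of a job matching that description follows. The job name unset-env-repro, the variable REPRO_VAR and the sleep payload are all hypothetical, and the session job directory is assumed to be $XDG_CONFIG_HOME/upstart (normally ~/.config/upstart). Try it only in a disposable session, since a successful reproduction kills the Session Init.

```shell
# Write a hypothetical session job whose pre-start stanza calls
# 'initctl unset-env'. The job name and variable name are made up
# for illustration.
job_dir="${XDG_CONFIG_HOME:-$HOME/.config}/upstart"
mkdir -p "$job_dir"
cat > "$job_dir/unset-env-repro.conf" <<'EOF'
description "hypothetical reproducer for bug 1222705"
env REPRO_VAR=1
pre-start exec initctl unset-env REPRO_VAR
exec sleep 1000
EOF

# Then, in a throwaway session only:
#   start unset-env-repro
#   restart unset-env-repro   # per the comment above, this is the failing case
# A 'stop unset-env-repro' followed by 'start unset-env-repro'
# is reported above not to show the problem.
```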

James Hunt (jamesodhunt) on 2014-07-16
Changed in upstart:
importance: Medium → High