
[kernel panic] init: log.c:786: Assertion failed in log_clear_unflushed: log->remote_closed

Reported by Raphael Gradenwitz on 2012-02-18
This bug affects 14 people
Affects            Importance  Assigned to
upstart (Ubuntu)   High        James Hunt
Precise            High        James Hunt

Bug Description

Since updating upstart from 1.4-0ubuntu6 to 1.4-0ubuntu8, the system hits a kernel panic at boot (see picture).

summary: - init: log.c:786: Assertion failes in log_cleasr_unflushed:
+ init: log.c:786: Assertion failes in log_clear_unflushed:
log->remote_closed

Downgrading to 1.4-0ubuntu7 (in the recovery console) did _not_ fix it.
But downgrading to 1.3-0ubuntu11 (from https://launchpad.net/ubuntu/precise/+package/upstart) fixed it, and I could boot again without a kernel panic.

btw: Ubuntu 12.04, amd64

Fabien Tassin (fta) wrote :

Same here, precise 32bit.

I have a slightly different stack (but the same assert()). Booting on older kernels doesn't help. Rescue kernels are ok.

I also found independently that upstart 1.4-0ubuntu8 and u7 are NOK while u6 is OK so the culprit is u7.

Changed in upstart:
status: New → Confirmed
Fabien Tassin (fta) on 2012-02-18
summary: - init: log.c:786: Assertion failes in log_clear_unflushed:
+ [kernel panic] init: log.c:786: Assertion failed in log_clear_unflushed:
log->remote_closed
James Hunt (jamesodhunt) on 2012-02-20
affects: upstart → ubuntu
affects: ubuntu → upstart (Ubuntu)
Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
James Hunt (jamesodhunt) wrote :

Whilst we investigate this issue, please disable logging by adding "--no-log" to the kernel command-line. It is also possible to stop the log from being flushed by disabling the "flush-early-job-log" Upstart job (/etc/init/flush-early-job-log.conf), for example like this:

$ echo manual | sudo tee /etc/init/flush-early-job-log.override
$ sudo reboot
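
A quick way to confirm which of these two workarounds is in effect on a given machine might look like the sketch below. The function names are hypothetical helpers, not part of Upstart; the only inputs assumed are the kernel command line (readable from /proc/cmdline on Linux) and the override file created above.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not shipped with Upstart).

# Succeeds if the kernel command line in $1 contains --no-log as a whole word.
cmdline_has_no_log() {
    case " $1 " in
        *" --no-log "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the override file $1 exists and contains a line reading "manual".
override_is_manual() {
    [ -f "$1" ] && grep -qx 'manual' "$1"
}

# $1: kernel command line, $2: path to the override file.
report_workarounds() {
    if cmdline_has_no_log "$1"; then
        echo "no-log: active"
    else
        echo "no-log: inactive"
    fi
    if override_is_manual "$2"; then
        echo "override: active"
    else
        echo "override: inactive"
    fi
}

# Example use on a live system:
#   report_workarounds "$(cat /proc/cmdline)" /etc/init/flush-early-job-log.override
```

Passing the command line and the file path as arguments keeps the helpers easy to try against sample input before running them on an affected system.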

Fabien Tassin (fta) wrote :

alternatively, and since I've already downgraded to -ubuntu6, I simply put upstart on hold until a fix is available.

$ echo upstart hold | sudo dpkg --set-selections
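
To confirm that the hold took effect, and to release it later once a fixed package is published, a sketch along these lines may help. `selection_state` is a hypothetical helper that parses the output of `dpkg --get-selections`; setting the state back to `install` is the usual way to undo a hold.

```shell
# Hypothetical helper: reads `dpkg --get-selections` output on stdin and
# prints the recorded state ("install", "hold", ...) for the package in $1.
selection_state() {
    awk -v pkg="$1" '{
        split($1, name, ":")      # drop an architecture suffix such as ":amd64"
        if (name[1] == pkg) { print $2; exit }
    }'
}

# Check the hold:
#   dpkg --get-selections upstart | selection_state upstart
# Release it again once a fixed upstart is available:
#   echo upstart install | sudo dpkg --set-selections
```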

James Hunt (jamesodhunt) wrote :

Can those affected by this bug confirm they have a /var/log/upstart/ directory? Also, attaching the output of the following command would be useful:

    /sbin/initctl show-config
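
Both pieces of information can be gathered in one go. The sketch below assumes nothing beyond the directory and command named above; `count_log_files` is a hypothetical helper name, and the output file name is arbitrary.

```shell
# Hypothetical helper: prints how many regular files the log directory holds,
# or "missing" if the directory does not exist.
count_log_files() {
    dir=${1:-/var/log/upstart}
    if [ -d "$dir" ]; then
        find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
    else
        echo "missing"
    fi
}

# Example use on an affected system:
#   echo "log files: $(count_log_files)"
#   /sbin/initctl show-config > initctl-show-config.txt
```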

Fabien Tassin (fta) wrote :

I do have that directory, containing a bunch of files.
Here is my initctl config.

Fabien Tassin (fta) wrote :

I compared my initctl config between one impacted box and another, quite similar, box that is not impacted. The main difference is that the impacted box has xinetd installed. Could that be it?

I did the same workaround as in #5

/var/log/upstart/ directory exists.

Fabien Tassin (fta) wrote :

neither #7 nor #9 has 'cgroup-lite', looks like a good candidate.
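
Comparing saved `initctl show-config` outputs by hand is error-prone, so a small sketch like the following could list the jobs that appear on one machine but not another. It assumes that job names sit on unindented lines with their start/stop conditions indented beneath them; the function names and file names are hypothetical.

```shell
# Hypothetical helper: prints the sorted job names found in a saved
# `initctl show-config` output (assumed to be the unindented, non-empty lines).
job_names() {
    grep -v '^[[:space:]]' "$1" | grep -v '^$' | sort -u
}

# Prints the jobs that appear in the first file but not in the second.
jobs_only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    job_names "$1" > "$a"
    job_names "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example use with two saved outputs:
#   jobs_only_in_first affected-box.txt unaffected-box.txt
```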

James Hunt (jamesodhunt) on 2012-02-20
Changed in upstart (Ubuntu):
importance: Undecided → High
James Hunt (jamesodhunt) wrote :

Please would those affected by this bug test the new upstart build version '1.4-0ubuntu9~bug935585' in the PPA below and provide feedback:

https://launchpad.net/~jamesodhunt/+archive/bug-935585/

Since we have still not managed to recreate this problem directly, it would also be helpful to know whether the assertion fails on every boot or whether the problem is intermittent.

Peter Silva (peter-bsqt) wrote :

For me it is every boot, and /var/log/upstart has 52 files in it.

James Hunt (jamesodhunt) wrote :

@Peter: Thank you for the information.

We are still unsure exactly what scenario is causing this assertion failure so would encourage all those experiencing this issue to try the Upstart version in comment #11. Feedback on this will provide us with valuable information which will help us to resolve this issue.

I could recreate this kernel panic over and over.
The workaround from #4 (--no-log) was necessary each time to bypass the bug and boot.

BUT...
vvvvvvvvvvvvvv
> #11 works! <
^^^^^^^^^^^^

Thanks!

Fabien Tassin (fta) wrote :

@jamesodhunt: works for me. no more panic. great, thanks.

Peter Silva (peter-bsqt) wrote :

@jamesodhunt: works for me also, no panic running 17 now.

James Hunt (jamesodhunt) wrote :

Out of interest, are affected users using SSD devices?

Fabien Tassin (fta) wrote :

@jamesodhunt: i'm not. 2 HD here.

Chris Peach (peachris+ubuntu) wrote :

I run two very similar virtual machines on a VMware ESX server, and only one of them is affected. The main difference is that the affected machine uses LVM.

I was able to make it boot using the --no-log kernel parameter. As a more permanent solution, I used the following line successfully:
$ echo manual | sudo tee /etc/init/flush-early-job-log.override

Please let me know if you would like to learn more about my LVM2 setup.

Peter Silva (peter-bsqt) wrote :

no SSD on the desktop system, just two HDD's.

No SSD, no LVM here but Multi-Boot (hth)

I can confirm this as well. One interesting thing we tried was reverting to the initial 3.0.0 precise server kernel and then booting back into the 3.2.0-17 kernel, which worked for a while before the same issue surfaced.

running precise 64bit

our server is running lvm but not on an SSD drive

One thing i forgot to add is currently there is nothing virtualized on this machine, but we are in the process of setting up kvm and openstack.

Radek Zajic (radek-zajic) wrote :

I've upgraded to precise today (from oneiric) via do-release-upgrade -d. The bug also affected my system; the fix in #11 helped.

The hardware is Intel(R) Atom(TM) CPU 230 @ 1.60GHz, nVidia ION chipset (prestigio ION PC)

root@router-barrandov:~# lspci
00:00.0 Host bridge: NVIDIA Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: NVIDIA Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: NVIDIA Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: NVIDIA Corporation MCP79 Co-processor (rev b1)
00:04.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:06.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:06.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: NVIDIA Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: NVIDIA Corporation MCP79 PCI Bridge (rev b1)
00:0a.0 Ethernet controller: NVIDIA Corporation MCP79 Ethernet (rev b1)
00:0b.0 SATA controller: NVIDIA Corporation MCP79 AHCI Controller (rev b1)
00:10.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
02:00.0 VGA compatible controller: NVIDIA Corporation ION VGA (rev b1)

root@router-barrandov:~# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 152d:2338 JMicron Technology Corp. / JMicron USA Technology Corp. JM20337 Hi-Speed USB to SATA & PATA Combo Bridge
Bus 004 Device 002: ID 0458:0708 KYE Systems Corp. (Mouse Systems)
Bus 004 Device 003: ID 03f0:6004 Hewlett-Packard DeskJet 5550

2,5" SATA HDD, 3,5" HDD attached via USB (mass storage)

twelve17 (spam-twelve17) wrote :

Confirming that the PPA from #11 worked for me as well. I am running the Ubuntu 12.04 beta under VirtualBox. (For what it's worth, the host is not running an SSD, and the guest is pretending to have a regular SATA drive.)

Chris Peach (peachris+ubuntu) wrote :

I successfully tested the patched upstart (1.4-0ubuntu9~bug935585) from the PPA mentioned in comment #11.

Of course, I had first disabled my workaround:
root@vmwareguest:~# rm /etc/init/flush-early-job-log.override

Then I installed the patched upstart and restarted the system half a dozen times without a hiccup. Good work!

t3rmin (matt-thetrents) wrote :

I experience this bug on kernel 3.2.0-17 and 3.2.0-18, but not when I select 3.0.0-16 (which I assume is a leftover from before upgrading to precise) at the boot menu. Installing the upstart PPA from #11 allowed me to boot into the 3.2 kernels.

Travis Rhoden (trhoden) wrote :

I also ran into this problem with both 3.2.0-17 and -18. I used the new Upstart from #11, and can now boot up successfully.

alessandro ciancaglini (alo) wrote :

We also ran into this problem with 3.2.0-18 on 2 servers. The strange thing is that on a third server everything went OK (same model, an old Dell 860).

We used the new Upstart from #11, and can now boot up all servers successfully.

thanks!

Roman Yepishev (rye) wrote :

Right now I am able to reproduce this when I force the filesystem check (touch /forcefsck) on a real machine. I can't reproduce this with similar kvm setup. As Daviey pointed out, this may be a race condition.

Roman Yepishev (rye) wrote :

After enabling verbose mode I get the following:

[timestamp] init: log.c:786: Assertion failed in log_clear_unflushed: log->remote_closed
[timestamp] init: Caught abort, core dumped
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
[timestamp] Kernel panic - not syncing: Attempted to kill init!

Roman Yepishev (rye) wrote :

The PPA version works properly, I am unable to reproduce the issue any more.

David Kranz (david-kranz) wrote :

I had this problem when rebooting after installing openstack. This ppa fixed the problem.

Oh great - I get this too.
I worked around it by booting to recovery mode, doing an fsck and resuming boot.

Karl (kh2l) wrote :

This happens for me as well on a VMware VM. I thought that maybe LVM was the cause, as my first install used VMware Easy Install, which most likely didn't use LVM. But I tried a non-LVM configuration and it also broke.

The fix available here does resolve the issue for me: once it is installed, the problem goes away.

Chris Peach (peachris+ubuntu) wrote :

Now my other VMware guest is affected, namely when trying to boot the new kernel 3.2.0-18-29 (x86_64). On this machine, I had not installed the patch from the PPA mentioned above. This VM does not use LVM. The only unusual feature of this VM is that it uses ecryptfs to encrypt one home directory. To make this machine boot, I had to enter the “--no-log” kernel parameter.

Steve Langasek (vorlon) on 2012-03-16
Changed in upstart (Ubuntu Precise):
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.4-0ubuntu9

---------------
upstart (1.4-0ubuntu9) precise; urgency=low

  [ Steve Langasek ]
  * debian/conf/failsafe.conf: instead of waiting for the 'runlevel' event
    before considering failsafe done, stop this job as soon as we're
    starting rc-sysinit; that way, any delays in /etc/rcS.d will not cause
    confusing messages about networking delays when the network is not the
    problem. (LP: #950662)

  [ James Hunt ]
  * init/log.c:log_read_watch(): Set remote_closed for scenarios where error
    handler never called. (LP: #935585)

  [ Serge Hallyn ]
  * debian/conf/power-status-changed.conf: shut down on getting SIGPWR.
    Unprivileged tasks can't send this signal. In particular this will
    allow clean shutdown of containers from the host.
    (See http://www.makelinux.net/man/7/P/power-status-changed)

  [ Stéphane Graber ]
  * Rename Serge's job to shutdown.conf to avoid a name conflict with the
    event power-status-changed.
 -- Stephane Graber <email address hidden> Fri, 16 Mar 2012 13:48:04 -0400

Changed in upstart (Ubuntu Precise):
status: In Progress → Fix Released