Init aborts with the message: init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io

Bug #1188642 reported by regmka
26
This bug affects 6 people
Affects Status Importance Assigned to Milestone
upstart
Fix Released
Undecided
James Hunt
upstart (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

init (in Ubuntu 13.04) accidentally aborts with the following messages in the kernel log:

[ 4.981969] init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io
[ 4.991031] init: Caught abort, core dumped
[ 4.995571] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

I've figured out that the probable cause is a race between the line 699 of the file init/log.c of the function log_read_watch() that sets the field
log->remote_closed to 1 (the caller clears log->io) and the routine log_clear_unflushed() that causes the abort when log->remote_closed == 1 but log->io != NULL: log_clear_unflushed() may preempt log_read_watch() when
log->remote_closed is set to 1 but before log->io is cleared for
the same log.

I applied the following patch to prove this assertion:

@@ -264,6 +266,8 @@
     */
    log->io->error_handler = NULL;

+ nih_fatal("log %p: clearing log->io\n", log);
+
    nih_free (log->io);
    log->io = NULL;
   }
@@ -384,6 +388,8 @@

  nih_free (err);

+ nih_fatal("log %p: clearing log->io\n", log);
+
  /* Ensure the NihIo is closed */
  nih_free (log->io);
  log->io = NULL;
@@ -696,7 +702,11 @@
     *
     */
    if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
+ {
+ nih_fatal("log %p: setting log->remote_closed\n", log);
     log->remote_closed = 1;
+ }

    close (log->fd);
    log->fd = -1;
@@ -806,6 +816,8 @@
   nih_assert (log->open_errno);

   if (log->remote_closed) {
+ nih_fatal("log %p: checking log->io\n", log);
+
    /* Parent job has ended and unflushed data
     * exists.
     */

I got the following kernel log before the abort:

[ 4.635146] init: log 0xb882a660: clearing log->io
[ 4.640487] init: log 0xb88035e0: setting log->remote_closed
[ 4.654816] init: log 0xb882ce88: clearing log->io
[ 4.655263] init: log 0xb882a1d0: setting log->remote_closed
[ 4.706758] init: log 0xb8831370: setting log->remote_closed
[ 6.714543] init: log 0xb882a600: clearing log->io
[ 6.733596] init: log 0xb882c6a0: clearing log->io
[ 6.746124] init: log 0xb882cbd8: setting log->remote_closed
[ 6.765075] init: log 0xb882b1d0: clearing log->io
[ 6.787772] init: log 0xb8842b50: clearing log->io
[ 6.801163] init: log 0xb882cb78: clearing log->io
[ 6.810717] init: log 0xb882df58: clearing log->io
[ 6.864869] init: log 0xb883e418: clearing log->io
[ 6.871033] init: log 0xb882d2a8: clearing log->io
[ 6.899113] init: log 0xb882d268: clearing log->io
[ 6.927477] init: log 0xb884eb28: clearing log->io
[ 6.942432] init: log 0xb88519a8: clearing log->io
[ 6.997231] init: log 0xb8850c40: clearing log->io
[ 7.031685] init: log 0xb8850650: clearing log->io
[ 7.042562] init: log 0xb8856d10: clearing log->io
[ 7.042811] init: log 0xb8857cf8: clearing log->io
[ 7.063461] init: log 0xb885a850: setting log->remote_closed
[ 7.063996] init: log 0xb883e478: setting log->remote_closed
[ 7.095505] init: log 0xb8854960: setting log->remote_closed
[ 7.135581] init: log 0xb883e478: checking log->io
[ 7.135756] init: log.c:825: Assertion failed in log_clear_unflushed: ! log->io
[ 7.140537] init: Caught abort, core dumped
[ 7.143211] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

So the assertion has been proved for the log at 0xb883e478 which has the field remote_closed updated but the field io dirty.

Related branches

Revision history for this message
regmka (regmka) wrote :
regmka (regmka)
affects: ubuntu → upstart (Ubuntu)
regmka (regmka)
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu):
status: New → Confirmed
Revision history for this message
James Hunt (jamesodhunt) wrote :

@regmka: Thanks very much for both reporting this issue and identifying the issue! :-)

Changed in upstart:
status: New → In Progress
assignee: nobody → James Hunt (jamesodhunt)
Revision history for this message
Alexander Kjäll (capitol) wrote :

I'm also affected by this bug on an virtual machine.

Workaround was to add --no-log to the kernel command line.

Screenshot of kernel crach attached.

James Hunt (jamesodhunt)
Changed in upstart:
status: In Progress → Fix Released
Changed in upstart (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
pva (pva) wrote :

Sorry, I don't understand. Is this supposed to be fixed in 13.04 server? I've just installed ubuntu-server 13.04 and upgraded all packages but still have this error on boot...

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I believe this is fixed in 13.10 only.

Revision history for this message
Timocheck (timocheck) wrote :

Encounter same issue in 14.04

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.