run-init: nuking initramfs contents: directory not empty

Bug #613273 reported by Scott Moser on 2010-08-04
This bug affects 13 people
Affects                    Importance  Assigned to
initramfs-tools (Ubuntu)   Medium      Unassigned
  Lucid                    Undecided   Unassigned
  Maverick                 Undecided   Unassigned
  Natty                    Undecided   Unassigned
linux (Ubuntu)             Medium      Unassigned
  Lucid                    Undecided   Unassigned
  Maverick                 Undecided   Unassigned
  Natty                    Undecided   Unassigned
udev (Ubuntu)              Undecided   Unassigned
  Lucid                    Undecided   Unassigned
  Maverick                 Medium      Unassigned
  Natty                    Medium      Unassigned

Bug Description

In testing alpha-3 build 20100803.2, I found the following kernel panic:
[ 0.473264] Kernel panic - not syncing: Attempted to kill init!
[ 0.473277] Pid: 1, comm: run-init Not tainted 2.6.35-14-virtual #19-Ubuntu
[ 0.473284] Call Trace:
[ 0.473294] [<ffffffff815a0109>] panic+0x90/0x111
[ 0.473303] [<ffffffff81006b1b>] ? __raw_callee_save_xen_irq_enable+0x11/0x26
[ 0.473312] [<ffffffff8106276d>] forget_original_parent+0x33d/0x350
[ 0.473318] [<ffffffff81061b94>] ? put_files_struct+0xc4/0xf0
[ 0.473325] [<ffffffff8106279b>] exit_notify+0x1b/0x190
[ 0.473330] [<ffffffff810640a5>] do_exit+0x1c5/0x3f0
[ 0.473336] [<ffffffff81006afd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
[ 0.473343] [<ffffffff810643d7>] sys_exit+0x17/0x20
[ 0.473349] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b

This happened only once in my testing so far today (dozens of boots).
I'm attaching the console log of the failed instance. The instance that reported this bug uses the same AMI and region but is a different instance.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-14-virtual 2.6.35-14.19
Regression: Yes
Reproducible: No
ProcVersionSignature: User Name 2.6.35-14.19-virtual 2.6.35
Uname: Linux 2.6.35-14-virtual x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg: [ 12.900041] eth0: no IPv6 routers present
Date: Wed Aug 4 01:19:11 2010
Ec2AMI: ami-09c3bc5b
Ec2AMIManifest: ubuntu-images-testing-ap-southeast-1/ubuntu-maverick-daily-amd64-server-20100803.2.manifest.xml
Ec2AvailabilityZone: ap-southeast-1b
Ec2InstanceType: m1.large
Ec2Kernel: aki-11d5aa43
Ec2Ramdisk: unavailable
Frequency: Once a week.
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
ProcCmdLine: root=LABEL=uec-rootfs ro
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcModules: acpiphp 18752 0 - Live 0xffffffffa0000000
SourcePackage: linux

Scott Moser (smoser) wrote :
tags: added: iso-testing
Scott Moser (smoser) wrote :

I saw this again today on ubuntu-maverick-daily-amd64-server-20100914 instance size m1.large with kernel
$ dpkg-query --show linux-image-2.6.35-20-virtual
linux-image-2.6.35-20-virtual 2.6.35-20.29

Changed in linux (Ubuntu):
status: New → Confirmed
Scott Moser (smoser) wrote :

I seem to be seeing this more and more. I've seen similar traces in the t1.micro logs without a crash. Moving it to medium.

Changed in linux (Ubuntu):
importance: Undecided → Medium
Thierry Carrez (ttx) on 2010-09-21
tags: added: server-mro
Seth Forshee (sforshee) wrote :

The kernel panic is a result of init exiting, which likely isn't a kernel issue. From the logs it looks like run-init is bailing because it can't delete something in the initramfs when switching to the real root fs.

I think this is more likely an initramfs issue than a kernel issue. Adding initramfs-tools to the affected packages.

Jeremy Foshee (jeremyfoshee) wrote :

removed deprecated regression tag.

~JFo

tags: removed: regression-potential
tags: added: regression-release
Changed in initramfs-tools (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in initramfs-tools (Ubuntu):
importance: Undecided → Medium
Stefan Bader (smb) wrote :

This problem seems to have intensified on my Oneiric test system, and we were finally able to track it down to /usr/share/initramfs-tools/scripts/init-bottom/udev. There the boot process tries to stop udevd, then moves the special filesystems (/dev, /proc, and /sys) over to the new rootfs, and finally switches to that root before udevd is restarted. However, udevd is still launching processes to create device nodes at that point. It seems that in some rare cases the pkill (SIGTERM) fails to really kill all of the udevd processes, which leads to a situation where the initramfs cannot be completely nuked, and that triggers a panic.

In Oneiric, udevadm has a more sensible way to stop udevd, one that also waits until udevd has actually stopped (udevadm control --exit). This is not possible, though, with the versions of udev in Natty and Maverick. There it is at least possible to keep udevd from starting new processes (udevadm control --stop-exec-queue). Using that before the pkill would prevent a lot of those ugly "workers have been killed" and "/dev/null not found" messages. Unfortunately there still seemed to be a (much smaller) chance of hitting the problem where udevd does not stop on SIGTERM.

So I am not sure which path is better or simpler to implement: backport --exit from the newer udev packages, or retry the pkill a few times and, if that does not remove the udev processes, switch to a more brutal signal before finally giving up. Either way, this is not a kernel problem but one on the udev or initramfs-tools side.
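
For illustration, a minimal sketch of the "retry the pkill, then escalate" option described above. The helper name stop_daemon and the retry count are made up for this example, and it assumes the initramfs provides pgrep and pkill with -x and signal options (busybox and procps both do); it is not the code that went into any udev package.

```shell
# stop_daemon NAME [TRIES]: hypothetical helper, not part of udev.
# Sends SIGTERM up to TRIES times, then falls back to SIGKILL.
stop_daemon() {
    name=$1
    tries=${2:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # pgrep exits non-zero when nothing matches: the daemon is gone.
        pgrep -x "$name" >/dev/null 2>&1 || return 0
        pkill -TERM -x "$name" 2>/dev/null || true
        sleep 1
        i=$((i + 1))
    done
    # Still alive after the polite requests: use the more brutal signal.
    pkill -KILL -x "$name" 2>/dev/null || true
    sleep 1
    # Succeed only if no matching process is left.
    ! pgrep -x "$name" >/dev/null 2>&1
}
```

In init-bottom this might be called as `udevadm control --stop-exec-queue` followed by `stop_daemon udevd`, so that no new workers are started while the retries run.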

Changed in linux (Ubuntu):
status: Triaged → Invalid
Stefan Bader (smb) wrote :

This has been fixed for Oneiric by udev-171-0ubuntu4.

Changed in udev (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Maverick):
status: New → Invalid
Changed in linux (Ubuntu Natty):
status: New → Invalid
Changed in udev (Ubuntu Maverick):
importance: Undecided → Medium
status: New → Triaged
Changed in udev (Ubuntu Natty):
importance: Undecided → Medium
status: New → Triaged
Stefan Bader (smb) wrote :

Though it is part of /usr/share/initramfs-tools, the scripts/init-bottom/udev is part of the udev package.

Changed in initramfs-tools (Ubuntu):
status: Triaged → Invalid
Changed in initramfs-tools (Ubuntu Maverick):
status: New → Invalid
Changed in initramfs-tools (Ubuntu Natty):
status: New → Invalid
Mika Båtsman (mika-batsman) wrote :

@Stefan: this also affects Lucid. Is it going to get the fix also?

Bugs 573615, 581566 and 155689 also describe similar symptoms.

We sporadically also see similar symptoms to this bug and #610107. I think they're also connected and both caused by killing udev in the middle of its operation. The huge number of similar bugs makes me think this is not the best idea even when it gives us 2 or 3 seconds less of boot time.

Wouldn't simply adding "/sbin/udevadm settle" before killing udev solve this issue? We're currently starting long-term tests to see whether it helps.
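
A sketch of what such a settle step could look like, assuming `udevadm settle --timeout=SECONDS` exits non-zero when the event queue has not drained in time. The function name wait_for_udev_queue and the 30-second default are invented for this example.

```shell
# wait_for_udev_queue [SECONDS]: hypothetical helper.
# Blocks until udev has processed its pending events, or gives up after
# SECONDS and says so on stderr.
wait_for_udev_queue() {
    timeout=${1:-30}
    if udevadm settle --timeout="$timeout"; then
        return 0
    fi
    echo "udev event queue still busy after ${timeout}s" >&2
    return 1
}
```

The return value lets the caller decide whether to go ahead and kill udevd anyway or to wait longer.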

Bernie Innocenti (codewiz) wrote :

On my laptop, it's now reproducible on *every* boot!

If I start with break=bottom, manually move all the mountpoints to /root and then call run-init by hand, I get the same error:

 run-init: nuking initramfs contents: directory not empty

The remaining files are: /dev/console, /var/initramfs, plus various things in /var/run/udev

This is Oneiric with udev_173-0ubuntu4 and initramfs-tools_0.99ubuntu8. The system was upgraded from Lucid.
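
To find leftovers like these from a break=bottom shell, one could list whatever still lives on the rootfs itself. A minimal sketch, assuming the `find` in the initramfs supports -xdev and -mindepth; the helper name list_leftovers is made up for illustration.

```shell
# list_leftovers [DIR]: hypothetical helper.
# Prints every entry below DIR without descending into other mounted
# filesystems (-xdev), i.e. roughly what run-init would have to delete.
list_leftovers() {
    dir=${1:-/}
    find "$dir" -xdev -mindepth 1 2>/dev/null | sort
}
```

Running `list_leftovers /` just before run-init would show which files are still being created or held in the initramfs.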

Igor Galić (i.galic) wrote :

I upgraded my system from natty yesterday in hope it would fix a hang during boot:

    [ 2.108903] md: multipath personality registered for level -4
    done.
    Begin: Running /scripts/init-premount ... done.
    Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
    Begin: Running /scripts/local-premount ... done.
    [ 7.137918] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
    Begin: Running /scripts/local-bottom ... done.
    done.
    Begin: Running /scripts/init-bottom ... done.
    init: ureadahead main process (299) terminated with status 5

Instead of that hang, I now get a couple of steps further and then the kernel panics with this exact message. Every time.

Igor Galić (i.galic) wrote :

I added udevadm settle before the line udevadm control --timeout=61 --exit. Nothing changes, still crashing:

    [ 1.172059] Refined TSC clocksource calibration: 3411.530 MHz.
    [ 1.172703] md: multipath personality registered for level -4
    done.
    Begin: Running /scripts/init-premount ... done.
    Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
    Begin: Running /scripts/local-premount ... done.
    [ 6.212696] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
    Begin: Running /scripts/local-bottom ... done.
    done.
    Begin: Running /scripts/init-bottom ... done.
    run-init: nuking initramfs contents: Directory not empty
    [ 6.334837] Kernel panic - not syncing: Attempted to kill init!
    [ 6.336789] Pid: 1, comm: run-init Not tainted 3.0.0-12-server #20-Ubuntu
    [ 6.338883] Call Trace:
    [ 6.339675] [<ffffffff815e8184>] panic+0x91/0x194
    [ 6.340808] [<ffffffff810626c5>] forget_original_parent+0x245/0x250
    [ 6.341658] [<ffffffff810626e7>] exit_notify+0x17/0x150
    [ 6.342374] [<ffffffff81062feb>] do_exit+0x1fb/0x440
    [ 6.343051] [<ffffffff81063387>] sys_exit+0x17/0x20
    [ 6.343716] [<ffffffff81606c02>] system_call_fastpath+0x16/0x1b

Igor Galić (i.galic) wrote :

There was a concern that udev is still running at this point, so I added a `ps` right after `udevadm control --timeout=61 --exit` and the result is this:
    Begin: Running /scripts/local-bottom ... done.
    done.
    Begin: Running /scripts/init-bottom ... PID USER VSZ STAT COMMAND
        1 0 4460 S /bin/sh /init ro
        2 0 0 SW [kthreadd]
        3 0 0 SW [ksoftirqd/0]
        4 0 0 SW [kworker/0:0]
        5 0 0 SW [kworker/u:0]
        6 0 0 SW [migration/0]
        7 0 0 SW [migration/1]
        8 0 0 SW [kworker/1:0]
        9 0 0 SW [ksoftirqd/1]
       10 0 0 SW [kworker/0:1]
       11 0 0 SW< [cpuset]
       12 0 0 SW< [khelper]
       13 0 0 SW< [netns]
       14 0 0 SW [sync_supers]
       15 0 0 SW [kworker/u:1]
       16 0 0 SW [bdi-default]
       17 0 0 SW< [kintegrityd]
       18 0 0 SW< [kblockd]
       19 0 0 SW< [ata_sff]
       20 0 0 SW [khubd]
       21 0 0 SW< [md]
       22 0 0 SW [kworker/1:1]
       24 0 0 SW [khungtaskd]
       25 0 0 SW [kswapd0]
       26 0 0 SWN [ksmd]
       27 0 0 SWN [khugepaged]
       28 0 0 DW [fsnotify_mark]
       29 0 0 SW [ecryptfs-kthrea]
       30 0 0 SW< [crypto]
       38 0 0 SW< [kthrotld]
       39 0 0 SW [scsi_eh_0]
       40 0 0 SW [scsi_eh_1]
       41 0 0 SW [kworker/u:2]
       42 0 0 SW [kworker/u:3]
      298 0 0 SW [jbd2/vda-8]
      299 0 0 SW< [ext4-dio-unwrit]
      303 0 4456 S /bin/sh -e /scripts/init-bottom/udev
      305 0 4460 R ps
    done.
    run-init: nuking initramfs contents: Directory not empty
    [ 6.397128] Kernel panic - not syncing: Attempted to kill init!
    [ 6.397849] Pid: 1, comm: run-init Not tainted 3.0.0-12-server #20-Ubuntu
    [ 6.398715] Call Trace:
    [ 6.399018] [<ffffffff815e8184>] panic+0x91/0x194
    [ 6.399609] [<ffffffff810626c5>] forget_original_parent+0x245/0x250
    [ 6.400424] [<ffffffff810626e7>] exit_notify+0x17/0x150
    [ 6.401102] [<ffffffff81062feb>] do_exit+0x1fb/0x440
    [ 6.401660] [<ffffffff81063387>] sys_exit+0x17/0x20
    [ 6.402255] [<ffffffff81606c02>] system_call_fastpath+0x16/0x1b

Igor Galić (i.galic) wrote :

I, or rather smb :) was wondering what could still be mounted that would prevent an overmount, so I added another debug output, that of `mount` right after `mount -n -o move /dev ${rootmnt}/dev`:

    Begin: Running /scripts/init-bottom ... rootfs on / type rootfs (rw)
    sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    udev on /root/dev type devtmpfs (rw,relatime,size=2019088k,nr_inodes=504772,mode=755)
    devpts on /root/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
    tmpfs on /run type tmpfs (rw,nosuid,relatime,size=811332k,mode=755)
    fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
    /dev/vda on /root type ext4 (ro,relatime,user_xattr,acl,barrier=1,data=ordered)
    done.
    run-init: nuking initramfs contents: Directory not empty
    [ 6.357963] Kernel panic - not syncing: Attempted to kill init!
    [ 6.359092] Pid: 1, comm: run-init Not tainted 3.0.0-12-server #20-Ubuntu
    [ 6.360413] Call Trace:
    [ 6.360893] [<ffffffff815e8184>] panic+0x91/0x194
    [ 6.361913] [<ffffffff810626c5>] forget_original_parent+0x245/0x250
    [ 6.363086] [<ffffffff810626e7>] exit_notify+0x17/0x150
    [ 6.364123] [<ffffffff81062feb>] do_exit+0x1fb/0x440
    [ 6.365066] [<ffffffff81063387>] sys_exit+0x17/0x20
    [ 6.366001] [<ffffffff81606c02>] system_call_fastpath+0x16/0x1b

Stefan Bader (smb) wrote :

At this point /proc, /sys and /run still seem to be mounted under /. For /proc and /sys, there is code in /usr/share/initramfs-tools/init that moves the mounts to /root. That script also chains into run-init, which seems to do the final move...
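
A small sketch for spotting such mounts automatically, assuming awk is available in the initramfs and that the input is in the `mount` output format shown in the log above. The helper name mounts_outside_root is hypothetical.

```shell
# mounts_outside_root [ROOTMNT]: hypothetical helper.
# Reads `mount` output on stdin and prints each mount point that is
# neither /, nor ROOTMNT itself, nor located below ROOTMNT.
mounts_outside_root() {
    rootmnt=${1:-/root}
    awk -v r="$rootmnt" '
        $2 == "on" {
            mp = $3
            if (mp == "/" || mp == r) next       # rootfs and the new root
            if (index(mp, r "/") == 1) next      # already moved under it
            print mp
        }'
}
```

Used as `mount | mounts_outside_root /root`, it would print /sys, /proc, /run and /sys/fs/fuse/connections for the mount table quoted above.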

Bernie Innocenti (codewiz) wrote :

Someone with the necessary privileges should update the summary: this bug really doesn't have anything to do with the ec2 kernel, nor with system_call_fastpath.

"run-init: nuking initramfs contents: directory not empty" seems more appropriate.

Daniel Ellis (danellisuk) wrote :

I agree Bernie, so I changed the title. I don't believe you need particular privileges to do this.

summary: - kernel panic on ec2 in system_call_fastpath
+ run-init: nuking initramfs contents: directory not empty
Daniel Ellis (danellisuk) wrote :

Please can someone register this as also affecting "udev (Ubuntu Lucid)". I have tried, but cannot find a method to do this.

Paolo Maero (fabrica64) wrote :

Please, can someone register this as also affecting "udev (Ubuntu Lucid)"?

I think this kind of error should be fixed on an LTS release. It affects random systems, but on a system where it does happen I get it on 40% of reboots...

Scott Moser (smoser) on 2012-02-21
Changed in initramfs-tools (Ubuntu Lucid):
status: New → Invalid
Changed in linux (Ubuntu Lucid):
status: New → Invalid
Mekk (marcin-kasperski) wrote :

I got this very message, for the first time, after the upgrade to 12.04 Precise. Can't boot that machine. Root is on md.

Mekk (marcin-kasperski) wrote :

My case is resolved: somehow during the upgrade I ended up with /run as a symlink to /var/run and /var/run as a symlink to /run. After I fixed this, the problem was gone. But it took time to spot it.

Regarding the bug title: I find this error message not very informative… Providing some detail about the problem (which directory is not empty and what it contains, for example) would make resolving problems much easier.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in udev (Ubuntu Lucid):
status: New → Confirmed
Pontiy_Pilat (p-p) wrote :

Thank you, Mekk!
I reproduced this problem after upgrading from Lucid to Precise.
Removing the /run symlink and creating a /run directory in the guest fixed the problem:
rm /run
mkdir /run
Please fix it.
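
Before applying that fix, it may help to confirm that the root filesystem really has the symlink problem. A minimal sketch, assuming (as in the two reports above) that /run is expected to be a real directory; the helper name check_run_dir is invented here.

```shell
# check_run_dir [ROOT]: hypothetical helper.
# ROOT is where the root filesystem is mounted (empty = running system).
# Returns 1 and describes the links if ROOT/run is a symlink.
check_run_dir() {
    root=${1:-}
    if [ -L "$root/run" ]; then
        echo "$root/run is a symlink to $(readlink "$root/run")"
        if [ -L "$root/var/run" ]; then
            echo "$root/var/run is a symlink to $(readlink "$root/var/run")"
        fi
        return 1
    fi
    echo "$root/run is not a symlink"
}
```

From a rescue system with the broken root mounted at /mnt, `check_run_dir /mnt` would show whether /run and /var/run point at each other.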

Jack Yan (jackyan) wrote :

I have this issue as well with the latest update (referred here from https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/155689):

making initramfs contents: Directory not empty
Kernel panic—not syncing: attempted to kill init

I’m really inexperienced on Ubuntu 10—just wanted to note this here in case it helps anyone.

As already mentioned in comment 10, we experienced the same bug at the end of 2011 and were hit by it in internal tests every few weeks.

After applying the mentioned change, it disappeared. Since sometime in 2012, we have had about a hundred systems in the field running Lucid with this patch and have never seen the problem again. So applying the change to lucid-updates would probably make sense?

To make it easy for you, I also added our patch now.

dino99 (9d9) wrote :
Changed in udev (Ubuntu Natty):
status: Triaged → Invalid
Changed in udev (Ubuntu Maverick):
status: Triaged → Invalid
dino99 (9d9) wrote :

upstream should have been synced

Changed in udev (Ubuntu Lucid):
status: Confirmed → Fix Released

I have been having this problem (intermittently) with Ubuntu Server 15.04 and now with 15.10 on my Intel NUC5CPYH for the last 3 months (since I bought it in fact). It only seems to happen when rebooting (never from a cold boot), and it doesn't always happen. I've attached an image of the console output. As my machine is normally headless it is particularly frustrating!

I use LVM but not for the root/system partition, only for /home. My system/root is at /dev/sda1 and formatted with ext4.

I am using latest kernels downloaded from Ubuntu's http://kernel.ubuntu.com/~kernel-ppa/mainline/ (currently 4.2.5) - but as discussed here it doesn't seem to be a kernel issue.
