in-target: mkinitramfs: failed to determine device for /

Bug #1582899 reported by TJ on 2016-05-17
This bug affects 4 people
Affects                   Status  Importance  Assigned to  Milestone
base-installer (Ubuntu)           Undecided   Unassigned
initramfs-tools (Ubuntu)          Undecided   Unassigned
live-installer (Ubuntu)           Undecided   Unassigned

Bug Description

A sysadmin reported in #ubuntu (later #ubuntu-kernel) that the 16.04 ubuntu-server ISO installer failed because it was unable to configure linux-image-4.4.0-21-generic.

Lots of diagnostics and one SSH remote session later we seem to have narrowed it down to the installer.

The failure occurs when the F6 option "Expert mode" is chosen at the installer's boot menu.

During initial RAM file-system (initramfs) creation, after the kernel image is installed, the /dev/ file-system is not mounted in /target/. As a result, initramfs-tools/hook-functions::dep_add_modules_mount() cannot match the mount device of "/" (in this case /dev/sda3) with any node under /dev/, which contains only static entries.

The cause appears to be that live-installer.postinst has the crucial step, the call to library.sh:setup_dev(), commented out:

#waypoint 1 setup_dev

OS=linux
setup_dev() calls setup_dev_${OS}
setup_dev_linux() mounts procfs and devtmpfs into /target/
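
As a rough illustration of the step described above, the following sketch prints (rather than runs) the mounts that would make live device nodes visible inside /target. It follows this report's description of setup_dev_linux() and is not the actual library.sh source; the function name print_target_dev_mounts and the dry-run approach are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real base-installer library.sh code.
# Prints the mount commands that, per this report, setup_dev_linux()
# is expected to perform so that /target/dev contains live device nodes.
print_target_dev_mounts() {
    target="${1:-/target}"
    echo "mount -t proc proc ${target}/proc"
    echo "mount -t devtmpfs devtmpfs ${target}/dev"
}

# Dry run only: review the output before executing anything as root.
print_target_dev_mounts /target
```

Printing the commands keeps the sketch safe to run anywhere; on an affected installer the printed lines could be reviewed and then run by hand from the installer shell.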

----

Originally the cause of the error message appeared to be that the symlink names in /dev/disk/by-uuid/ haven't been updated after the partitioning stage if there were pre-existing partitions and file-systems on the install device, *and* the sysadmin chose to format the existing partitions when selecting mountpoints.

In this case a hardware RAID device presents:

/dev/sda1 (/boot/)
/dev/sda2 (swap)
/dev/sda3 (/)

From the shell I noticed:

root@tmpstorage:/# ll /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 10 May 17 19:39 130e4419-4bfd-46d2-87f9-62e5379bf591 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 17 19:39 127d3fa1-c07c-48e4-9e26-1b926d37625c -> ../../sda3
lrwxrwxrwx 1 root root 10 May 17 19:39 78b88456-2b0b-4265-9ed2-5db61522d887 -> ../../sda2
lrwxrwxrwx 1 root root 9 May 17 19:39 2016-04-20-22-45-29-00 -> ../../sr1
drwxr-xr-x 6 root root 120 May 17 19:39 ..
drwxr-xr-x 2 root root 120 May 17 19:39 .

root@tmpstorage:/# blkid /dev/sda*
/dev/sda: PTUUID="a84e60fd" PTTYPE="dos"
/dev/sda1: UUID="61365714-8ff7-47a2-8035-8aed9e3191a6" TYPE="ext4" PARTUUID="a84e60fd-01"
/dev/sda2: UUID="78b88456-2b0b-4265-9ed2-5db61522d887" TYPE="swap" PARTUUID="a84e60fd-02"
/dev/sda3: UUID="75f68451-9472-47c7-9efc-ed032bfa9987" TYPE="ext4" PARTUUID="a84e60fd-03"
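
The mismatch shown above (symlink names for sda1 and sda3 that differ from the UUIDs blkid reports) can be checked for mechanically. The following is a minimal sketch: the function name stale_uuid_links and the UUID_LOOKUP variable are hypothetical, and it assumes blkid supports the "-s UUID -o value" options as in util-linux.

```shell
#!/bin/sh
# Hypothetical helper: list by-uuid symlinks whose name no longer matches
# the UUID reported for the device they point to.
# UUID_LOOKUP is an assumption for illustration; it defaults to blkid.
UUID_LOOKUP="${UUID_LOOKUP:-blkid -s UUID -o value}"

stale_uuid_links() {
    dir="${1:-/dev/disk/by-uuid}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        dev="$(readlink -f "$link")" || continue
        actual="$($UUID_LOOKUP "$dev" 2>/dev/null)"
        name="${link##*/}"
        if [ "$name" != "$actual" ]; then
            echo "stale: $name -> $dev (now ${actual:-unknown})"
        fi
    done
}

# Example (run as root in the installer shell):
#   stale_uuid_links /dev/disk/by-uuid
```

With the listing above, such a check would be expected to flag the sda1 and sda3 links but not the sda2 link, whose name matches the blkid output.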

More details to follow.

TJ (tj) wrote :

Using a virtual machine I'm able to reproduce the differing UUIDs but not able to reproduce the failure of the initrd generation.

The code that fires the error is /usr/share/initramfs-tools/hook-functions::dep_add_modules_mount()
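
For orientation, here is a simplified sketch of the kind of lookup that function appears to perform, based on the shell trace quoted later in this report (a read loop over dev/mp/fs fields that skips rootfs entries). It is not the real hook-functions code; the function name find_mount_device and its arguments are made up for illustration.

```shell
#!/bin/sh
# Simplified sketch, NOT the real initramfs-tools hook-functions code.
# Reads a /proc/mounts-style file and prints "device fstype" for the
# first non-rootfs entry mounted at the given mount point.
find_mount_device() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    while read -r dev mp fs opts rest; do
        if [ "$mp" = "$mountpoint" ] && [ "$fs" != "rootfs" ]; then
            echo "$dev $fs"
            return 0
        fi
    done < "$mounts_file"
    return 1
}

# Example:
#   find_mount_device / /proc/mounts
```

The lookup itself only yields a device name; whether that name exists as a block device node is a separate check.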

TJ (tj) wrote :

I've now successfully reproduced the issue in the virtual machine.

Firstly, at the installer boot menu, choose Expert Mode.

This allows selection of each installer step from a menu and causes debconf to ask many more questions.

At "Install the System" choose a "normal" installation and then for "Drivers to include in the initrd:" choose "targeted: only include drivers needed for this system".

syslog shows:

May 17 22:55:29 base-installer: info: Using kernel 'linux-generic'
May 17 22:55:29 base-installer: warning: Failed to get debconf answer 'base-installer/kernel/linux/initrd'.
May 17 22:55:29 base-installer: info: Setting do_initrd='yes'.
May 17 22:55:29 base-installer: info: Setting link_in_boot='no'.
May 17 22:55:29 in-target: Reading package lists...
...
May 17 22:56:21 in-target: Running depmod.
May 17 22:56:22 in-target: update-initramfs: deferring update (hook will be called later)
May 17 22:56:22 in-target: Examining /etc/kernel/postinst.d.
May 17 22:56:22 in-target: run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-21-generic /boot/vmlinuz-4.4.0-21-generic
May 17 22:56:22 in-target: run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-21-generic /boot/vmlinuz-4.4.0-21-generic
May 17 22:56:22 in-target: update-initramfs: Generating /boot/initrd.img-4.4.0-21-generic
May 17 22:56:22 in-target: mkinitramfs: failed to determine device for /
May 17 22:56:22 in-target: mkinitramfs: workaround is MODULES=most, check:
May 17 22:56:22 in-target: grep -r MODULES /etc/initramfs-tools/
May 17 22:56:22 in-target:
May 17 22:56:22 in-target: Error please report bug on initramfs-tools
May 17 22:56:22 in-target:
May 17 22:56:22 in-target: Include the output of 'mount' and 'cat /proc/mounts'

TJ (tj) wrote :

This is triggered by the debconf setting handled in base-installer's library.sh:install_kernel_linux(), which writes out the alternate module policy for initramfs-tools:

root@tmpstorage:/# cat /etc/initramfs-tools/conf.d/driver-policy
+ cat /etc/initramfs-tools/conf.d/driver-policy
# Driver inclusion policy selected during installation
# Note: this setting overrides the value set in the file
# /etc/initramfs-tools/initramfs.conf
MODULES=dep
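
To see which MODULES value wins, a sketch like the following can help. It assumes, as the comment in driver-policy indicates, that files under conf.d are read after initramfs.conf and therefore override it. The function name effective_modules_policy is hypothetical, and the sketch sources the configuration files, so it should only be pointed at trusted directories.

```shell
#!/bin/sh
# Hypothetical helper: report the MODULES value that results from reading
# initramfs.conf followed by every file in conf.d (assumed ordering).
effective_modules_policy() {
    confdir="${1:-/etc/initramfs-tools}"
    (
        # Subshell so sourced settings do not leak into the caller.
        MODULES=
        [ -r "$confdir/initramfs.conf" ] && . "$confdir/initramfs.conf"
        for f in "$confdir"/conf.d/*; do
            [ -f "$f" ] && . "$f"
        done
        echo "${MODULES:-unset}"
    )
}

# Example:
#   effective_modules_policy /etc/initramfs-tools
```

If this prints "dep" on an affected target, the workaround suggested by the mkinitramfs error message is to use MODULES=most instead.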

TJ (tj) wrote :

I added "set -x" to the start of /usr/share/initramfs-tools/hook-functions::dep_add_modules_mount() and after it failed I grabbed the output from /var/log/apt/term.log:

The link /initrd.img is a dangling linkto /boot/initrd.img-4.4.0-21-generic
vmlinuz(/boot/vmlinuz-4.4.0-21-generic
) points to /boot/vmlinuz-4.4.0-21-generic
 (/boot/vmlinuz-4.4.0-21-generic) -- doing nothing at /var/lib/dpkg/info/linux-image-4.4.0-21-generic.postinst line 491.
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-21-generic /boot/vmlinuz-4.4.0-21-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-21-generic /boot/vmlinuz-4.4.0-21-generic
update-initramfs: Generating /boot/initrd.img-4.4.0-21-generic
+ local dir dev_node FSTYPE
+ local modules=
+ dir=/
+ [ ! -d /sys/devices/ ]
+ read dev mp fs opts rest
+ [ / = / ]
+ [ ext4 != rootfs ]
+ printf dev_node=/dev/sda3\nFSTYPE=ext4
+ break
+ eval dev_node=/dev/sda3
FSTYPE=ext4
+ dev_node=/dev/sda3
+ FSTYPE=ext4
+ [ / != / ]
+ [ ext4 = ubifs ]
+ [ / = / ]
+ [ /dev/sda3 = /dev/root ]
+ [ -z /dev/sda3 ]
+ readlink -f /dev/sda3
+ dev_node=/dev/sda3
+ [ -b /dev/sda3 ]
+ echo mkinitramfs: failed to determine device for /
mkinitramfs: failed to determine device for /
+ echo mkinitramfs: workaround is MODULES=most, check:
mkinitramfs: workaround is MODULES=most, check:
+ echo grep -r MODULES /etc/initramfs-tools/
grep -r MODULES /etc/initramfs-tools/
+ echo

+ echo Error please report bug on initramfs-tools
Error please report bug on initramfs-tools
+ echo Include the output of 'mount' and 'cat /proc/mounts'
Include the output of 'mount' and 'cat /proc/mounts'
+ exit 1
update-initramfs: failed for /boot/initrd.img-4.4.0-21-generic with 1.
run-parts: /etc/kernel/postinst.d/initramfs-tools exited with return code 1
Failed to process /etc/kernel/postinst.d at /var/lib/dpkg/info/linux-image-4.4.0-21-generic.postinst line 1052.

The bug looks to be a missing $ symbol to denote the dev_node variable in the second test:

    # recheck device
    if [ -z "$dev_node" ] || ! dev_node="$(readlink -f ${dev_node})" \
        || ! [ -b "$dev_node" ]; then

! dev_node="$(readlink -f ${dev_node})"

should, I think, be:

! $dev_node="$(readlink -f ${dev_node})"

TJ (tj) wrote :

I mis-read the code in the previous comment; the condition is fine. To figure it out I added some lines that do each test independently so as to identify which of the three fails. That reveals the problem is the block-device test:

Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-21-generic /boot/vmlinuz-4.4.0-21-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-21-generic /boot/vmlinuz-4.4.0-21-generic
update-initramfs: Generating /boot/initrd.img-4.4.0-21-generic
+ local dir dev_node FSTYPE
+ local modules=
+ dir=/
+ [ ! -d /sys/devices/ ]
+ read dev mp fs opts rest
+ [ / = / ]
+ [ ext4 != rootfs ]
+ printf dev_node=/dev/sda3\nFSTYPE=ext4
+ break
+ eval dev_node=/dev/sda3
FSTYPE=ext4
+ dev_node=/dev/sda3
+ FSTYPE=ext4
+ [ / != / ]
+ [ ext4 = ubifs ]
+ [ / = / ]
+ [ /dev/sda3 = /dev/root ]
+ [ -z /dev/sda3 ]
+ echo good dev_node
good dev_node
+ readlink -f /dev/sda3
+ dev_node2=/dev/sda3
+ echo readlink returned 0 dev_node2=/dev/sda3
readlink returned 0 dev_node2=/dev/sda3
+ [ -b /dev/sda3 ]
+ echo not a block device: /dev/sda3
not a block device: /dev/sda3
+ [ -z /dev/sda3 ]
+ readlink -f /dev/sda3
+ dev_node=/dev/sda3
+ [ -b /dev/sda3 ]
+ echo mkinitramfs: failed to determine device for /
mkinitramfs: failed to determine device for /
+ echo mkinitramfs: workaround is MODULES=most, check:
mkinitramfs: workaround is MODULES=most, check:
+ echo grep -r MODULES /etc/initramfs-tools/
grep -r MODULES /etc/initramfs-tools/
+ echo

+ echo Error please report bug on initramfs-tools

The problem is the /dev/sda* nodes do not exist in the /target, presumably because the host hasn't mounted devtmpfs to /target/dev.

That should be done by a call to library.sh::setup_dev_linux() before dep_add_modules_mount() runs.

There's a waypoint for it in debian/bootstrap-base.postinst before the install_kernel stage.
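
The approach of running each of the three "recheck device" tests independently can be sketched as follows. This is an illustrative reconstruction, not the debugging lines actually added to hook-functions; the function name check_dev_node and its messages are assumptions. It also shows why the original condition is valid shell: the exit status of an assignment from a command substitution is the exit status of the substituted command.

```shell
#!/bin/sh
# Hypothetical diagnostic: run the three device tests one at a time and
# report which one fails.
check_dev_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "empty device name"
        return 1
    fi
    # The assignment's exit status is that of the command substitution,
    # so "! var=$(cmd)" is true when cmd fails.
    if ! resolved="$(readlink -f "$node")"; then
        echo "readlink failed: $node"
        return 1
    fi
    if ! [ -b "$resolved" ]; then
        echo "not a block device: $resolved"
        return 1
    fi
    echo "ok: $resolved"
}

# Example (inside the target, e.g. from a chroot into /target):
#   check_dev_node /dev/sda3
```

When /target/dev holds only static entries, the third test is the one expected to fail, matching the "not a block device" line in the trace above.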

TJ (tj) wrote :

In expert mode the kernel command line carries "priority=low", which is a debconf configuration parameter. This suggests the parameter affects the order of the functions called, such that "setup_dev" isn't called before "install_kernel". That would make sense because, in tests after a manual "update-initramfs -c -t ...", it was not possible to reproduce the issue with "dpkg-reconfigure linux-image-4.4.0-21-generic".
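
A quick way to confirm which debconf priority the installer was booted with is to inspect the kernel command line. The sketch below is an assumption-laden illustration: the function name cmdline_debconf_priority is made up, and it assumes the parameter appears either as "priority=" or in the longer "debconf/priority=" form.

```shell
#!/bin/sh
# Hypothetical helper: print the debconf priority found on a kernel
# command line (empty output if none is set).
cmdline_debconf_priority() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one word per line, then extract the value.
    tr ' ' '\n' < "$cmdline_file" \
        | sed -n -e 's/^debconf\/priority=//p' -e 's/^priority=//p' \
        | head -n 1
}

# Example:
#   cmdline_debconf_priority /proc/cmdline
```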

TJ (tj) wrote :

After noticing /var/lib/dpkg/info/live-installer.postinst and seeing that it has the crucial step commented out, I think the source of the issue is now identified.

live-installer debian/live-installer.postinst:

waypoint 1 check_target
waypoint 1 get_mirror_info
waypoint 100 install_live_system
waypoint 1 pre_install_hooks
#waypoint 1 setup_dev
waypoint 1 configure_apt_preferences
waypoint 1 configure_apt
waypoint 3 apt_update
waypoint 5 post_install_hooks
waypoint 1 pick_kernel_if_missing
waypoint 20 install_kernel

The question is, is live-installer in control here, or base-installer? The syslog messages indicate base-installer but there's no sign of its postinst script in the running installer.
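
To spot disabled steps like the one above in whichever postinst script is actually running, a small filter such as this may help. It assumes waypoint lines have the form "waypoint N name", as in the listing above; the function name commented_waypoints is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of waypoint steps that are
# commented out in an installer postinst script.
commented_waypoints() {
    script="$1"
    # Match lines such as "#waypoint 1 setup_dev" and keep only the name.
    sed -n 's/^[[:space:]]*#[[:space:]]*waypoint[[:space:]][[:space:]]*[0-9][0-9]*[[:space:]][[:space:]]*//p' "$script"
}

# Example:
#   commented_waypoints /var/lib/dpkg/info/live-installer.postinst
```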

TJ (tj) wrote :

After ensuring the "normal" option (not "live") was selected in the "Install the System" step, I checked which processes were running when it paused to ask which kernel should be installed.

This confirms the live-installer.postinst script is running:

$ grep -Ev '\]$' ~/installer-ps.log
  PID USER VSZ STAT COMMAND
    1 root 4696 S /bin/busybox init
  117 root 16648 S /lib/systemd/systemd-udevd --daemon --resolve-names=never
  161 root 4700 S /sbin/syslogd -m 0 -O /var/log/syslog -S
  163 root 4696 S /sbin/klogd -c 2
  222 root 4696 S {debian-installe} /bin/sh /sbin/debian-installer
  224 root 4696 S -/bin/sh
  225 root 4696 S /bin/busybox init
  226 root 4696 S /usr/bin/tail -f /var/log/syslog
  248 root 8696 S /usr/bin/bterm -f /lib/unifont.bgf -l C.UTF-8 /lib/debian-installer/menu
  249 root 56140 S debconf -o d-i /usr/bin/main-menu
  255 root 8660 S /usr/bin/main-menu
19469 root 6604 S udpkg --configure --force-configure live-installer
19470 root 4828 S {live-installer.} /bin/sh /var/lib/dpkg/info/live-installer.postinst con
20828 root 4700 R ps

Should 'normal' have caused base-installer to be installed into the live host before initiating the target install?

TJ (tj) on 2016-05-18
description: updated
Adam Conrad (adconrad) on 2016-05-18
Changed in initramfs-tools (Ubuntu):
status: New → Invalid
Changed in base-installer (Ubuntu):
status: New → Invalid
Nish Aravamudan (nacc) on 2016-05-19
tags: added: ubuntu-server
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in live-installer (Ubuntu):
status: New → Confirmed
Casey Stone (tcstone) wrote :

Nicely researched and documented bug. Why is it not fixed three years later? I hit this (or seemingly this same bug) doing an install of server 18.04.2 yesterday. I needed to enable ssh for remote install and lowered debconf priority. This gave me some good clues; I'll now try some work-arounds.

tags: added: server-triage-discuss
tags: removed: server-triage-discuss ubuntu-server
Andreas Hasenack (ahasenack) wrote :

This bug has a lot of troubleshooting comments, but lacks a good summary about what is going on. I don't expect the xenial installer to be changed anymore, short of critical bugs affecting it.

@tcstone, you say you hit something similar with 18.04.2. The current bionic release is 18.04.3; would you mind trying again and, if you still hit a bug, filing a new one? Please note that the default server install in bionic is subiquity ("live-server"), and bugs against that should be filed via https://bugs.launchpad.net/subiquity/+filebug.
