Bug #1915573 “Curtin does not regenerate initramfs for CentOS” : Bugs : curtin

Revision history for this message

Sean Feole (sfeole) wrote on 2021-02-12:

#1

Thanks Lee, the description is correct. This issue can be easily reproduced by deploying the MAAS CentOS 8 image provided in the latest streams to any piece of hardware that has a fancy disk, such as NVMe or perhaps even a RAID array. You will find that the OS hangs shortly after mounting and booting from /sysroot.

There will be no errors or warnings.

The workaround was to install CentOS 8 on the same hardware using the CentOS installer, copy the kernel modules to the MAAS-installed disk image, and regenerate the initramfs.

To reproduce this:
-- Deploy the MAAS CentOS 8 image to a disk.
-- Upon boot the operating system will hang after mounting /sysroot with no apparent warnings or errors.

-- Upon restarting, boot into the initramfs emergency shell (append the `rd.break` kernel parameter at boot).
-- Remount sysroot rw: $ mount -o remount,rw /sysroot
-- Observe that there are no logs or warnings anywhere on the disk. The installed kernel modules are a minimal set, the kind usually needed under qemu. You will not have a wide variety of /lib/modules/kernel-<version>/net/*.ko drivers or drivers/platform/*.ko, usr/*.ko, etc.
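One rough way to put a number on the "minimal set" observation is to count module files per subsystem in the deployed tree and compare against a tree from a working install. This is only a sketch: `count_modules` is a hypothetical helper name, and it assumes `find` and `wc` are available wherever it is run.

```shell
#!/bin/sh
# count_modules DIR
# Print how many kernel module files (*.ko, or compressed ones such as
# *.ko.xz) exist under DIR. Hypothetical helper, not part of MAAS or curtin.
count_modules() {
    find "$1" -type f \( -name '*.ko' -o -name '*.ko.*' \) | wc -l | tr -d ' '
}
```

For example, `count_modules /sysroot/lib/modules/<kernel-version>/kernel/drivers/nvme` on the deployed disk versus the same path on a system installed with the CentOS installer would show whether the NVMe drivers are missing from the image.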

To fix this:

-- Install CentOS 8 using their manual installer or netboot it via a mirror.
-- Ensure you install it to a NEW disk partition; do not erase the MAAS-deployed image.
-- After the system comes up, copy /lib/modules/<kernel> to /lib/modules on the MAAS-deployed disk.

# mount the root disk
$ mkdir /mnt/maasdisk
$ mount </dev/device/partition> /mnt/maasdisk

# mount the efi partition
$ mount </dev/device/partition> /mnt/maasdisk/boot/efi/

# bind mount sys and dev, and mount proc
$ mount --rbind /sys /mnt/maasdisk/sys
$ mount --rbind /dev /mnt/maasdisk/dev
$ mount -t proc /proc /mnt/maasdisk/proc

# for good measure, ensure the kernels on the host machine and the MAAS disk image are the same, then copy the modules from the host machine to the MAAS disk image
$ mv /mnt/maasdisk/lib/modules/<kernel-modules> /mnt/maasdisk/lib/modules/<kernel-modules>.old
$ cp -R /lib/modules/* /mnt/maasdisk/lib/modules/

# chroot into maasdisk
$ chroot /mnt/maasdisk

# rebuild initramfs
$ dracut --regenerate-all --force

# verify the MAAS disk image now boots
Success!
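The "ensure the kernels are the same" step above can be checked before copying anything. A minimal sketch, assuming the MAAS disk is mounted at /mnt/maasdisk as in the steps above; `has_kernel_modules` is a hypothetical helper name.

```shell
#!/bin/sh
# has_kernel_modules ROOT KVER
# Succeed if ROOT/lib/modules/KVER exists, i.e. the tree mounted at ROOT
# ships a module directory for kernel version KVER.
# Hypothetical helper, not part of MAAS or curtin.
has_kernel_modules() {
    [ -d "$1/lib/modules/$2" ]
}
```

Run on the freshly installed host as `has_kernel_modules /mnt/maasdisk "$(uname -r)" || echo "kernel versions differ" >&2`, it flags the case where the copied modules would not match the kernel the MAAS image boots.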

-- Copy the MAAS disk image's /lib/modules and initrd into the custom MAAS image created via packer-maas.
$ tar -xvf maas.tar.img

# replace the initrd and /lib/modules with the ones used to repair the MAAS disk image, tar the contents back up, and upload to MAAS

$ tar -Sczpf $OUTPUT --selinux -C <maas_img_dir> .
$ maas $PROFILE boot-resources create name='centos/8-custom' title='CentOS 8 Custom' architecture='amd64/generic' filetype='tgz' content@=centos8.tar.gz

Ryan Harper (raharper) wrote on 2021-02-13:

#2

> Since the CentOS image is generated in KVM the initramfs has minimal kernel modules.

Isn't this a problem in the generated initramfs? Why doesn't the Centos8 image for MAAS include more hardware modules like the Ubuntu initramfs does?

> Curtin should rebuild CentOS's initramfs on deployment.

We do regenerate initramfs for centos; it's triggered for lvm or raid modules being required. Is there a specific dracut module that needs enabled for nvme?

Changed in curtin:
status:	New → Incomplete

Lee Trager (ltrager) wrote on 2021-02-16:

#3

The Ubuntu kernel/initrd in the stream is only used for booting. During deployment a kernel is installed from the archive and the initrd is generated for the target system. The initrd generated for the target system only has the required kernel modules for that system, not all kernel modules.

gh:canonical/packer-maas has to include a kernel in the CentOS image which generates an initrd at image build time. There isn't a way with kickstart to include additional modules without permanently including in the initrd. We also don't have a good way of knowing which kernel modules should be included as I have access to a limited test environment. This environment doesn't have NVME hardware in it so while I know some kernel modules are missing for NVME I don't know which ones are missing. There may be other storage drivers missing that I'm unaware of because I don't have access to test hardware.

Why not always regenerate the initrd on CentOS/RHEL installation? This should leverage the scripts in CentOS to pull in the appropriate drivers automatically.

Changed in curtin:
status:	Incomplete → New

Ryan Harper (raharper) wrote on 2021-02-16:

#4

> There isn't a way with kickstart to include additional modules without permanently including in the initrd.

curtin's in the same boat. It does not know any more about what kernel modules should be included in the target other than virtual storage related items (like raid or lvm) which are not hardware/platform dependent.

IIUC, because MAAS/curtin deploys on Ubuntu kernel/initramfs but the target is centos; the centos initrd is "not as complete" hardware module wise as the Ubuntu version. Would it not be better to generate the initramfs for centos in the same way we do for physical hardware on Ubuntu?

> Is there a specific dracut module that needs enabled for nvme?

This didn't get answered.

> This should leverage the scripts in CentOS to pull in the appropriate drivers automatically.

This may or may not work. Since the NVME devices are present due to the Ubuntu kernel, the code that checks which drivers/modules are in use for a given block device may not suggest including a module if it's built in.

> $dracut --regenerate-all --force

This is not "leverage the scripts in CentOS to pull in appropriate drivers"; this is pull everything in.

I'd like discussion on:

1) Should the MAAS-created CentOS initramfs contain the same driver support (compiled or module) as the Ubuntu kernel/initramfs pair already does?

2) Does dracut detect the needed kernel modules to put in the initramfs when run under the Ubuntu kernel/initrd? If so, do we need any special flags to trigger the detection?
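The built-in versus module concern can at least be inspected directly, since each kernel's module directory normally carries `modules.builtin` and `modules.dep` index files. The sketch below is illustrative only: `driver_kind` is a hypothetical helper, it matches on the file name as given (so `-` versus `_` spelling matters), and it says nothing about how dracut's own detection behaves.

```shell
#!/bin/sh
# driver_kind MODDIR NAME
# MODDIR is a /lib/modules/<kernel-version> directory; NAME is a driver
# name such as nvme. Prints "builtin", "module" or "absent".
# Hypothetical helper; it only reads the modules.builtin and modules.dep
# index files found in MODDIR.
driver_kind() {
    moddir=$1
    name=$2
    if grep -Eq "/${name}\.ko\$" "$moddir/modules.builtin" 2>/dev/null; then
        echo builtin
    elif grep -Eq "^[^:]*/${name}\.ko(\.[a-z0-9]+)?:" "$moddir/modules.dep" 2>/dev/null; then
        echo module
    else
        echo absent
    fi
}
```

Running it once against the Ubuntu ephemeral kernel's module directory and once against the CentOS one in the target would show whether a driver such as nvme is built in on one side and a loadable module on the other.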

Changed in curtin:
status:	New → Incomplete

Sean Feole (sfeole) wrote on 2021-02-17:

#5

I can provide access to the test hardware for you guys to work on if so desired. Please let me know.

Sean Feole (sfeole) wrote on 2021-02-17:

#6

I'll also see if I can get you a list of installed modules on a working target system so we can compare against the standard list of preinstalled modules.

Ryan Harper (raharper) wrote on 2021-02-17:

#7

@Sean

Thanks for the offer of hardware and the modules on the working system.

If you have a working system running Centos8 on your target hardware, I would be interested in output from:

1) dracut --verbose /tmp/initrd.img `uname -r`
2) lsinitrd /tmp/initrd.img
3) lsinitrd <the initramfs you've created with the command in this bug>
4) Does the initrd created in (1) include the required drivers for booting correctly?

Ryan Harper (raharper) wrote on 2021-02-17:

#8

> > $dracut --regenerate-all --force
>
> This is not "leverage the scripts in CentOS to pull in
> appropriate drivers";

None of the dracut versions I've found on Centos8 include the
--regenerate-all flag ... what version of dracut is this?

[root@c8-vm ~]# dracut --regenrate-all /tmp/initrd2.img `uname -r`
getopt: unrecognized option '--regenrate-all'
Usage: /usr/bin/dracut [OPTION]... [<initramfs> [<kernel-version>]]

Version: 049-70.git20200228.el8

Creates initial ramdisk images for preloading modules

-h, --help  Display all options

If a [LIST] has multiple arguments, then you have to put these in quotes.

For example:

# dracut --add-drivers "module1 module2"  ...

> this is pull everything in.

I'm not sure about this, I could be wrong.  There *is* an option which does
pull in most/all modules:

  -H, --hostonly        Host-Only mode: Install only what is needed for
                        booting the local host instead of a generic host.
  -N, --no-hostonly     Disables Host-Only mode
  --hostonly-mode <mode>
                        Specify the hostonly mode to use. <mode> could be
                        one of "sloppy" or "strict". "sloppy" mode is used
                        by default.
                        In "sloppy" hostonly mode, extra drivers and modules
                        will be installed, so minor hardware change won't make
                        the image unbootable (eg. changed keyboard), and the
                        image is still portable among similar hosts.
                        With "strict" mode enabled, anything not necessary
                        for booting the local host in its current state will
                        not be included, and modules may do some extra job
                        to save more space. Minor change of hardware or
                        environment could make the image unbootable.
                        DO NOT use "strict" mode unless you know what you
                        are doing.

Depending on the discussion I've raised on whether the MAAS-produced
Centos8 initramfs should more closely match what the Ubuntu initrd
has (it's built for booting on hardware), using --no-hostonly
should produce a larger, more complete initramfs which would
likely include storage modules like NVME.

On a centos8 vm with virtio disks (no nvme), I created initramfs
3 ways:

  --hostonly --hostonly-mode=strict
  --hostonly --hostonly-mode=sloppy
  --no-hostonly

[root@c8-vm ~]# ls -ahl /tmp/initrd*
-rw------- 1 root root 17M Feb 17 19:24 /tmp/initrd-host-sloppy.img
-rw------- 1 root root 17M Feb 17 19:23 /tmp/initrd-host-strict.img
-rw------- 1 root root 28M Feb 17 19:22 /tmp/initrd-nohost.img

On this setup, there's no practical difference between
strict/sloppy, and neither pulls in the NVME module as it's not in
use. But the --no-hostonly image does:

[root@c8-vm ~]# lsinitrd /tmp/initrd-nohost.img | grep nvme | awk '{print $9}'
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/host
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/host/nvme-core.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/host/nvme-fabrics.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/host/nvme-fc.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/host/nvme-rdma.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/host/nvme-tcp.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/host/nvme.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/target
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/target/nvme-loop.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/target/nvmet-fc.ko.xz
usr/lib/modules/4.18.0-193.19.1.el8_2.centos.plus.x86_64/kernel/drivers/nvme/target/nvmet.ko.xz

The last test that'd be interesting is running dracut in --hostonly
mode while booted into Ubuntu but chrooted into the centos8
filesystem (which recreates what curtin does/could do).
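That last test could be laid out roughly as follows. This is a sketch under assumptions, not what curtin actually runs: `chroot_dracut_plan` is a hypothetical helper, the target mount point and output path are placeholders, and it only prints the commands so they can be reviewed before running them as root.

```shell
#!/bin/sh
# chroot_dracut_plan TARGET KVER
# Print, without running, the commands for building a hostonly initramfs
# from inside a CentOS root mounted at TARGET while another kernel is
# booted. Hypothetical helper; paths are placeholders.
chroot_dracut_plan() {
    target=$1
    kver=$2
    printf 'mount --rbind /sys %s/sys\n' "$target"
    printf 'mount --rbind /dev %s/dev\n' "$target"
    printf 'mount -t proc /proc %s/proc\n' "$target"
    # KVER is passed explicitly: inside the chroot, uname -r still
    # reports the running (Ubuntu) kernel, not the CentOS one.
    printf 'chroot %s dracut --hostonly --force /tmp/initrd-chroot.img %s\n' "$target" "$kver"
    printf 'chroot %s lsinitrd /tmp/initrd-chroot.img\n' "$target"
}
```

Comparing the `lsinitrd` listing from this image with the `--no-hostonly` listing above would show whether the NVMe modules get picked up when detection runs under the Ubuntu kernel.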

Lee Trager (ltrager) wrote on 2021-02-18:

#9

@raharper - It looks like Dracut has support to autodetect which kernel modules are loaded and automatically adds them to the initrd[1]. While there may be some differences in what CentOS and Ubuntu compile into the kernel there will be more similarities than differences. The reason why I didn't answer which kernel modules should be added is because I don't know. This isn't just a problem with NVME but other storage types as well. We currently have a bug in packer-maas[2] where a user claims that the CentOS initrd is missing kernel modules but they don't know which.

To answer your questions:
1. I don't really like the idea of blowing up the initrd just in case. Ideally it would be rebuilt to include just the kernel modules needed for that hardware. MAAS's goal is to get as close as possible to CentOS's official installer which works this way. If that is our only option, I'd like to see if we can make the initrd in the Canonical provided image contain the drivers but not modify dracut.conf.
2. I think we just need to trigger the rebuild with `dracut --force /boot/initramfs-$KVER.img $KVER`

@sfeole - thanks for your info and letting us poke around at the hardware! Could you try adding

rebuild_initrd: ['curtin', 'in-target', '--', 'dracut', '--force', '/boot/initramfs-4.18.0-240.10.1.el8_3.x86_64.img', '4.18.0-240.10.1.el8_3.x86_64']

to curtin_userdata_centos on each controller and attempt to redeploy CentOS 8?
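The entry above hardcodes one kernel version, which goes stale when the image is rebuilt with a newer kernel. A hedged sketch of deriving the version from the target's /lib/modules instead; `rebuild_cmds` is a hypothetical helper, and wiring it into curtin (for example through `sh -c` inside `in-target`) is an assumption that has not been tested here.

```shell
#!/bin/sh
# rebuild_cmds ROOT
# For every kernel version directory under ROOT/lib/modules, print the
# dracut command that would rebuild its initramfs, using the
# /boot/initramfs-<version>.img naming shown above.
# Hypothetical helper, not part of curtin or MAAS.
rebuild_cmds() {
    for dir in "$1"/lib/modules/*/; do
        [ -d "$dir" ] || continue   # no kernels found: the glob stayed literal
        kver=$(basename "$dir")
        printf 'dracut --force /boot/initramfs-%s.img %s\n' "$kver" "$kver"
    done
}
```

Inside the target, `rebuild_cmds /` prints one line per installed kernel; the output can be reviewed first and then piped to `sh`.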

[1] https://github.com/dracutdevs/dracut/blob/master/modules.d/90kernel-modules/module-setup.sh#L35
[2] https://github.com/canonical/packer-maas/issues/22

Lee Trager (ltrager) on 2021-02-19

Changed in curtin:
status:	Incomplete → New
