Master node provisioning fails because nodes fail to boot and enter the dracut emergency shell

Bug #1973038 reported by Sandeep Yadav
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: none listed

Bug Description

Description:

Master node provisioning is failing because the nodes fail to boot and enter the dracut emergency shell.

https://logserver.rdoproject.org/37/28537/41/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master/21b13d8/logs/baremetal_41_49511_0-console.log

~~~
[ 195.940081] dracut-initqueue[465]: [ -e "/dev/disk/by-uuid/cedbc2ac-f3e9-4607-a15e-5e5f3b7aa817" ]
[ 195.941507] dracut-initqueue[465]: fi"
[ 195.942238] dracut-initqueue[465]: Warning: dracut-initqueue: starting timeout scripts
[ 196.476978] dracut-initqueue[465]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[ 196.479069] dracut-initqueue[465]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2fcedbc2ac-f3e9-4607-a15e-5e5f3b7aa817.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[ 196.481780] dracut-initqueue[465]: [ -e "/dev/disk/by-uuid/cedbc2ac-f3e9-4607-a15e-5e5f3b7aa817" ]
[ 196.482963] dracut-initqueue[465]: fi"
[ 196.483602] dracut-initqueue[465]: Warning: dracut-initqueue: starting timeout scripts
[ 196.484529] dracut-initqueue[465]: Warning: Could not boot.
         Starting Dracut Emergency Shell...
Warning: /dev/disk/by-uuid/cedbc2ac-f3e9-4607-a15e-5e5f3b7aa817 does not exist

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

Press Enter for maintenance
(or press Control-D to continue):
~~~

Another run example:

https://logserver.rdoproject.org/37/28537/40/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master/ff50ffc/logs/baremetal_40_28810_0-console.log

So far the issue has been seen only in the tripleo component, but other components may be affected as well; we are waiting for today's runs of the other components.
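When triaging several console logs like the two above, it can help to extract the failure signature automatically. The following is only a minimal sketch: it assumes the console log text contains the same dracut messages quoted in this report, and the function names are illustrative, not part of any CI tooling.

```python
import re

# Matches the final dracut warning quoted in the console log above, e.g.
# "Warning: /dev/disk/by-uuid/<uuid> does not exist"
MISSING_DEVICE = re.compile(
    r"Warning: /dev/disk/by-uuid/([0-9a-fA-F-]+) does not exist"
)


def find_missing_root_uuids(console_log: str) -> list[str]:
    """Return the UUIDs dracut reported as missing, in order, without repeats."""
    seen = []
    for match in MISSING_DEVICE.finditer(console_log):
        uuid = match.group(1)
        if uuid not in seen:
            seen.append(uuid)
    return seen


def entered_emergency_shell(console_log: str) -> bool:
    """True if the log shows dracut giving up and starting the emergency shell."""
    return (
        "Dracut Emergency Shell" in console_log
        or "Entering emergency mode" in console_log
    )


# Sample lines copied from the console log in this report.
sample = """\
[ 196.484529] dracut-initqueue[465]: Warning: Could not boot.
Warning: /dev/disk/by-uuid/cedbc2ac-f3e9-4607-a15e-5e5f3b7aa817 does not exist
Entering emergency mode. Exit the shell to continue.
"""

print(find_missing_root_uuids(sample))
print(entered_emergency_shell(sample))
```

The UUID returned here is the one to search for in the image build log.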

Sandeep Yadav (sandeepyadav93) wrote :

Pinged Harald about this. He suspects that https://review.opendev.org/c/openstack/diskimage-builder/+/839830 may solve the current issue.

~~~
<ysandeep|rover>30 hjensas, stevebaker[m] fyi.. saw this today , https://logserver.rdoproject.org/37/28537/40/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master/ff50ffc/logs/baremetal_40_28810_0-console.log kernel panic during boot - rerunning to confirm if its consistent.

<hjensas> ysandeep|rover: Looking at the dracut issue. I see the UUID it fails to find in the image build log. Wonder if https://review.opendev.org/c/openstack/diskimage-builder/+/839830 can be what is needed.

~~~

Running a testproject job to confirm that: https://review.rdoproject.org/r/c/testproject/+/28537

Harald Jensås (harald-jensas) wrote :

The UUID that it fails to find is the UUID of the partition on the system that was used to build the image. This can be seen in the DIB log[1].

When I reproduce the image build locally, I see a similar entry in the DIB logs, and I verified that the UUID matches the underlying system's UUID:

(undercloud) [cloud-user@undercloud images]$ ls -lha /dev/disk/by-uuid
total 0
drwxr-xr-x. 2 root root 60 May 11 10:51 .
drwxr-xr-x. 5 root root 100 May 11 10:51 ..
lrwxrwxrwx. 1 root root 10 May 10 18:04 10f3f76f-78f7-4536-9bf2-e29a6c349b14 -> ../../vda1
(undercloud) [cloud-user@undercloud images]$ grep 10f3f76f-78f7-4536-9bf2-e29a6c349b14 overcloud-hardened-uefi-full.log
2022-05-11 14:37:14.766 | root="UUID=10f3f76f-78f7-4536-9bf2-e29a6c349b14"

The version diskimage-builder-3.20.4-0.20220421140835.adc40db.el9.noarch used in the job does not include [2]; without that change, /etc/machine-id does not contain "uninitialized". Without it, does 'reset-bls-entries' not run?

[1] https://logserver.rdoproject.org/37/28537/40/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master/ff50ffc/logs/undercloud/home/zuul/overcloud-hardened-uefi-full.log.txt.gz
[2] https://opendev.org/openstack/diskimage-builder/commit/147641fc3e11602cf9eaf723a2b38f60f394ac0b
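The manual check above (listing /dev/disk/by-uuid on the build host and grepping the build log) could be scripted roughly as follows. This is a sketch under assumptions: it assumes the build log records the kernel argument as `root="UUID=..."`, as in the grep output above, and the function names are hypothetical. On a real host, the set of host UUIDs could be obtained with `os.listdir("/dev/disk/by-uuid")`; here it is passed in so the example runs anywhere.

```python
import re

# Matches the root= kernel argument as it appears in the grep output above:
#   root="UUID=10f3f76f-78f7-4536-9bf2-e29a6c349b14"
ROOT_UUID = re.compile(r'root="?UUID=([0-9a-fA-F-]+)"?')


def root_uuids_in_build_log(build_log: str) -> set[str]:
    """Collect every root filesystem UUID mentioned in the image build log."""
    return set(ROOT_UUID.findall(build_log))


def leaked_host_uuids(build_log: str, host_uuids: set[str]) -> set[str]:
    """Return UUIDs that appear as root= in the log AND exist on the build host.

    A non-empty result suggests the image's boot entries point at the build
    host's partition rather than at the image's own root filesystem.
    """
    return root_uuids_in_build_log(build_log) & host_uuids


# Values taken from the local reproduction shown above.
host_uuids = {"10f3f76f-78f7-4536-9bf2-e29a6c349b14"}
build_log = '2022-05-11 14:37:14.766 | root="UUID=10f3f76f-78f7-4536-9bf2-e29a6c349b14"\n'

print(leaked_host_uuids(build_log, host_uuids))
```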

Steve Baker (steve-stevebaker) wrote :

It looks like the image was built without this change: https://review.opendev.org/c/openstack/diskimage-builder/+/838792

This is confirmed in the log[1] from comment #2: 03-reset-bls-entries is running during pre-install instead of post-install.

Looking at the most recent periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master[2], 03-reset-bls-entries is now running in post-install, and boot on the nodes is completing[3], so I think this issue has been resolved.

[1] https://logserver.rdoproject.org/37/28537/40/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master/ff50ffc/logs/undercloud/home/zuul/overcloud-hardened-uefi-full.log.txt.gz
[2] https://logserver.rdoproject.org/37/28537/42/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master/a9751c2/logs/undercloud/home/zuul/overcloud-hardened-uefi-full.log.txt.gz
[3] https://logserver.rdoproject.org/37/28537/42/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-master/a9751c2/logs/baremetal_42_83182_0-console.log
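To tell quickly whether a given image build has the fix, one could scan the build log for the phase in which 03-reset-bls-entries runs. The sketch below assumes that the log mentions the hook by a path of the form `<phase>.d/03-reset-bls-entries`; the exact DIB log line format is not quoted in this report, so the sample lines and the function name are hypothetical.

```python
import re

# Assumption: the hook appears in the log with its phase directory,
# e.g. ".../pre-install.d/03-reset-bls-entries".
HOOK_PATH = re.compile(r"([A-Za-z0-9_-]+)\.d/03-reset-bls-entries")


def phases_running_hook(build_log: str) -> list[str]:
    """Return the phases (e.g. 'pre-install') in which the hook is mentioned."""
    phases = []
    for line in build_log.splitlines():
        match = HOOK_PATH.search(line)
        if match and match.group(1) not in phases:
            phases.append(match.group(1))
    return phases


# Hypothetical log lines, for illustration only.
old_build = "Running /tmp/in_target.d/pre-install.d/03-reset-bls-entries\n"
new_build = "Running /tmp/in_target.d/post-install.d/03-reset-bls-entries\n"

print(phases_running_hook(old_build))
print(phases_running_hook(new_build))
```

Per the comment above, a build where the hook runs in pre-install is affected, while one where it runs in post-install boots correctly.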

Ronelle Landy (rlandy) wrote :

From Steve:

https://bugs.launchpad.net/tripleo/+bug/1973038 could also be mitigated by using the newest centos9 base image, so the kernel package doesn't get updated during image build

Sandeep Yadav (sandeepyadav93) wrote :

We are waiting for the latest diskimage-builder to be promoted to current-tripleo, which should fix the component lines.

Sandeep Yadav (sandeepyadav93) wrote :

This was fixed after the component promotion, once we picked up the latest DIB.

Build history:

https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-baremetal-master

Changed in tripleo:
status: Triaged → Fix Released