cs9/master: unable to provision OC nodes

Bug #1957169 reported by Cédric Jeanneret
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Harald Jensås
Milestone: (none)

Bug Description

Hello there,

I'm apparently unable to provision my OC nodes.

I'm using tripleo-operator-ansible, and it generates this script:
#!/bin/bash
# This file is managed by ansible
set -xeo pipefail

export PROVISION_OUTPUT=overcloud-baremetal-deployed-0.yaml
export PROVISION_STACK=overcloud-0
export PROVISION_USER=None
export PROVISION_KEY=None
export PROVISION_CONCURRENCY=None
export PROVISION_TIMEOUT=None
source /home/stack/stackrc; openstack overcloud node provision -o $PROVISION_OUTPUT --stack $PROVISION_STACK --network-ports --network-config ~/metalsmith-0.yaml >/home/stack/overcloud_node_provision.log 2>&1

The failure is as follows:

2022-01-12 06:15:42.593036 | 24420158-79d3-d0a5-fe6a-000000000018 | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Created port oc0-controller-0-ctlplane (UUID 138d72fe-dcac-4974-8271-f40d9d352d33) for node oc0-controller-0 (UUID 59539c62-a3ed-4bdb-aca6-62bc3967b9bc) with {'network_id': '5a4473c3-9e53-4909-adc5-47b963707754', 'name': 'oc0-controller-0-ctlplane'}\nAttached port oc0-controller-0-ctlplane (UUID 138d72fe-dcac-4974-8271-f40d9d352d33) to node oc0-controller-0 (UUID 59539c62-a3ed-4bdb-aca6-62bc3967b9bc)\nProvisioning started on node oc0-controller-0 (UUID 59539c62-a3ed-4bdb-aca6-62bc3967b9bc)\n", "msg": "Node 59539c62-a3ed-4bdb-aca6-62bc3967b9bc reached failure state \"deploy failed\"; the last error is None"}

Env information:
- using tripleo-ci-testing package repository
- using tripleo-ci-testing overcloud-full.tar (md5sum: 89266c1b4b0fddf1df8c86c988dfcb43)
- using tripleo-ci-testing ironic-python-agent.tar (md5sum: 13abc673127487ae7ad54a938702fc54)

What I know:
- the PXE is kicking in as expected
- the env is 100% based on VM, using libvirt (tripleo-lab)
- the VMs are booting on UEFI

Capture from the VM console (virsh console <domain>)

iPXE 1.0.0+ (4bd064de) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP HTTPS iSCSI TFTP VLAN AoE EFI Menu

net0: 24:42:00:c7:80:fc using SNP on SNP-0000:01:00.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
Configuring (net0 24:42:00:c7:80:fc).................. ok
net0: 192.168.24.16/255.255.255.0 gw 192.168.24.1
net0: fe80::2642:ff:fec7:80fc/64
Next server: 192.168.24.1
Filename: http://192.168.24.1:8088/boot.ipxe
http://192.168.24.1:8088/boot.ipxe... ok
boot.ipxe : 758 bytes [script]
Attempting to boot from MAC 24-42-00-c7-80-fc
pxelinux.cfg/24-42-00-c7-80-fc... ok
http://192.168.24.1:8088/59539c62-a3ed-4bdb-aca6-62bc3967b9bc/deploy_kernel... ok
http://192.168.24.1:8088/59539c62-a3ed-4bdb-aca6-62bc3967b9bc/deploy_ramdisk... ok
EFI stub: Loaded initrd from command line option

Then the crash happens in ansible.

I have the env here, feel free to request access!

Cheers,

C.

tags: added: alert promotion-blocker
Harald Jensås (harald-jensas) wrote :

In ironic-conductor.log I can see this error:

2022-01-17 05:52:38.176 2 ERROR ironic.drivers.modules.agent_base [req-ffbdafa8-aa43-4472-af95-b9515773227b - - - - -] Deploy step deploy.prepare_instance_boot failed: Failed to install a bootloader when deploying node 46a21833-e127-4a1e-a40b-7b2a8b6be414. Error: Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.
Command: chroot /tmp/tmpm6zdlj3w /bin/sh -c "grub2-install /dev/sda"
Exit code: 1
Stdout: ''
Stderr: "grub2-install: error: /usr/lib/grub/x86_64-efi/modinfo.sh doesn't exist. Please specify --target or --directory.\n".
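
Since the Ansible failure above only reports "the last error is None", the conductor log is where the real cause shows up. A minimal sketch of how such entries could be pulled out of the log follows. The function name find_deploy_failures is hypothetical, and the log location (commonly /var/log/containers/ironic/ironic-conductor.log on a containerized undercloud) is an assumption, not something stated in this report.

```shell
#!/bin/bash
# Hypothetical helper: print "Deploy step ... failed" ERROR lines from an
# ironic-conductor log, optionally restricted to one node UUID.
# Usage: find_deploy_failures LOGFILE [NODE_UUID]
find_deploy_failures() {
    local logfile=$1
    local node=${2:-}
    # First grep keeps the ERROR lines for failed deploy steps; the second
    # keeps only lines mentioning the node (an empty pattern matches all).
    grep -E 'ERROR .*Deploy step [^ ]+ failed' "$logfile" | grep -F -- "$node"
}
```

For example, `find_deploy_failures /path/to/ironic-conductor.log 46a21833-e127-4a1e-a40b-7b2a8b6be414` would print only the failures for that node.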

Harald Jensås (harald-jensas) wrote :

last_error: 'Deploy step deploy.prepare_instance_boot failed: Failed to install a
  bootloader when deploying node 76016273-3d3f-4ec2-b13f-9c90feccbb90. Error: Installing
  GRUB2 boot loader to device /dev/sda failed with Unexpected error while running
  command.

  Command: chroot /tmp/tmp2jbb5r14 /bin/sh -c "grub2-install /dev/sda"

  Exit code: 1

  Stdout: ''''

  Stderr: "grub2-install: error: /usr/lib/grub/x86_64-efi/modinfo.sh doesn''t exist.
  Please specify --target or --directory.\n".'

Steve Baker (steve-stevebaker) wrote :

As an aside, it looks like this happened when deploying an overcloud-full image. While this needs to be fixed, this error would not have happened if overcloud-full-hardened-uefi.qcow2 had been deployed instead. That image will be the default going forward, so testing and development effort really needs to be focused there.

Steve Baker (steve-stevebaker) wrote :

(I deleted comment #13, it wasn't accurate)

It looks like grub2-efi-x64-modules needs to be installed on the overcloud-full image, so the expected file is there when IPA chroots into it and calls grub2-install.
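
As a rough way to check this on an image, the sketch below tests whether the file named in the grub2-install error exists under a given image root. The function name has_grub_efi_modules and the idea of pointing it at a mounted or extracted image root are assumptions for illustration. Only the path /usr/lib/grub/x86_64-efi/modinfo.sh comes from the error message.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if the GRUB2 x86_64-efi modules that
# grub2-install looks for are present under the given image root.
# Usage: has_grub_efi_modules IMAGE_ROOT
has_grub_efi_modules() {
    # Strip one trailing slash so "/" and "/mnt/image/" both work.
    local root=${1%/}
    # This is the file grub2-install reported as missing.
    [ -f "$root/usr/lib/grub/x86_64-efi/modinfo.sh" ]
}
```

With the image root available at some hypothetical mount point, `has_grub_efi_modules /mnt/image || echo "x86_64-efi modules missing"` would flag an image that is likely to hit this failure.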

*However*, I discovered on Friday that the overcloud-full images don't include the bootloader element at all, which was surprising. I need to do some investigation to see what is the appropriate way forward here.

Harald Jensås (harald-jensas) wrote :

Is this happening because of this?

https://opendev.org/openstack/diskimage-builder/commit/fd63fe6999c70b60c559ee54a8d8478c9a1179b4

The package seems to be there now?

[CentOS-9 - stack@undercloud ~]$ sudo dnf info grub2-efi-x64-modules
Last metadata expiration check: 1:04:36 ago on Mon 17 Jan 2022 04:00:42 PM EST.
Available Packages
Name : grub2-efi-x64-modules
Epoch : 1
Version : 2.06
Release : 13.el9
Architecture : noarch
Size : 1.1 M
Source : grub2-2.06-13.el9.src.rpm
Repository : baseos
Summary : Modules used to build custom grub.efi images
URL : http://www.gnu.org/software/grub/
License : GPLv3+
Description :
             : The GRand Unified Bootloader (GRUB) is a highly configurable and
             : customizable bootloader with modular architecture. It supports a rich
             : variety of kernel formats, file systems, computer architectures and
             : hardware devices.
             :
             : This subpackage provides support for rebuilding your own grub.efi.

Harald Jensås (harald-jensas) wrote :

Actually, inspecting a CentOS 8 Stream image build shows that grub2-efi-x64-modules.noarch is installed by the grub2 element.

I think we need to add grub2-efi-x64-modules here:
https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/grub2/pkg-map#L12

Harald Jensås (harald-jensas) wrote :
Changed in tripleo:
assignee: nobody → Harald Jensås (harald-jensas)
status: Triaged → In Progress
Sandeep Yadav (sandeepyadav93) wrote :
Changed in tripleo:
status: In Progress → Fix Released