UEFI Debian RT worker nodes fail to boot post-install

Bug #1990895 reported by Bob Church
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Bob Church
Milestone: (none)

Bug Description

Brief Description
-----------------
Compute (worker) nodes are stuck in a PXE reboot loop because they cannot boot from disk after installation.

Severity
--------
Critical

Steps to Reproduce
------------------
Reinstall a UEFI hardware lab with a worker configured as low-latency.

Expected Behavior
-----------------
Compute nodes install successfully and boot from disk.

Actual Behavior
---------------
Workers are stuck in a PXE reboot loop.

Reproducibility
---------------
100%

System Configuration
--------------------
Standard IPv6 lowlatency.

Load info
---------
Any recent master branch build

Last Pass
---------
Unknown

Timestamp/Logs
--------------
compute-0 reboots after the kickstarts finish but fails to boot from disk, so it falls back to PXE boot.

Booting from starlingx
Boot Failed: starlingx
Booting from CentOS
Boot Failed: CentOS
Booting from PXE Device 1: Integrated NIC 1 Port 1 Partition 1
>>Start PXE over IPv4.
  Station IP address is 192.168.202.242
  Server IP address is 192.168.202.1
  NBP filename is EFI/BOOT/bootx64-nosig.efi
  NBP filesize is 765952 Bytes
 Downloading NBP file...
  Succeed to download NBP file.
Welcome to GRUB!
error: no such device: ((tftp,192.168.202.1)EFI/BOOT)/EFI/BOOT/grub.cfg.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/metal/+/859347

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/859347
Committed: https://opendev.org/starlingx/metal/commit/0b614831c84768013645d07151619a94487f343c
Submitter: "Zuul (22348)"
Branch: master

commit 0b614831c84768013645d07151619a94487f343c
Author: Robert Church <email address hidden>
Date: Mon Sep 26 01:52:13 2022 -0500

    Debian: UEFI RT pxeboot installs need efi=runtime option

    Add the efi=runtime kernel command line option for all lowlatency UEFI
    installs so that the efibootmgr has access to EFI non-volatile
    variables. LAT will run the efibootmgr as one of the last steps of the
    ISO/pxeboot install to change the device boot order of the host.

    When the PREEMPT_RT kernel is enabled this is required; otherwise we will
    rely on the existing boot order of the system, often resulting in a
    pxeboot loop.

    For reference, here is the difference between the std and rt kernels
    that requires this change:

        $ diff linux-yocto-{std,rt}/drivers/firmware/efi/efi.c
        69c69
        < static bool disable_runtime;
        ---
        > static bool disable_runtime = IS_ENABLED(CONFIG_PREEMPT_RT);
        98a99,101
        >
        > if (parse_option_str(str, "runtime"))
        > disable_runtime = false;

    Test Plan:
    PASS - Added the parameter on a UEFI RT worker install in a H/W lab and
           observed that the previously failing post-install reboot now
           boots successfully.

    Signed-off-by: Robert Church <email address hidden>
    Closes-Bug: #1990895
    Change-Id: Iff41e1125bb5a6e27f7de92862da1ef4899de794
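
The behavior shown in the efi.c diff above can be modeled with a short Python sketch. This is an illustration only, not code from the metal repository or the kernel: the function names and the sample command lines are hypothetical, and the model covers only the two things visible in the diff (the PREEMPT_RT default and the "runtime" token of the efi= option).

```python
def parse_option_str(value: str, option: str) -> bool:
    """Return True if `option` is one of the comma-separated tokens in `value`.

    Approximates the kernel helper of the same name (assumption: exact
    token match is an adequate model for this illustration).
    """
    return option in value.split(",")


def efi_runtime_disabled(cmdline: str, preempt_rt: bool) -> bool:
    """Model whether EFI runtime services end up disabled.

    cmdline    -- kernel command line, e.g. the contents of /proc/cmdline
    preempt_rt -- True for the rt kernel, False for the std kernel
    """
    # Per the diff: the rt kernel starts with runtime services disabled,
    # the std kernel starts with them enabled.
    disable_runtime = preempt_rt

    for arg in cmdline.split():
        if arg.startswith("efi="):
            value = arg[len("efi="):]
            # Per the diff: "runtime" re-enables runtime services.
            if parse_option_str(value, "runtime"):
                disable_runtime = False

    return disable_runtime


# Hypothetical command lines, for illustration only.
without_fix = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 console=ttyS0"
with_fix = without_fix + " efi=runtime"

print(efi_runtime_disabled(without_fix, preempt_rt=True))   # rt kernel, no option
print(efi_runtime_disabled(with_fix, preempt_rt=True))      # rt kernel, efi=runtime
print(efi_runtime_disabled(without_fix, preempt_rt=False))  # std kernel, no option
```

In this model, only the rt kernel booted without efi=runtime ends up with runtime services disabled, which matches the case where efibootmgr cannot update the boot order. To check a live node, the contents of /proc/cmdline could be passed as `cmdline`.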

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.debian