GRUB UEFI watchdog timeout used during ISO installation insufficient for virtual media-based installation

Bug #2046182 reported by M. Vefa Bicakci
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: M. Vefa Bicakci

Bug Description

Brief Description
-----------------

The UEFI watchdog timeout (3 minutes) utilized by the installation ISO image's GRUB configuration file is insufficient for booting up from an ISO image placed on a web server and set up as a virtual medium via the BMC/iLO/iDRAC/platform firmware, even when the network throughput and latency are good.

Severity
--------
Minor (?): Prevents automated installation of StarlingX from virtual media set up via platform firmware.

Steps to Reproduce
------------------
* Upload a StarlingX installation ISO image onto a web server that is accessible by the iLO/platform firmware of an HPE DL360g10 server.
* Set up a virtual medium on the iLO web interface using an HTTP URL to point the iLO to the ISO image.
* Attempt to install StarlingX with this configuration.

Expected Behavior
-----------------
Installation starts as expected.

Actual Behavior
---------------
The UEFI watchdog times out during the loading of the kernel/initramfs images, resulting in an undesired server reboot, so the installation fails to start.

Reproducibility
---------------
Reproducible every time.

System Configuration
--------------------
Please see the issue description. The critical aspect is the use of a virtual medium for installation.

Branch/Pull Time/Commit
-----------------------
This appears to be a day-1 issue with the Debian-based StarlingX distribution.

Last Pass
---------
CentOS-based StarlingX did not have this issue.

Timestamp/Logs
--------------
(None available.)

Test Activity
-------------
Normal use.

Workaround
----------

A workaround is to manually remove the "efi-watchdog enable ..." lines from the GRUB configuration in the ISO image.
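
The manual edit can be sketched in Python as below. This is only an illustration under stated assumptions: the sample configuration text, the "PLACEHOLDER_ARGS" arguments, and the helper name strip_watchdog_lines are hypothetical, and the report does not say where in the real GRUB configuration the lines appear. Extracting the configuration from the ISO image and repacking it afterwards is not covered.

```python
import re

# Matches any line that begins with "efi-watchdog enable", whatever
# arguments follow (the report elides them as "...").
WATCHDOG_LINE = re.compile(r"^\s*efi-watchdog\s+enable\b")


def strip_watchdog_lines(grub_cfg_text: str) -> str:
    """Return the GRUB configuration text without "efi-watchdog enable" lines."""
    kept = [
        line
        for line in grub_cfg_text.splitlines(keepends=True)
        if not WATCHDOG_LINE.match(line)
    ]
    return "".join(kept)


# Hypothetical excerpt; the arguments are placeholders, not real values.
sample = (
    "set timeout=5\n"
    "efi-watchdog enable PLACEHOLDER_ARGS\n"
    "menuentry 'Install' {\n"
    "    efi-watchdog enable PLACEHOLDER_ARGS\n"
    "    linux /bzImage\n"
    "}\n"
)

cleaned = strip_watchdog_lines(sample)
print(cleaned)
```

All other lines, including indentation and line endings, are passed through unchanged, so the result differs from the input only by the removed watchdog lines.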

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tools (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/tools/+/903374

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tools (master)

Reviewed: https://review.opendev.org/c/starlingx/tools/+/903374
Committed: https://opendev.org/starlingx/tools/commit/00a5ccd35bdf40389878cb9b132bc4d2dc2683f1
Submitter: "Zuul (22348)"
Branch: master

commit 00a5ccd35bdf40389878cb9b132bc4d2dc2683f1
Author: M. Vefa Bicakci <email address hidden>
Date: Mon Dec 4 16:52:53 2023 +0000

    GRUB configuration: Increase UEFI watchdog timeout

    This commit increases the UEFI watchdog timeout utilized by GRUB in
    StarlingX from 3 minutes to 20 minutes to prevent undesirable and
    arguably premature UEFI watchdog timeout-triggered reboots during the
    installation of StarlingX ISO images via BMC/iLO/iDRAC/platform-provided
    virtual media redirection features in conjunction with ISO images hosted
    on web servers.

    In more detail, a user reported that a StarlingX-based distribution's
    ISO image would not successfully install with platform-provided ISO
    image redirection when the ISO image in question was hosted on a web
    server, despite the bandwidth and latency between the platform network
    interface and the web server being acceptable. The same user reported
    that removing the "efi-watchdog enable ..." line from the GRUB
    configuration resolved the issue.

    The same issue was later reproduced locally with an HPE DL360g10 server,
    where the OAM network interface was able to download an ISO image from a
    local server on a different subnet at a rate of about 76 MiB/s. (While
    the OAM and the iLO network interfaces are likely not the same, we do
    not envision the network conditions to be vastly different when the two
    network paths are compared.) In our reproduction of the issue, the
    downloading of the kernel and the initramfs images takes approximately
    nine minutes and ten seconds, after which the "Linux version" banner is
    printed out by the kernel on the serial console, regardless of whether
    the "Enhanced Download Performance" setting is enabled in the iLO
    settings or not.

    Based on these experimental results, this commit changes the UEFI
    watchdog timeout from 3 minutes to a duration that is approximately two
    times the initial kernel/initramfs load time of 9 minutes and 10 seconds
    encountered in our experiments: 20 minutes.

    Note that this commit does not affect the GRUB configuration files that
    are used after installation. The timeout remains 3 minutes in
    "/boot/efi/EFI/BOOT/grub.cfg" on installed systems after this commit,
    which is appropriate as the GRUB configuration file in question is
    utilized for booting up from local storage (i.e., SSD or HDD).

    Verification:

    * The reported issue was confirmed by placing a StarlingX-based
      distribution's nightly build ISO image on a web server, and the iLO
      (out-of-band platform management firmware) of the HPE DL360g10 server
      under test was configured to boot up from the ISO image on the web
      server via virtual media redirection using an HTTP URL. The 3 minute
      UEFI watchdog timeout set by GRUB was observed to be insufficient and
      the server was seen to autonomously reboot i...
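
The timeout arithmetic described in the commit message can be checked with a short Python sketch. The durations and the factor of two come from the commit message; rounding up to a 5-minute multiple is an assumption made here to arrive at the 20-minute figure, since the commit message only says "approximately two times". The helper name padded_timeout is hypothetical.

```python
import math

OLD_TIMEOUT_S = 3 * 60         # original watchdog timeout: 3 minutes
OBSERVED_LOAD_S = 9 * 60 + 10  # measured kernel/initramfs load time: 9 min 10 s
SAFETY_FACTOR = 2              # "approximately two times" per the commit message
GRANULARITY_S = 5 * 60         # assumption: round up to a 5-minute multiple


def padded_timeout(load_s: int, factor: int, granularity_s: int) -> int:
    """Scale the observed load time and round up to the given granularity."""
    return math.ceil(load_s * factor / granularity_s) * granularity_s


new_timeout_s = padded_timeout(OBSERVED_LOAD_S, SAFETY_FACTOR, GRANULARITY_S)

# The observed load time exceeds the old timeout, hence the reboot.
print(OBSERVED_LOAD_S > OLD_TIMEOUT_S)  # True
# Twice the load time is 1100 s, which rounds up to 1200 s.
print(new_timeout_s // 60)  # 20
```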


Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
assignee: nobody → M. Vefa Bicakci (vbicakci)
tags: added: stx.9.0 stx.distro.other stx.tools