LAT sometimes runs rollback image

Bug #1992994 reported by Eric MacDonald
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Eric MacDonald

Bug Description

Brief Description
-----------------
LAT implements an algorithm where it toggles between running the normal and rollback images every 4 reboots if the Debian watchdog service is disabled.

Severity
--------
Critical: Affects host recovery over multiple reboots

Steps to Reproduce
------------------
systemctl stop watchdog
systemctl disable watchdog
reboot the node 4 times

Expected Behavior
------------------
The normal image is always run automatically.

Actual Behavior
----------------
After 4 consecutive boots of the normal image, the rollback image is run for the next 4 boots in a row, and the pattern then repeats.
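
The toggle pattern reported here can be modelled with a few lines of shell. This is only a sketch of the observed behavior (4 boots of one image, then 4 of the other), not LAT's actual implementation; the function name image_for_boot and the 0-based boot index are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical model of the OBSERVED behavior, not LAT's real code.
# $1 is a 0-based count of reboots since the watchdog was disabled.
image_for_boot() {
    boot_index=$1
    # Group boots into blocks of 4. Even-numbered blocks run the
    # normal image, odd-numbered blocks run the rollback image.
    if [ $(( (boot_index / 4) % 2 )) -eq 0 ]; then
        echo normal
    else
        echo rollback
    fi
}

# Print the image chosen for the first 12 boots of the model.
i=0
while [ "$i" -lt 12 ]; do
    echo "boot $i: $(image_for_boot "$i")"
    i=$((i + 1))
done
```

In this model, boots 0 to 3 select the normal image, boots 4 to 7 select the rollback image, and boot 8 returns to the normal image, which matches the behavior described above.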

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
October, 2022

Last Pass
---------
Unknown, but suspect prior to the watchdog being disabled

Timestamp/Logs
--------------
none logged

Test Activity
-------------
Sanity

Workaround
----------
Running 'sudo /usr/sbin/clearbootflag.sh' resets boot_tried_count to 0 in /boot/efi/EFI/BOOT/boot.env. Run it before any reboot or lock/unlock to avoid the issue.
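
The workaround could be wrapped in a small pre-reboot helper along the following lines. This is a sketch under assumptions: it assumes boot.env is a GRUB environment block that grub-editenv can list as name=value pairs, which this report does not confirm. The function names are hypothetical; only the script path, the boot.env path and the boot_tried_count variable come from the report.

```shell
#!/bin/sh
# Paths taken from the workaround text in this report.
BOOT_ENV=/boot/efi/EFI/BOOT/boot.env
CLEAR_SCRIPT=/usr/sbin/clearbootflag.sh

# Hypothetical helper: read "name=value" lines on stdin and print
# the value of boot_tried_count, if present.
get_boot_tried_count() {
    sed -n 's/^boot_tried_count=//p'
}

# Hypothetical helper: reset the counter only when it is non-zero.
# ASSUMPTION: boot.env is readable with 'grub-editenv FILE list'.
clear_boot_flag_if_needed() {
    count=$(sudo grub-editenv "$BOOT_ENV" list | get_boot_tried_count)
    if [ "${count:-0}" -ne 0 ]; then
        sudo "$CLEAR_SCRIPT"
    fi
}
```

Calling clear_boot_flag_if_needed before each reboot or lock/unlock would apply the workaround only when the counter has been incremented. If the assumption about the file format does not hold, running the script unconditionally as described above remains the safe option.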

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/metal/+/861461

Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/861461
Committed: https://opendev.org/starlingx/metal/commit/385372fecd8c35ac2a44fbe98a903130fb53f783
Submitter: "Zuul (22348)"
Branch: master

commit 385372fecd8c35ac2a44fbe98a903130fb53f783
Author: Eric MacDonald <email address hidden>
Date: Fri Oct 14 18:10:36 2022 +0000

    Debian: Disable LAT's automatic rollback image selection

    LAT implements an algorithm where it toggles between running
    the 'normal' and 'rollback' images every 4 reboots if the Debian
    watchdog service is disabled or is not being serviced.

    If the watchdog service is disabled so should this automatic
    image selection service.

    Until then, this update adds code to the starlingX kickstarts
    to effectively disable LAT's automatic image toggle feature.

    Test Plan:
    PASS: Verify new kickstart log
    PASS: Verify grub file change on system
    PASS: Verify Debian Image build and Install
    PASS: Verify normal installed image is always selected
          over 10+ reboots

    Closes-Bug: 1992994
    Author: Robert Church <email address hidden>
    Signed-off-by: Eric MacDonald <email address hidden>
    Change-Id: I2abee365bf6ebce4aac781e1f563e21a62b2a49d

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Eric MacDonald (rocksolidmtce)
tags: added: stx.8.0 stx.debian stx.metal