AIO-DX subcloud BnR: Restore playbook failed to connect to host via ssh due to double boot post host-unlock

Bug #2064297 reported by Joshua Kraitberg
Affects    Status  Importance  Assigned to  Milestone
StarlingX  New     Undecided   Unassigned

Bug Description

Brief Description
-----------------
On some systems, a second reboot occurs after the host is unlocked during a restore. This is not a crash: puppet intentionally triggers the reboot because the kernel boot arguments do not match what the system expects.

Severity
--------
Major

Steps to Reproduce
------------------
Perform a backup and restore on a system with non-default kernel args

Expected Behavior
------------------
One reboot after unlock

Actual Behavior
----------------
Two reboots after unlock

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
June 13, 2023 master

Last Pass
---------
Never

Timestamp/Logs
--------------
2023-06-13T01:08:38.741 Notice: 2023-06-13 01:08:36 +0000 Scope(Class[Platform::Compute::Grub::Audit]): Kernel Boot Argument Mismatch
2023-06-13T01:08:38.744 Notice: 2023-06-13 01:08:36 +0000 Scope(Class[Platform::Compute::Grub::Recovery]): Update Grub and Reboot
2023-06-13T01:08:38.746 Notice: 2023-06-13 01:08:36 +0000 Scope(Class[Platform::Compute::Grub::Update]): Updating grub configuration
...
2023-06-13T01:10:53.972 Debug: 2023-06-13 01:10:53 +0000 /Stage[post]/Platform::Compute::Grub::Update/Exec[Add the cpu arguments to /boot/efi/EFI/BOOT/boot.env]: The container Class[Platform::Compute::Grub::Update] will propagate my refresh event
2023-06-13T01:10:53.974 Debug: 2023-06-13 01:10:53 +0000 Class[Platform::Compute::Grub::Update]: The container Stage[post] will propagate my refresh event
2023-06-13T01:10:53.976 Debug: 2023-06-13 01:10:53 +0000 Exec[reboot-recovery](provider=posix): Executing 'reboot'
2023-06-13T01:10:53.979 Debug: 2023-06-13 01:10:53 +0000 Executing: 'reboot'
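
The sequence in the log above (an audit mismatch followed later by Exec[reboot-recovery]) can be detected mechanically. The following is a minimal sketch, not part of the fix, that scans puppet log lines for that pattern. The function name and the assumption that each line starts with a whitespace-delimited timestamp are illustrative and taken only from the excerpt above.

```python
# Marker strings copied from the log excerpt in this report.
MISMATCH_MARKER = "Kernel Boot Argument Mismatch"
REBOOT_MARKER = "Exec[reboot-recovery]"


def find_grub_recovery_reboot(lines):
    """Return (mismatch_timestamp, reboot_timestamp) if a kernel-arg
    mismatch is followed by the recovery reboot, otherwise None.

    Hypothetical helper: assumes the first token of each line is the
    timestamp, as in the excerpt above.
    """
    mismatch_ts = None
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        timestamp = parts[0]
        if mismatch_ts is None:
            if MISMATCH_MARKER in line:
                mismatch_ts = timestamp
        elif REBOOT_MARKER in line and "Executing 'reboot'" in line:
            return mismatch_ts, timestamp
    return None
```

A non-None result indicates that the second reboot was the puppet-triggered one described in this report rather than a crash.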

Test Activity
-------------
Developer Testing

Workaround
----------
Wait for the second reboot to complete

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/917718
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/e31a01f201451adcf85b503fdfc86238102edfe3
Submitter: "Zuul (22348)"
Branch: master

commit e31a01f201451adcf85b503fdfc86238102edfe3
Author: Joshua Kraitberg <email address hidden>
Date: Mon Apr 29 15:38:18 2024 -0400

    fix: Restore boot config during legacy restore

    Boot configs were not being restored. Causing a double reboot after
    unlocking when puppet updated boot config.

    This is identical in purpose to https://review.opendev.org/c/starlingx/ansible-playbooks/+/886007.

    This issue was not noticed during legacy restore because only subclouds
    have boot options that can trigger a situation like this.

    Normal stand-alone systems and system controllers do not appear to
    suffer any issue.

    TEST PLAN
    PASS: AIO-DX subcloud backup and restore
      * New backup will include /boot files
      * No double reboot
      * Non-default kernel boot args will be kept
      * /proc/cmdline can be used to verify kernel boot args
    PASS: AIO-DX subcloud backup and restore
      * Remove new /boot files from backup
      * Restore with modified backup
      * Non-default kernel boot args will be lost
      * No double reboot
      * /proc/cmdline can be used to verify kernel boot args
    PASS: No regression on unaffected AIO-DX system controller

    Close-Bug: 2064297
    Change-Id: Ibf7ac5b4c0998b15d788488798223ee6f2966e95
    Signed-off-by: Joshua Kraitberg <email address hidden>
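
The test plan above uses /proc/cmdline to verify kernel boot args after a restore. A minimal sketch of that check follows, assuming the expected arguments are available as a plain string. The function names are hypothetical and the simple whitespace split does not handle quoted values that contain spaces.

```python
def parse_cmdline(text):
    """Parse a kernel command line into a dict.

    Flags without '=' map to None. If a key repeats, the last value wins
    (a simplification made for this sketch).
    """
    args = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


def diff_kernel_args(expected, actual):
    """Compare expected args against the running command line.

    Returns (missing, changed): args absent from `actual`, and args
    present in both with different values as (expected, actual) pairs.
    Extra args in `actual` are ignored.
    """
    exp = parse_cmdline(expected)
    act = parse_cmdline(actual)
    missing = {k: v for k, v in exp.items() if k not in act}
    changed = {k: (exp[k], act[k]) for k in exp if k in act and exp[k] != act[k]}
    return missing, changed


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently booted kernel."""
    with open(path) as f:
        return f.read().strip()
```

For example, diff_kernel_args(expected, read_running_cmdline()) returning two empty dicts would indicate that the non-default args were kept, where expected is whatever the system was configured with before the backup.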
