RVMC timeout for power on/off step is too short

Bug #2038484 reported by Tee Ngo
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Li Zhu

Bug Description

Brief Description
-----------------
Some ZT servers can take minutes to complete the immediate shutdown and/or power on steps. The current RVMC timeout for these steps is too short: 60 polls with a 1-second sleep in between, roughly 60 seconds in total. The error log is also misleading: it reports the current power state when it should report the requested state.
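
To illustrate the behavior described above, here is a minimal Python sketch of a power state polling loop with a longer, configurable timeout that logs the requested state on failure. It is not the actual RVMC code. The function name `wait_for_power_state`, its parameters, and the log wording are all hypothetical, and the 840-second default simply mirrors the 14-minute timeout mentioned in the fix for this bug.

```python
import time
from typing import Callable


def wait_for_power_state(
    get_power_state: Callable[[], str],   # hypothetical callback that queries the BMC
    requested_state: str,                 # e.g. "On" or "Off"
    timeout_secs: float = 840.0,          # assumption: 14 minutes, per the fix description
    poll_interval_secs: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll until the server reports requested_state or the timeout expires."""
    deadline = clock() + timeout_secs
    current = get_power_state()
    while current != requested_state:
        if clock() >= deadline:
            # Report the state that was requested, and keep the last
            # observed state as secondary detail.
            log(
                f"Failed to reach power state {requested_state!r} "
                f"within {timeout_secs:.0f} secs (last observed: {current!r})"
            )
            return False
        sleep(poll_interval_secs)
        current = get_power_state()
    return True
```

Using a deadline rather than a fixed poll count keeps the total wait independent of how long each query takes, and injecting `sleep` and `clock` lets the timeout path be exercised in a test without waiting 14 minutes.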

Severity
--------
Major

Steps to Reproduce
------------------
Deploy a new Redfish capable subcloud or perform an orchestrated upgrade of a subcloud which involves remote install.

Expected Behavior
------------------
Deployment/upgrade completes successfully

Actual Behavior
----------------
Subcloud remote install fails intermittently on certain ZT servers due to timeout on power on/off.

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Sept 30, 2023 master build

Last Pass
---------
Never did. The remote install test would have failed before if the ZT server was in a state that exhibits a delayed shut down or power on.

Timestamp/Logs
--------------
To be provided

Test Activity
-------------
Other - normal usage

Workaround
----------
None

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/896657
Committed: https://opendev.org/starlingx/metal/commit/bfbaba57310a03e60f30d44b6dc127de5c77f960
Submitter: "Zuul (22348)"
Branch: master

commit bfbaba57310a03e60f30d44b6dc127de5c77f960
Author: Li Zhu <email address hidden>
Date: Mon Sep 25 14:04:21 2023 -0400

    Set longer shutdown time and fix power state error log

    1.Extended the timeout to 14mins to accommodate the longer shutdown time.
    2.Fixed the power state error log so that it logs the requested state
    instead of the current power_state.

    Test Plan:

    PASS: Verify logged version is 2.2
    PASS: Verify success path with no FIT delay ; HP and ZT servers
    PASS: Verify timing of the loop with timeout of 14 minutes
    PASS: Verify shutdown timeout handling when shutdown exceeds 14
          minutes.
    PASS: Verify install completes successfully when Power Off takes
          close to but less than 14 minutes
    PASS: Verify power state failure log reports proper state

    Closes-Bug: 2038484

    Signed-off-by: Li Zhu <email address hidden>
    Change-Id: Ic99a06dca9962fcae43b20e00d8ebcb127a80560

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.9.0 stx.metal
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Li Zhu (lzhu1)
Li Zhu (lzhu1) wrote :

New tag: rvmc:stx.8.0-v1.0.2

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/898042

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/898042
Committed: https://opendev.org/starlingx/distcloud/commit/593b1373dd032515fe46f6cf4fdf22faf5ff4f8f
Submitter: "Zuul (22348)"
Branch: master

commit 593b1373dd032515fe46f6cf4fdf22faf5ff4f8f
Author: Li Zhu <email address hidden>
Date: Wed Oct 11 20:29:42 2023 -0400

    Update rvmc image tag to stx.8.0-v1.0.2

    Test Plan:
    PASS: Pull new image from docker registry.
    PASS: Install subcloud with new rvmc image.

    Closes-bug: 2038484

    Depends-On:
    https://review.opendev.org/c/starlingx/ansible-playbooks/+/898041

    Change-Id: Ief00673d8f2a2155648680c2e9992ab5f521200a
    Signed-off-by: Li Zhu <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/898041
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/7af8f6581f763042fc93fbd3a9070ebec28da975
Submitter: "Zuul (22348)"
Branch: master

commit 7af8f6581f763042fc93fbd3a9070ebec28da975
Author: Li Zhu <email address hidden>
Date: Wed Oct 11 20:24:29 2023 -0400

    Update rvmc image tag to stx.8.0-v1.0.2

    Test Plan:
    PASS: Pull new image from docker registry.
    PASS: Install subcloud with new rvmc image.

    Closes-bug: 2038484

    Change-Id: I0b70656903236d75f32e891d24f5ef0a34d24e26
    Signed-off-by: Li Zhu <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/898678
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/0516ea6c3efee36b69eb24347ba8047fac3f4735
Submitter: "Zuul (22348)"
Branch: master

commit 0516ea6c3efee36b69eb24347ba8047fac3f4735
Author: Li Zhu <email address hidden>
Date: Tue Oct 17 14:10:04 2023 -0400

    Fix RVMC timeout logging issue

    No Ansible playbook logs are generated when RVMC execution times out
    due to the missing timeout condition in the 'apply-rvmc-job' role.

    Test Plan:
    PASS: Verify the RVMC logs are generated in the ansible log file when
          RVMC execution times out.

    Closes-bug: 2038484

    Change-Id: I2b9398c116729e5cddc55d9222e78e8d61efc0b3
    Signed-off-by: lzhu1 <email address hidden>
