'secondary' and 'rehome-pending' subclouds stuck at 'online'

Bug #2047439 reported by Gustavo Herzmann
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Gustavo Herzmann

Bug Description

Brief Description
-----------------
In the original design, 'secondary' and 'rehome-pending' subclouds are not supposed to be audited. As a result, these subclouds get stuck with the 'online' availability status, which prevents the user from deleting them.

Severity
--------
<Major: System/Feature is usable but degraded>

Steps to Reproduce
------------------
With an online subcloud, run 'dcmanager subcloud unmanage --migrate <subcloud_ref>', then power off the subcloud
and verify that it is still reported as 'online'. Try to delete it and verify that the deletion is rejected because it is still 'online'.

Expected Behavior
------------------
The subcloud should eventually become 'offline'. The user should then be able to delete it.

Actual Behavior
----------------
The subcloud gets stuck with the 'online' availability status, and the user is unable to delete it.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Distributed Cloud system

Branch/Pull Time/Commit
-----------------------
Master (2023-12-26)

Last Pass
---------
Issue introduced during development of: https://storyboard.openstack.org/#!/story/2010852

Timestamp/Logs
--------------
2023-12-13 18:03:08.313 845417 DEBUG dcmanager.audit.subcloud_audit_worker_manager [req-08f49e3d-cb03-4161-9688-5c76ec778374 - - - - -] PID: 845417, starting audit of subcloud: subcloud4. audit_subclouds /usr/lib/python3/dist-packages/dcmanager/audit/subcloud_audit_worker_manager.py:109
2023-12-13 18:03:08.313 845417 DEBUG dcmanager.audit.subcloud_audit_worker_manager [req-08f49e3d-cb03-4161-9688-5c76ec778374 - - - - -] Skip subcloud subcloud4 audit, deploy_status: rehome-pending audit_subclouds /usr/lib/python3/dist-packages/dcmanager/audit/subcloud_audit_worker_manager.py:133

Test Activity
-------------
Feature Testing

Workaround
----------
NA

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/904354

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/904354
Committed: https://opendev.org/starlingx/distcloud/commit/244df2ed78ee4f0b3cd5e21a1f04d623ba38dff1
Submitter: "Zuul (22348)"
Branch: master

commit 244df2ed78ee4f0b3cd5e21a1f04d623ba38dff1
Author: Gustavo Herzmann <email address hidden>
Date: Tue Dec 26 09:49:07 2023 -0300

    Fix 'secondary' and 'rehome-pending' subclouds stuck at 'online'

    This commit makes 'rehome-pending' subclouds auditable while they are
    still online (endpoint audits are still skipped, as the subcloud is
    unmanaged). Secondary subclouds are still not audited but their
    availability status will be automatically set to 'offline' when its
    deploy status is set to 'secondary'.

    In the original design, 'secondary' and 'rehome-pending' subclouds are
    not supposed to be audited, this creates the issue where the subclouds
    get stuck with the 'online' availability status, preventing the user
    from being able to delete it.

    This commit also fixes an issue where it was not possible to set the
    endpoint status to 'unknown' for 'secondary' subclouds.

    Test Plan:
    1. PASS - Run 'dcmanager subcloud unmanage --migrate' for an online
              subcloud and verify that:
              - All endpoint statuses were set to 'unknown';
              - Subcloud was still audited, but each endpoint audit was
                skipped;
              - After turning off the subcloud, its availability-status
                changed to 'offline' and the audits started being skipped.
    2. PASS - Manage back the rehome-pending subcloud, verifying that:
              - It initially becomes 'managed' while still 'offline';
              - Audit starts running again, eventually setting the
                availability-status back to 'online';
              - Endpoint statuses started becoming 'in-sync' again.
    3. PASS - Set the subcloud to 'rehome-pending', and then set it to
              'secondary', verify that the subcloud becomes 'offline' and
              that audits are skipped (all endpoint status should be set to
              'unknown').

    Closes-Bug: 2047439

    Change-Id: Ia21faf469aacee6f70e5b4fe6471b019ae057e13
    Signed-off-by: Gustavo Herzmann <email address hidden>
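
The behaviour described in the commit message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual dcmanager code: the Subcloud class and the functions should_audit_availability, should_audit_endpoints, set_deploy_status and audit are hypothetical names invented for this illustration, and the handling of every other deploy status is deliberately simplified.

```python
from dataclasses import dataclass

# Status strings taken from the bug report; the constant names are
# hypothetical and do not mirror dcmanager's own constants module.
ONLINE = "online"
OFFLINE = "offline"
SECONDARY = "secondary"
REHOME_PENDING = "rehome-pending"


@dataclass
class Subcloud:
    """Hypothetical, minimal stand-in for a subcloud record."""
    name: str
    deploy_status: str
    availability_status: str
    managed: bool = True


def should_audit_availability(sc: Subcloud) -> bool:
    """Decide whether the availability audit runs (simplified assumption)."""
    if sc.deploy_status == SECONDARY:
        # Secondary subclouds are still never audited.
        return False
    if sc.deploy_status == REHOME_PENDING:
        # After the fix: audited only while still online, so that the
        # audit can eventually observe the subcloud going offline.
        return sc.availability_status == ONLINE
    # Simplification: all other deploy statuses are treated as auditable.
    return True


def should_audit_endpoints(sc: Subcloud) -> bool:
    """Endpoint audits are skipped for unmanaged subclouds."""
    return sc.managed and should_audit_availability(sc)


def set_deploy_status(sc: Subcloud, new_status: str) -> None:
    """Setting 'secondary' also forces the availability to 'offline'."""
    sc.deploy_status = new_status
    if new_status == SECONDARY:
        sc.availability_status = OFFLINE


def audit(sc: Subcloud, reachable: bool) -> str:
    """Run one audit cycle; return 'audited' or 'skipped'."""
    if not should_audit_availability(sc):
        return "skipped"
    sc.availability_status = ONLINE if reachable else OFFLINE
    return "audited"


if __name__ == "__main__":
    sc = Subcloud("subcloud4", REHOME_PENDING, ONLINE, managed=False)
    print(audit(sc, reachable=False), sc.availability_status)
    print(audit(sc, reachable=True), sc.availability_status)
```

In this model, a powered-off 'rehome-pending' subcloud is audited once more, moves to 'offline', and is skipped from then on, which is the state in which deletion becomes possible. Under the original design described in the report, the first audit would have been skipped as well, leaving the status at 'online'.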

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.distcloud
Changed in starlingx:
assignee: nobody → Gustavo Herzmann (gherzman)