When the secondary site goes down, the PGA status in the primary site should be in unknown status

Bug #2055030 reported by Jon Zhang
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jon Zhang

Bug Description

Brief Description

When the primary site goes down, the PGA status in the secondary site should be out-of-sync.

Severity

Major

Steps to Reproduce

1) Create the system peer from the primary site

2) Create the system peer from the secondary site.

3) Create the subcloud peer group (SPG) in the primary site

4) Add the subclouds to the SPG

5) Create the peer group association (PGA) between the system peer and the subcloud peer group; its status should be in-sync

6) The SPG and PGA are then created automatically in the secondary site with in-sync status

7) Power-off the secondary site.

8) Check the PGA status in the Primary site
[sysadmin@controller-0 dcmanager(keystone_admin)]$ dcmanager peer-group-association list
+----+---------------+----------------+-------------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type        | sync_status | peer_group_priority |
+----+---------------+----------------+-------------+-------------+---------------------+
|  5 |             5 |              1 | non-primary | in-sync     | None                |
+----+---------------+----------------+-------------+-------------+---------------------+
[sysadmin@controller-0 dcmanager(keystone_admin)]$
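
The check in step 8 can be scripted. The following is a minimal sketch, assuming the command prints the usual ASCII-bordered table; `parse_cli_table` and `stale_in_sync` are hypothetical helper names, not part of dcmanager, and the sample text simply mirrors the output reported above.

```python
SAMPLE = """\
+----+---------------+----------------+-------------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type        | sync_status | peer_group_priority |
+----+---------------+----------------+-------------+-------------+---------------------+
|  5 |             5 |              1 | non-primary | in-sync     | None                |
+----+---------------+----------------+-------------+-------------+---------------------+
"""


def parse_cli_table(text):
    """Parse an ASCII-bordered CLI table into a list of row dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip borders ("+---+"), shell prompts, and blank lines.
        if not line.startswith("|"):
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def stale_in_sync(rows):
    """Return the ids of associations still reported as in-sync."""
    return [row["id"] for row in rows if row["sync_status"] == "in-sync"]


rows = parse_cli_table(SAMPLE)
print(stale_in_sync(rows))
```

In a real test run, the text would come from capturing the output of `dcmanager peer-group-association list` after the peer site is powered off; any id returned by `stale_in_sync` at that point reproduces the bug.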

Expected Behavior

PGA status should be unknown after the primary site becomes unreachable.

Actual Behavior

PGA status remains in-sync after the primary site becomes unreachable.

Reproducibility

Yes

System Configuration

Primary site - 2620:10a:a001:d41::1180

Secondary site - 2620:10a:a001:d41::1080

Load info (eg: 2022-03-10_20-00-07)

Last Pass

Timestamp/Logs

Alarms

Test Activity

Feature Testing

Workaround

NA

OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/910150

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/910150
Committed: https://opendev.org/starlingx/distcloud/commit/26bb7011e4e51f5b5ba728d08b214a197a50d583
Submitter: "Zuul (22348)"
Branch: master

commit 26bb7011e4e51f5b5ba728d08b214a197a50d583
Author: Zhang Rong(Jon) <email address hidden>
Date: Mon Feb 26 16:57:27 2024 +0800

    Fix issues with PGA sync_status

    This commit addresses the issue where the primary site's PGA status
    remains 'in-sync' even after the secondary site becomes unreachable.
    With this fix, the PGA status will be updated to 'unknown' upon the
    secondary site's failure. Additionally, the status will transition to
    'in-sync' once the secondary site is operational again.
    If there are any changes in the association while the secondary site is
    down, the PGA status will be set to failed. The sync status will
    transition to "out-of-sync" upon secondary site recovery.

    In this commit, the audit thread in the primary site will also update
    PGA sync_status. If the primary site is down and the SPG is migrated to
    secondary site, upon primary site recovery, its audit thread will update
    the PGA sync_status on both sites accordingly.

    Finally, the commit prevents the peer group from being updated in the
    secondary site.

    Test Case:
    1. PASS - Shutdown of site2 (secondary site) results in the
            synchronization status of the peer group association
            transitioning from 'in-sync' to 'unknown'.
    2. PASS - Restoration of site2 (secondary site) leads to the
            synchronization status of the peer group association on
            the primary site changing to 'in-sync', and the peer
            group association status on site2 also reflects 'in-sync'.
    3. PASS - While the secondary site is offline, execute some operations which
            result in PGA sync_status being set to "failed". Recover
            secondary site and verify that the PGA sync_status is set to
            out-of-sync on both sites.
    4. PASS - Verify that updating peer group on secondary site is
            disallowed.
    5. PASS - Shut down the primary site, migrate the SPG to secondary site.
            Restore the primary site while migration is in progress. Verify
            that the PGA sync_status is out-of-sync. Verify that PGA
            sync_status is set to in-sync shortly after the migration is
            complete.

    Closes-Bug: 2055030

    Change-Id: I67f4200118621205c539b24eb764e3cc5acf12c0
    Signed-off-by: Zhang Rong(Jon) <email address hidden>
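
The sync_status transitions described in the commit message above can be summarized as a small state function. This is only an illustrative sketch of the described behavior, not the distcloud implementation: `SyncStatus` and `next_sync_status` are hypothetical names, and cases the commit message does not cover are assumed to leave the status unchanged.

```python
from enum import Enum


class SyncStatus(str, Enum):
    IN_SYNC = "in-sync"
    OUT_OF_SYNC = "out-of-sync"
    UNKNOWN = "unknown"
    FAILED = "failed"


def next_sync_status(current, peer_reachable, association_changed=False):
    """Return the PGA sync_status the primary site would record (hypothetical model)."""
    if not peer_reachable:
        # A change made while the peer is down cannot be pushed, so it is
        # recorded as failed; an earlier failure is kept until recovery.
        if association_changed or current is SyncStatus.FAILED:
            return SyncStatus.FAILED
        # Otherwise the peer's state simply cannot be verified.
        return SyncStatus.UNKNOWN
    # Peer is reachable again.
    if current is SyncStatus.FAILED:
        # Changes were missed while the peer was down.
        return SyncStatus.OUT_OF_SYNC
    if current is SyncStatus.UNKNOWN:
        # Nothing changed during the outage.
        return SyncStatus.IN_SYNC
    # Assumption: cases not described in the commit message stay as they are.
    return current


# Walk through test cases 1-3: outage, change during outage, recovery.
status = SyncStatus.IN_SYNC
status = next_sync_status(status, peer_reachable=False)
print(status.value)
status = next_sync_status(status, peer_reachable=False, association_changed=True)
print(status.value)
status = next_sync_status(status, peer_reachable=True)
print(status.value)
```

The walk-through at the end follows the order of test cases 1 and 3: the status moves from in-sync to unknown when the peer goes down, to failed when the association changes during the outage, and to out-of-sync once the peer recovers.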

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
summary: - When the secondary site goes down then PGA status in the primary site
- should be in unknow status
+ When the secondary site goes down, the PGA status in the primary site
+ should be in unknown status
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.distcloud
tags: added: stx.10.0
removed: stx.9.0
Changed in starlingx:
assignee: nobody → Jon Zhang (jonzhang)