Unable to determine the peer group sync state if one site is down in the middle of the sync

Bug #2046809 reported by Jon Zhang
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jon Zhang
Milestone: (none listed)

Bug Description

Brief Description

The following design flaw was uncovered during one of the error handling review sessions. See the attached email for the solution.

Severity

Major

Steps to Reproduce

During initial (or subsequent) sync, take down the peer site

Expected Behavior

User is able to determine the status of the sync: syncing, failed, in-sync, out-of-sync

Actual Behavior

The sync_status field is used to determine whether or not the site triggers sync requests.

Reproducibility

100%

System Configuration

Load info (eg: 2022-03-10_20-00-07)

Last Pass

N/A

Timestamp/Logs

N/A

Alarms

N/A

Test Activity

Error Handling review

Workaround

None

Jon Zhang (jonzhang)
Changed in starlingx:
assignee: nobody → Jon Zhang (jonzhang)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/903928

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/903928
Committed: https://opendev.org/starlingx/distcloud/commit/e4227317606d476b1a005e8253f25a907e8486bf
Submitter: "Zuul (22348)"
Branch: master

commit e4227317606d476b1a005e8253f25a907e8486bf
Author: Zhang Rong(Jon) <email address hidden>
Date: Tue Dec 19 15:53:06 2023 +0800

    Fix unable to determine the SPG sync state if one site is down

    If Site1 (the local site) goes down while the protection group is
    being set up, the subcloud peer group sync state cannot be determined.
    This commit automatically creates the non-primary association on
    Site2 (the peer site) when a primary association is created, and
    propagates the sync state to the non-primary association. The
    operator can then check the sync state on Site2 if Site1 is down.

    Test Plan:
    - PASS: Create a primary association and check the non-primary
            association on peer site. It was created, and sync_status
            will follow the primary association's sync_status.
    - PASS: Delete the primary association and check the non-primary
            association on peer site. It was deleted.
    - PASS: If you restart the "dcmanager-manager service" in the local
            site while the association sync_status is in "syncing", the
            sync_status will transition to "failed".
    - PASS: Create a primary association and wait for the sync_status
            to change to "in-sync". Delete the subcloud peer group on
            the peer site; the deletion fails because the peer group is
            still associated with the non-primary association.

    Closes-Bug: 2046809

    Change-Id: Ia917d0dc7c65fbea1e222fb52dbec79fdbe65b65
    Signed-off-by: Zhang Rong(Jon) <email address hidden>
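
The behavior described in the commit message above can be illustrated with a minimal in-memory sketch. This is not the distcloud implementation: the `Site` and `Association` classes, all function names, and the initial "syncing" state are hypothetical and exist only to show the idea of mirroring the primary association's sync_status onto an automatically created non-primary association. The four state names come from the Expected Behavior section of this report.

```python
from dataclasses import dataclass, field
from typing import Dict

# Sync states named in the bug report's Expected Behavior section.
SYNC_STATES = ("syncing", "failed", "in-sync", "out-of-sync")


@dataclass
class Association:
    peer_group: str
    association_type: str         # "primary" or "non-primary"
    sync_status: str = "syncing"  # assumption: initial state is not given in the report


@dataclass
class Site:
    name: str
    # Hypothetical storage: associations keyed by peer group name.
    associations: Dict[str, Association] = field(default_factory=dict)


def create_primary_association(local: Site, peer: Site, peer_group: str) -> None:
    """Create the primary association locally and its non-primary mirror on the peer."""
    local.associations[peer_group] = Association(peer_group, "primary")
    peer.associations[peer_group] = Association(peer_group, "non-primary")


def set_sync_status(local: Site, peer: Site, peer_group: str, status: str) -> None:
    """Update the primary association and propagate the same status to the peer."""
    if status not in SYNC_STATES:
        raise ValueError(f"unknown sync status: {status}")
    local.associations[peer_group].sync_status = status
    peer.associations[peer_group].sync_status = status


def delete_primary_association(local: Site, peer: Site, peer_group: str) -> None:
    """Delete the primary association and the non-primary mirror on the peer."""
    del local.associations[peer_group]
    peer.associations.pop(peer_group, None)


def delete_peer_group(site: Site, peer_group: str) -> None:
    """Refuse to delete a peer group that an association still references."""
    if peer_group in site.associations:
        raise RuntimeError(f"{peer_group} is still associated on {site.name}")


def mark_interrupted_syncs_failed(site: Site) -> None:
    """After a manager restart, a sync left in 'syncing' is treated as failed."""
    for assoc in site.associations.values():
        if assoc.sync_status == "syncing":
            assoc.sync_status = "failed"


if __name__ == "__main__":
    site1, site2 = Site("site1"), Site("site2")
    create_primary_association(site1, site2, "pg1")
    set_sync_status(site1, site2, "pg1", "in-sync")
    # Even if site1 becomes unreachable, site2 holds the last propagated state.
    print(site2.associations["pg1"])
```

The sketch only marks interrupted syncs as failed on the restarted site; whether and when that state reaches the peer is not stated in the commit message and is left out here.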

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud-client (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud-client/+/904059
Committed: https://opendev.org/starlingx/distcloud-client/commit/56049f435adb53d27e3a5cfdea56c6c1ae8411c6
Submitter: "Zuul (22348)"
Branch: master

commit 56049f435adb53d27e3a5cfdea56c6c1ae8411c6
Author: Zhang Rong(Jon) <email address hidden>
Date: Wed Dec 20 10:23:04 2023 +0800

    Add association_type for association CLI list and detail

    Given that the non-primary association is automatically
    generated, it is necessary to introduce a type indicator to
    differentiate between primary and non-primary associations.
    This commit aims to incorporate the association type when
    listing and retrieving details of the association.

    Test Plan:
    - PASS: The command "dcmanager peer-group-association list"
            displays the type of each association, indicating
            whether it is primary or non-primary.
    - PASS: The command "dcmanager peer-group-association show"
            will include the "association_type" attribute.
    - PASS: The association add/update/sync commands will include
            the association_type attribute.

    Closes-Bug: 2046809
    Depends-On: Ia917d0dc7c65fbea1e222fb52dbec79fdbe65b65

    Change-Id: I4d55700f85956c785760b2cc0a5e2ea13a180c22
    Signed-off-by: Zhang Rong(Jon) <email address hidden>
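
As a rough illustration of why the type indicator is useful, the sketch below groups association records by their "association_type" attribute. It does not call dcmanager or its client library: the records are plain dictionaries, and every field other than "association_type" (here "id" and "sync_status") is a hypothetical placeholder rather than the actual output schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_association_type(associations: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group association records by their 'association_type' value.

    Records without the attribute are collected under 'unknown'.
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for assoc in associations:
        grouped[assoc.get("association_type", "unknown")].append(assoc)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample records; real records come from the association list command.
    sample = [
        {"id": 1, "association_type": "primary", "sync_status": "in-sync"},
        {"id": 2, "association_type": "non-primary", "sync_status": "in-sync"},
    ]
    for kind, rows in group_by_association_type(sample).items():
        print(kind, [row["id"] for row in rows])
```

With the type available, an operator on the peer site can tell at a glance that an association was generated automatically rather than created there by hand.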

Ghada Khalil (gkhalil)
tags: added: stx.9.0 stx.distcloud
Changed in starlingx:
importance: Undecided → Medium