IPv6 Distributed Cloud: subcloud identity sync_status is out-of-sync

Bug #1845701 reported by Peng Peng
| Affects | Status | Importance | Assigned to | Milestone |
| StarlingX | Fix Released | High | Andy | |

Bug Description

Brief Description
-----------------
A DC system was installed and all subclouds were in in-sync status. After a few hours, the status of one SX subcloud became "out-of-sync". The alarm shows that the identity sync_status is out-of-sync.

Severity
--------
Major

Steps to Reproduce
------------------
Install a DC system with one SX subcloud.
Leave it running for 10 hours.

TC-name: DC

Expected Behavior
------------------
The subcloud stays in-sync.

Actual Behavior
----------------
The subcloud goes out-of-sync.

Reproducibility
---------------
Seen once

System Configuration
--------------------
DC system
IPv6

Lab-name: DC

Branch/Pull Time/Commit
-----------------------
TC_19.10 master as of 2019-09-22_20-00-00

Last Pass
---------

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name | management | availability | deploy status | sync |
+----+-----------+------------+--------------+---------------+---------+
| 2 | subcloud6 | managed | online | complete | in-sync |
| 4 | subcloud4 | managed | online | complete | in-sync |
+----+-----------+------------+--------------+---------------+---------+

 2019-09-26T17:53:48.635993

[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
+----------+-----------------------------------------------+------------------------------+----------+-------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-----------------------------------------------+------------------------------+----------+-------------------+
| 280.002 | subcloud6 identity sync_status is out-of-sync | subcloud=subcloud6.resource= | major | 2019-09-27T07:20: |
| | | identity | | 21.936996 |
| | | | | |
+----------+-----------------------------------------------+------------------------------+----------+-------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+-------------+
| id | name | management | availability | deploy status | sync |
+----+-----------+------------+--------------+---------------+-------------+
| 2 | subcloud6 | managed | online | complete | out-of-sync |
| 4 | subcloud4 | managed | online | complete | in-sync |
| 8 | subcloud1 | unmanaged | offline | complete | unknown |
+----+-----------+------------+--------------+---------------+-------------+
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud show 2
+-----------------------------+----------------------------+
| Field | Value |
+-----------------------------+----------------------------+
| id | 2 |
| name | subcloud6 |
| description | None |
| location | None |
| software_version | 19.10 |
| management | managed |
| availability | online |
| deploy_status | complete |
| management_subnet | fd01:5::0/64 |
| management_start_ip | fd01:5::2 |
| management_end_ip | fd01:5::11 |
| management_gateway_ip | fd01:5::1 |
| systemcontroller_gateway_ip | fd01:1::1 |
| created_at | 2019-09-25 18:08:18.803496 |
| updated_at | 2019-09-27 15:12:17.104477 |
| identity_sync_status | out-of-sync |
| patching_sync_status | in-sync |
| platform_sync_status | in-sync |
+-----------------------------+----------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

Alarms from subcloud6:
[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
+----------+-------------------------------------------------------------------------------------------------------+---------------------------------------+----------+-------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-------------------------------------------------------------------------------------------------------+---------------------------------------+----------+-------------------+
| 400.002 | Service group distributed-cloud-services loss of redundancy; expected 1 standby member but no standby | service_domain=controller. | major | 2019-09-27T15:11: |
| | members available | service_group=distributed-cloud- | | 50.184950 |
| | | services | | |
| | | | | |
| 800.011 | Loss of replication in replication group group-0: peer host down | cluster= | major | 2019-09-26T15:24: |
| | | 3a0bb9c0-4622-4da4-9d54-ab3c27b20438. | | 31.603640 |
| | | peergroup=group-0 | | |
| | | | | |
+----------+-------------------------------------------------------------------------------------------------------+---------------------------------------+----------+-------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

Test Activity
-------------
Regression Testing

Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 - issue related to Distributed Cloud which is a deliverable for that release

Changed in starlingx:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Andy (andy.wrs)
tags: added: stx.3.0 stx.distcloud
Andy (andy.wrs)
Changed in starlingx:
status: Triaged → In Progress
Yang Liu (yliu12)
tags: added: stx.retestneeded
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/686256

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/686256
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=145544343bfa493e0eb0c6e3d3dcee0b4c8eeb43
Submitter: Zuul
Branch: master

commit 145544343bfa493e0eb0c6e3d3dcee0b4c8eeb43
Author: Andy Ning <email address hidden>
Date: Wed Oct 2 15:15:18 2019 -0400

    Add timeout to dcdbsync REST API calls

    This update added timeout to dcdbsync REST API calls in dcdbsync
    client. If no timeout is specified explicitly, the requests do not
    timeout. This will cause dcorch audit (which makes the REST calls)
    to hang forever and stop auditing when the REST requests failed
    for whatever reasons.

    Change-Id: I2d471365565df6cd3b0ae720cd81bc17610a0272
    Closes-Bug: 1845701
    Signed-off-by: Andy Ning <email address hidden>
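
The failure mode described in the commit message can be illustrated with a minimal sketch. It is not the dcdbsync client code: it uses Python's standard `http.client` instead, and the names `get_with_timeout`, `audit_once`, `start_silent_server` and the `DEFAULT_TIMEOUT` value are hypothetical. The idea is only that a call with an explicit timeout returns control to the audit loop when a peer stops answering, instead of blocking indefinitely.

```python
import http.client
import socket

# Hypothetical default; the value chosen in the actual fix is not stated in this report.
DEFAULT_TIMEOUT = 2.0


def get_with_timeout(host, port, path="/", timeout=DEFAULT_TIMEOUT):
    """Issue a GET and return (status, body), or None if the call times out or fails."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    except (socket.timeout, ConnectionError):
        # Give up on this call so the caller can move on to the next target.
        return None
    finally:
        conn.close()


def start_silent_server():
    """Listen on a free local port and never answer (stand-in for a hung endpoint)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)  # connections complete in the backlog but are never served
    return srv, srv.getsockname()[1]


def audit_once(targets, timeout=DEFAULT_TIMEOUT):
    """Poll every (host, port) target; an unresponsive one is reported as None."""
    return {t: get_with_timeout(t[0], t[1], timeout=timeout) for t in targets}
```

Calling `audit_once([("127.0.0.1", port)], timeout=0.5)` against the silent server returns after roughly the timeout with `None` for that target, so a periodic audit built on it would keep running on its next cycle. Without the `timeout` argument the same call would block on the socket read for as long as the peer stays silent.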

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

Issue fixed on "2019-10-06_20-00-00"

tags: removed: stx.retestneeded