IPv6 Distributed Cloud: subcloud identity_sync_status becomes out-of-sync periodically

Bug #1847661 reported by Peng Peng
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | Medium     | Andy        |

Bug Description

Brief Description
-----------------
The DC subcloud is initially in-sync, but the subcloud identity_sync_status periodically becomes out-of-sync without any change to the system.

Severity
--------
Major

Steps to Reproduce
------------------
Run dcmanager subcloud list and check the sync status.

TC-name: DC regression

Expected Behavior
------------------
The subcloud should always stay in-sync if there is no change in the system.

Actual Behavior
----------------

Reproducibility
---------------
Reproducible

System Configuration
--------------------
DC system
IPv6

Lab-name: DC

Branch/Pull Time/Commit
-----------------------
master as of "2019-10-06_20-00-00"

Last Pass
---------

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud show 9
+-----------------------------+----------------------------+
| Field | Value |
+-----------------------------+----------------------------+
| id | 9 |
| name | subcloud1 |
| description | None |
| location | None |
| software_version | 19.10 |
| management | managed |
| availability | online |
| deploy_status | complete |
| management_subnet | fd01:2::0/64 |
| management_start_ip | fd01:2::2 |
| management_end_ip | fd01:2::11 |
| management_gateway_ip | fd01:2::1 |
| systemcontroller_gateway_ip | fd01:1::1 |
| created_at | 2019-10-10 16:04:56.586969 |
| updated_at | 2019-10-10 17:18:54.018731 |
| identity_sync_status | out-of-sync |
| patching_sync_status | in-sync |
| platform_sync_status | in-sync |
+-----------------------------+----------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ date
Thu Oct 10 18:42:00 UTC 2019
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name | management | availability | deploy status | sync |
+----+-----------+------------+--------------+---------------+---------+
| 8 | subcloud4 | managed | online | complete | in-sync |
| 9 | subcloud1 | managed | online | complete | in-sync |
| 10 | subcloud5 | unmanaged | online | complete | unknown |
+----+-----------+------------+--------------+---------------+---------+

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud show 9
+-----------------------------+----------------------------+
| Field | Value |
+-----------------------------+----------------------------+
| id | 9 |
| name | subcloud1 |
| description | None |
| location | None |
| software_version | 19.10 |
| management | managed |
| availability | online |
| deploy_status | complete |
| management_subnet | fd01:2::0/64 |
| management_start_ip | fd01:2::2 |
| management_end_ip | fd01:2::11 |
| management_gateway_ip | fd01:2::1 |
| systemcontroller_gateway_ip | fd01:1::1 |
| created_at | 2019-10-10 16:04:56.586969 |
| updated_at | 2019-10-10 17:18:54.018731 |
| identity_sync_status | out-of-sync |
| patching_sync_status | in-sync |
| platform_sync_status | in-sync |
+-----------------------------+----------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud show 9
+-----------------------------+----------------------------+
| Field | Value |
+-----------------------------+----------------------------+
| id | 9 |
| name | subcloud1 |
| description | None |
| location | None |
| software_version | 19.10 |
| management | managed |
| availability | online |
| deploy_status | complete |
| management_subnet | fd01:2::0/64 |
| management_start_ip | fd01:2::2 |
| management_end_ip | fd01:2::11 |
| management_gateway_ip | fd01:2::1 |
| systemcontroller_gateway_ip | fd01:1::1 |
| created_at | 2019-10-10 16:04:56.586969 |
| updated_at | 2019-10-10 17:18:54.018731 |
| identity_sync_status | in-sync |
| patching_sync_status | in-sync |
| platform_sync_status | in-sync |
+-----------------------------+----------------------------+
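
The flip visible in the outputs above can be tracked by capturing the command output at intervals and recording each change of identity_sync_status. A minimal sketch, assuming the two-column table layout shown above; the helper names parse_show_output and find_flips are hypothetical and are not part of dcmanager:

```python
def parse_show_output(text):
    """Parse the two-column table printed by `dcmanager subcloud show`."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip border rows such as +-----+-----+ and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Keep only field/value rows and drop the "Field | Value" header row.
        if len(cells) == 2 and cells[0] != "Field":
            fields[cells[0]] = cells[1]
    return fields


def find_flips(samples, field="identity_sync_status"):
    """Return (index, old, new) for every change of `field` between consecutive samples."""
    flips = []
    previous = None
    for i, sample in enumerate(samples):
        current = parse_show_output(sample).get(field)
        if previous is not None and current != previous:
            flips.append((i, previous, current))
        previous = current
    return flips
```

Each element of samples is the raw text of one captured command output; how the output is collected (cron job, loop with sleep, test harness) is left open here.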

Test Activity
-------------
Sanity

Ghada Khalil (gkhalil) wrote :

@Peng, how frequently do the subclouds go out of sync over a specific period of time? Is it a specific subcloud or all of them?

description: updated
tags: added: stx.distcloud
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.3.0
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority; this needs further investigation. The system seems to recover, but we need to understand why it gets into this state in the first place.

Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
Yang Liu (yliu12)
tags: added: stx.retestneeded
Andy (andy.wrs)
Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/690577

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/690577
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=eb572c47f808ce2dd241f7aede44c14a550e5c96
Submitter: Zuul
Branch: master

commit eb572c47f808ce2dd241f7aede44c14a550e5c96
Author: Andy Ning <email address hidden>
Date: Tue Oct 22 15:49:06 2019 -0400

    Check ids instead of names for DC assignment synchronization

    In distributed cloud, a subcloud's user ids, project ids and role ids
    are synced with the System Controller. But the project role assignment
    functions still use names to check whether master resources and
    subcloud resources have the same id, and whether the user, project and
    role exist before the POST call that grants a project role to a user.
    This causes an assignment PUT job to be created, and the identity sync
    status to flip from "in-sync" to "out-of-sync" and back to "in-sync"
    again on every audit cycle.

    In more detail: at the very first audit, roles are queued for sync,
    but the job has not run yet, so their ids have not changed at the
    subcloud. At the same audit, dcorch finds that the project role
    assignment already exists (since it checks names in has_same_ids()),
    so it maps the assignment of the central cloud to the assignment of
    the subcloud using the current ids. Once the queued role sync job is
    executed, the role ids change. At this point the assignment mappings
    become invalid. The next audit can no longer find the mapped
    assignment in the subcloud, so the logic falls into
    audit_discrepancy(), where has_same_ids() returns True again and a PUT
    job is queued for the assignment. The sync endpoint type becomes
    "out-of-sync" since there is a job for it. Once the PUT function
    returns, its status goes back to "in-sync".

    This change updates the project role assignment functions to use ids
    instead of names.

    Change-Id: I024f2c2f97aaf9670d7b2c5c70a2dae7d6d08d38
    Closes-Bug: 1847661
    Signed-off-by: Andy Ning <email address hidden>
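
The difference between the two checks described in the commit message can be illustrated with a small sketch. This is not the dcorch code: the Assignment tuple, the functions same_by_name and same_by_id, and all ids and names below are hypothetical stand-ins chosen for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a project role assignment.
Assignment = namedtuple(
    "Assignment",
    ["user_id", "user_name", "project_id", "project_name", "role_id", "role_name"],
)


def same_by_name(master, sub):
    """Name-based comparison, as described for the behaviour before the fix."""
    return (master.user_name, master.project_name, master.role_name) == (
        sub.user_name, sub.project_name, sub.role_name)


def same_by_id(master, sub):
    """Id-based comparison, as described for the behaviour after the fix."""
    return (master.user_id, master.project_id, master.role_id) == (
        sub.user_id, sub.project_id, sub.role_id)


master = Assignment("u1", "admin", "p1", "admin", "r-master", "admin")
# The subcloud role id has not been synced yet, so only the names match.
stale = Assignment("u1", "admin", "p1", "admin", "r-local", "admin")

print(same_by_name(master, stale))  # True
print(same_by_id(master, stale))    # False
```

With the name-based check the stale subcloud assignment is treated as a match even though its role id is about to change, which corresponds to the mapping that later becomes invalid. The id-based check only reports a match once the role id has actually been synced.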

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

verified on
[sysadmin@controller-0 ~(keystone_admin)]$ cat /etc/build.info
###
### Wind River Cloud Platform
### Release 19.10
###
### Wind River Systems, Inc.
###

SW_VERSION="19.10"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2019-11-02_08-39-54"
SRC_BUILD_ID="74"

JOB="TC_19.10_Build"
BUILD_BY="jenkins"
BUILD_NUMBER="74"
BUILD_HOST="yow-cgts4-lx.wrs.com"
BUILD_DATE="2019-11-02 08:41:48 -0400"

tags: removed: stx.retestneeded