Comment 2 for bug 1860999

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/707258
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=0389c7fbb1630988acd385140c9fc16835aae090
Submitter: Zuul
Branch: master

commit 0389c7fbb1630988acd385140c9fc16835aae090
Author: Bart Wensley <email address hidden>
Date: Tue Feb 11 15:21:09 2020 -0600

    Fix subcloud manage/unmanage issues caused by identity sync

    Recently identity (keystone) sync functionality was added to the
    dcorch. This changed the behaviour of the update_subcloud_states
    RPC. The dcmanager expects this RPC to be handled quickly and
    a reply sent almost immediately (timeout is 60s). Instead, the
    dcorch is now performing an identity sync when handling this
    RPC, which involves sending multiple messages to a subcloud and
    waiting for replies. This causes the update_subcloud_states RPC
    to time out sometimes (especially if a subcloud is unreachable)
    and the dcmanager/dcorch states to get out of sync, with no
    recovery mechanism in place.

    To fix this, I have created a new initial sync manager in the
    dcorch. When the dcorch handles the update_subcloud_states RPC,
    it will now just update the subcloud to indicate that an initial
    sync is required and then reply to the RPC immediately. The
    initial sync manager will perform the initial sync in the
    background (separate greenthreads) and enable the subcloud when
    it has completed. I also enhanced the dcmanager subcloud audit
    to periodically send a state update for each subcloud to the
    dcorch, which will correct any state mismatches that might
    occur.

    Change-Id: I70b98d432c3ed56b9532117f69f02d4a0cff5742
    Closes-Bug: 1860999
    Closes-Bug: 1861157
    Signed-off-by: Bart Wensley <email address hidden>
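The fix described in the commit message follows a common pattern. The RPC handler only records state and flags that work is needed, so it can reply well within the 60s timeout. A background manager does the slow subcloud I/O, and a periodic audit resends each subcloud's state so that a lost update is eventually corrected. The following is a minimal Python sketch of that pattern under stated assumptions. The dcorch runs on eventlet greenthreads and keeps subcloud state in its database; this sketch swaps in standard-library `threading` and an in-memory dictionary so it runs on its own. Every class, function, argument and state name here except `update_subcloud_states` is hypothetical, and the arguments do not reflect the real RPC signature.

```python
import threading

# Hypothetical initial-sync states used only in this sketch.
SYNC_NONE = None
SYNC_REQUIRED = "required"
SYNC_IN_PROGRESS = "in-progress"
SYNC_COMPLETED = "completed"


class SubcloudStore:
    """In-memory stand-in for the dcorch subcloud table (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def get(self, name):
        with self._lock:
            return dict(self._rows.get(name, {}))

    def update_state(self, name, managed, online):
        """Record the dcmanager view and flag an initial sync when needed."""
        with self._lock:
            row = self._rows.setdefault(
                name, {"initial_sync": SYNC_NONE, "enabled": False})
            if managed and online:
                # Idempotent: only request a sync if none is pending or done,
                # so repeated audit updates do not restart a sync.
                if row["initial_sync"] is SYNC_NONE:
                    row["initial_sync"] = SYNC_REQUIRED
            else:
                row["initial_sync"] = SYNC_NONE
                row["enabled"] = False

    def claim_required(self):
        """Atomically move every SYNC_REQUIRED subcloud to SYNC_IN_PROGRESS."""
        with self._lock:
            names = [n for n, r in self._rows.items()
                     if r["initial_sync"] == SYNC_REQUIRED]
            for n in names:
                self._rows[n]["initial_sync"] = SYNC_IN_PROGRESS
            return names

    def finish(self, name, ok):
        """Enable the subcloud only if nothing reset it during the sync."""
        with self._lock:
            row = self._rows[name]
            if row["initial_sync"] != SYNC_IN_PROGRESS:
                return  # e.g. unmanaged while the sync was running
            if ok:
                row["initial_sync"] = SYNC_COMPLETED
                row["enabled"] = True
            else:
                row["initial_sync"] = SYNC_REQUIRED  # retry on a later pass


def update_subcloud_states(store, name, managed, online):
    """RPC handler sketch: no subcloud I/O here, so it replies at once."""
    store.update_state(name, managed, online)
    return "ack"


class InitialSyncManager:
    """Background worker that runs each initial sync in its own thread."""

    def __init__(self, store, sync_fn):
        self.store = store
        self.sync_fn = sync_fn  # hypothetical callable that talks to a subcloud

    def _sync_one(self, name):
        try:
            self.sync_fn(name)
            ok = True
        except Exception:
            ok = False
        self.store.finish(name, ok)

    def run_once(self):
        """One pass over pending subclouds; a real manager would loop."""
        threads = [threading.Thread(target=self._sync_one, args=(n,))
                   for n in self.store.claim_required()]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return len(threads)


def audit(store, dcmanager_view):
    """Periodic audit sketch: resend every subcloud state to repair drift."""
    for name, (managed, online) in dcmanager_view.items():
        update_subcloud_states(store, name, managed, online)
```

For example, `update_subcloud_states(store, "subcloud1", True, True)` returns immediately with the subcloud flagged but not yet enabled. A later `run_once()` performs the sync and enables it. If an unmanage update were lost, the next `audit(store, {"subcloud1": (False, True)})` would reset the subcloud. The design relies on the state update being idempotent, which lets the audit resend states freely. The check in `finish` keeps a subcloud that was unmanaged mid-sync from being enabled. The commit message does not say whether the dcorch guards that race in the same way.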