During rehome, the new destination system controller swact'd and system commands failed for a time

Bug #1947014 reported by Yuxing
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Juanita-Balaraj

Bug Description

Brief Description
-----------------
Unexpected SystemController swact during the rehome (migration) of an AIO-SX subcloud.

Severity
--------
Major

Steps to Reproduce
------------------
1 Set up two central clouds (A and B)
2 Bring up a subcloud under central cloud A, then update the registry credentials in the subcloud with the sysinv credentials
3 Migrate the subcloud to central cloud B
4 Delete the subcloud in central cloud A
5 Unlock the subcloud after the migration

Expected Behavior
------------------
Central cloud B can manage the migrated subcloud without any issues

Actual Behavior
----------------
The sysinv user in central cloud B is locked, which causes the system controllers in central cloud B to swact back and forth

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Distributed cloud

Branch/Pull Time/Commit
-----------------------
21.05

Last Pass
---------
NA

Timestamp/Logs
--------------
**controller-0 Time of migrate command
2021-10-01T00:41:04.000 controller-0 -sh: info HISTORY: PID=73553 UID=42425 dcmanager subcloud add --migrate --bootstrap-address 2607:f160:10:922b:ce:406:0:2000 --bootstrap-values dran0306-bootstrap-values.yaml --install-values dran0306-install-values.yaml

**controller-0 sysinv.log showing first keystonemiddleware warning.
sysinv 2021-10-01 00:50:24.962 591764 WARNING keystonemiddleware.auth_token [-] Identity response: {"error":{"code":401,"message":"The account is locked for user: 893669cdd4254bab92263b85c9f61860.","title":"Unauthorized"}}
: Unauthorized: The account is locked for user: 893669cdd4254bab92263b85c9f61860. (HTTP 401) (Request-ID: req-f67c25bb-c5b7-4b81-9c6b-b10c50d6c16f)
sysinv 2021-10-01 00:50:24.979 591764 WARNING keystonemiddleware.auth_token [-] Identity response: {"error":{"code":401,"message":"The account is locked for user: 893669cdd4254bab92263b85c9f61860.","title":"Unauthorized"}}
: Unauthorized: The account is locked for user: 893669cdd4254bab92263b85c9f61860. (HTTP 401) (Request-ID: req-d1d7f952-be45-4c9f-9844-60715b8d0260)
sysinv 2021-10-01 00:50:24.979 591764 CRITICAL keystonemiddleware.auth_token [-] Unable to validate token: Identity server rejected authorization necessary to fetch token data: ServiceError: Identity server rejected authorization necessary to fetch token data

sysinv 2021-10-01 00:52:19.787 590850 INFO sysinv.conductor.manager [-] Auto-apply failed prerequisites for platform-integ-apps: Crush map not applied

**controller-0 SIGTERM caught
sysinv 2021-10-01 00:53:26.011 591769 WARNING keystonemiddleware.auth_token [-] Identity response: {"error":{"code":401,"message":"The account is locked for user: 893669cdd4254bab92263b85c9f61860.","title":"Unauthorized"}}
: Unauthorized: The account is locked for user: 893669cdd4254bab92263b85c9f61860. (HTTP 401) (Request-ID: req-d88ab345-0e4a-4ae3-b6db-550d4099220f)
sysinv 2021-10-01 00:53:26.011 591769 CRITICAL keystonemiddleware.auth_token [-] Unable to validate token: Identity server rejected authorization necessary to fetch token data: ServiceError: Identity server rejected authorization necessary to fetch token data
sysinv 2021-10-01 00:53:26.092 590907 INFO oslo_service.service [-] Caught SIGTERM, stopping children
sysinv 2021-10-01 00:53:26.093 590907 INFO oslo.service.wsgi [-] Stopping WSGI server.
**controller-0 second SIGTERM
sysinv 2021-10-01 00:53:57.762 398585 INFO oslo_service.service [-] Caught SIGTERM, stopping children
sysinv 2021-10-01 00:53:57.763 398585 INFO oslo.service.wsgi [-] Stopping WSGI server.
sysinv 2021-10-01 00:53:57.763 398585 INFO oslo_service.service [-] Waiting on 8 children to exit
sysinv 2021-10-01 00:53:57.763 398811 INFO oslo.service.wsgi [-] Stopping WSGI server.
sysinv 2021-10-01 00:53:57.763 398820 INFO oslo.service.wsgi [-] Stopping WSGI server.
sysinv 2021-10-01 00:54:03.218 590850 INFO sysinv.openstack.common.service [-] Caught SIGTERM, exiting
sysinv 2021-10-01 00:54:10.275 7147 ERROR sysinv.openstack.common.rpc.common [-] Failed to consume message from queue: [Errno 104] Connection reset by peer: error: [Errno 104] Connection reset by peer
SIGTERMs continue from this point on.

**controller-0 dcmanager.log shows oslo_service stopping children
2021-10-01 00:53:26.672 592602 INFO oslo_service.service [-] Caught SIGTERM, stopping children
2021-10-01 00:53:26.673 592602 INFO oslo.service.wsgi [-] Stopping WSGI server.
2021-10-01 00:53:26.673 592602 INFO oslo_service.service [-] Waiting on 2 children to exit
2021-10-01 00:53:26.674 593156 INFO oslo.service.wsgi [-] Stopping WSGI server.
2021-10-01 00:53:26.674 593153 INFO oslo.service.wsgi [-] Stopping WSGI server.
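
The sequence in the excerpts above (locked-account 401 warnings first, SIGTERMs a few minutes later) can be pulled out of a sysinv.log with a short script. The following is a minimal sketch, not a StarlingX tool: the function name `summarize` and the regular expressions are assumptions based only on the log lines quoted in this report.

```python
import re
from collections import Counter

# Timestamp format used by the sysinv log lines quoted in this report.
TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"

# Timestamped keystonemiddleware line reporting a locked account.
# The untimestamped continuation line (": Unauthorized: ...") does not
# match, so each warning is counted once.
LOCKED_RE = re.compile(
    TS + r".*The account is locked for user: (?P<user>[0-9a-f]{32})"
)

# oslo_service / sysinv line logged when a service receives SIGTERM.
SIGTERM_RE = re.compile(TS + r".*Caught SIGTERM")


def summarize(lines):
    """Count locked-account warnings per user ID and record the first
    timestamp of a locked-account warning and of a SIGTERM."""
    locked = Counter()
    first_locked = None
    first_sigterm = None
    for line in lines:
        match = LOCKED_RE.search(line)
        if match:
            locked[match.group("user")] += 1
            if first_locked is None:
                first_locked = match.group("ts")
            continue
        match = SIGTERM_RE.search(line)
        if match and first_sigterm is None:
            first_sigterm = match.group("ts")
    return {
        "locked_counts": dict(locked),
        "first_locked": first_locked,
        "first_sigterm": first_sigterm,
    }
```

To use it, pass the lines of a log file, for example `summarize(open(path))` where `path` points at a copy of sysinv.log. A locked-account timestamp that precedes the first SIGTERM is consistent with the failure described in this report, but does not prove it on its own.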

Test Activity
-------------
Evaluation

Workaround
----------
Before rehoming, update the subcloud's registry credentials with the sysinv credentials of the central cloud it is migrating to

Yuxing (yuxing)
Changed in starlingx:
assignee: nobody → Yuxing (yuxing)
Ghada Khalil (gkhalil)
tags: added: stx.distcloud
tags: added: stx.6.0
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
Changed in starlingx:
status: Triaged → In Progress
tags: added: stx.docs
Changed in starlingx:
assignee: Yuxing (yuxing) → Juanita-Balaraj (balaraj)
OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/814645
Committed: https://opendev.org/starlingx/utilities/commit/5bc220bc2bc13d76e227c3d6382d86c5db39ea43
Submitter: "Zuul (22348)"
Branch: master

commit 5bc220bc2bc13d76e227c3d6382d86c5db39ea43
Author: Yuxing Jiang <email address hidden>
Date: Tue Oct 19 16:03:10 2021 -0400

    Add a script to update registry credentials

    We have a script in https://docs.starlingx.io/dist_cloud/kubernetes/\
    updating-docker-registry-credentials-on-a-subcloud.html
    to update the docker registry credentials on a subcloud. As this
    script is expected to use in multiple scenarios, this commit adds this
    script in the /usr/local/bin directory, so it can be called to update
    the registry credentials.

    Changes against the original script:
    1. Add ghcr-registry as it is newly introduced.
    2. Add "source /etc/platform/openrc", so the OpenStack environmental
    variables can be included.
    3. Prompt for input username and password if not provided.

    Test:
    1. Create a patch with the "platform-util-controller" and apply it on
    an AIOSX controller.
    2. Call the script with the sysinv username and password, check the
    OpenStack secrets payload that the username and password are updated.
    And the secrets' UUIDs are updated to service parameters.
    3. Call the script without username and password, prompt for username
    and password.
    4. Call the script with 3 arguments, exit with an error message and
    the usage.

    Partial-Bug: 1947014
    Signed-off-by: Yuxing Jiang <email address hidden>
    Change-Id: I4d930b06992a22addb15f4d4edcfac31af5d440b
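
The argument handling described in the commit message above (use the credentials when both are given, prompt when none are given, reject three arguments with an error and the usage) can be illustrated with a small sketch. This is not the merged script. The function name `resolve_credentials` and the usage text are hypothetical, and rejecting every argument count other than zero or two is an assumption, since the commit message only mentions the three-argument case.

```python
import getpass

# Placeholder usage text; the real script's name and wording are not
# given in this report.
USAGE = "usage: <script> [<username> <password>]"


def resolve_credentials(argv, ask=input, ask_secret=getpass.getpass):
    """Return (username, password) from the arguments or by prompting.

    argv holds the command-line arguments without the program name.
    ask and ask_secret are injectable so the prompts can be replaced
    in tests.
    """
    if len(argv) == 2:
        return argv[0], argv[1]
    if len(argv) == 0:
        username = ask("Username: ")
        password = ask_secret("Password: ")
        return username, password
    # Assumption: any other argument count is an error.
    raise SystemExit(
        f"error: expected 0 or 2 arguments, got {len(argv)}\n{USAGE}"
    )
```

Prompting through `getpass.getpass` keeps the password out of the shell history, which matters here because the shell history is logged (see the HISTORY line in the log excerpts of this report).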

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/813903
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/da276b2c7b320232f2a76aeee5e149583f748bdb
Submitter: "Zuul (22348)"
Branch: master

commit da276b2c7b320232f2a76aeee5e149583f748bdb
Author: Yuxing Jiang <email address hidden>
Date: Wed Oct 13 18:07:14 2021 -0400

    Update registry credentials during rehoming

    As we are switching to use 'sysinv' user instead of 'admin' user to
    access the registries, this commit adds a task in the rehoming
    playbook to update the registry credentials with the sysinv
    credentials from the new system controllers which a subcloud is
    migrating to.

    Test steps:
    1. Deploy a AIOSX subcloud in central cloud A, update the subcloud's
    registries with its sysinv credentials.
    2. Update the admin credentials from central cloud B in the subcloud.
    3. Migrate the subcloud to central cloud B.
    4. Lock/unlock the subcloud after its deploy status turns to
    "complete" state.

    Test result:
    The subcloud turns online after unlocking and turns to "in-sync" after
    being managed by central cloud B. The registries auth-secrets are all
    updated to sysinv credentials from central cloud B. The central
    registry can be accessed from the subcloud with the sysinv user and
    its password.

    Depends-On: https://review.opendev.org/c/starlingx/utilities/+/814645
    Closes-Bug: 1947014
    Signed-off-by: Yuxing Jiang <email address hidden>
    Change-Id: I384930d3842f8a4da03648af7153dea430c49baa

Changed in starlingx:
status: In Progress → Fix Released
Yuxing (yuxing)
Changed in starlingx:
status: Fix Released → In Progress
Ghada Khalil (gkhalil) wrote :

The software fix is merged, so marking this as Fix Released. The doc update can still be linked to the same LP if that's what's planned.

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to docs (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/docs/+/819490

OpenStack Infra (hudson-openstack) wrote : Fix merged to docs (master)

Reviewed: https://review.opendev.org/c/starlingx/docs/+/819490
Committed: https://opendev.org/starlingx/docs/commit/c992d7d7d4da4e0547d466a500f5d5eff87d143b
Submitter: "Zuul (22348)"
Branch: master

commit c992d7d7d4da4e0547d466a500f5d5eff87d143b
Author: Juanita-Balaraj <email address hidden>
Date: Fri Nov 26 13:45:52 2021 -0500

    Modified the topic, "Update Docker Registry Credentials on a Subcloud" to remove "./ to invoke the script.

    Closes Bug: 1947014
    Signed-off-by: Juanita-Balaraj <email address hidden>
    Change-Id: I4195e191fa99ea37aeff22273883e9862a5da6ef
