Comment 4 for bug 1891770

Bart Wensley (bartwensley) wrote :

Taking a closer look at the code, we do trigger an audit for all resources each time a subcloud transitions from unmanaged to managed. In this case, the problem appears to be that the patch/load audits ran before the initial sync for the subcloud had completed. The initial sync updates keystone data in the subcloud, which caused the token for the subcloud to be invalidated.

Here is where the subcloud was managed:
2020-08-15T18:54:39.000 controller-0 -sh: info HISTORY: PID=3334493 UID=42425 dcmanager --os-username 'admin' --os-password 'Li69nux*' --os-tenant-name admin --os-auth-url http://[fd01:11::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL subcloud manage subcloud1

The initial sync was done here:
2020-08-15 18:54:50.185 204600 INFO dcorch.engine.initial_sync_manager [-] Initial sync for subcloud subcloud1
2020-08-15 18:54:54.750 204600 INFO dcorch.engine.generic_sync_manager [-] updating state for subcloud subcloud1 - management_state: None availability_status: None initial_sync_state: completed

The patch audit started while the initial sync was being done and failed due to a 401 Unauthorized response from keystone in the subcloud:
2020-08-15 18:54:52.533 204590 INFO dcmanager.audit.patch_audit [-] Triggered patch audit for subcloud: subcloud1.
2020-08-15 18:54:52.653 204590 ERROR dccommon.drivers.openstack.patching_v1 [-] query failed with RC: 401
2020-08-15 18:54:52.653 204590 WARNING dcmanager.audit.patch_audit [-] Cannot retrieve patches for subcloud: subcloud1, skip patch audit: Exception: query failed with RC: 401

For subcloud2, which was managed later, the same problem did not occur.

Here is where the subcloud was managed:
2020-08-15T19:34:24.000 controller-0 -sh: info HISTORY: PID=3818089 UID=42425 dcmanager --os-username 'admin' --os-password 'Li69nux*' --os-tenant-name admin --os-auth-url http://[fd01:11::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL subcloud manage subcloud2

The initial sync was done here:
2020-08-15 19:34:26.165 204600 INFO dcorch.engine.initial_sync_manager [-] Initial sync for subcloud subcloud2
2020-08-15 19:34:30.652 204600 INFO dcorch.engine.generic_sync_manager [-] updating state for subcloud subcloud2 - management_state: None availability_status: None initial_sync_state: completed

The patch audit started after the initial sync completed, and the audit was successful:
2020-08-15 19:34:30.677 204590 INFO dcmanager.audit.patch_audit [-] Triggered patch audit for subcloud: subcloud2.
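
To make the ordering explicit, the timestamps quoted above can be compared directly. This is only a small sketch for checking the log lines in this comment; the helper name audit_started_before_sync_done is made up for illustration and is not part of dcmanager or dcorch.

```python
from datetime import datetime

# Timestamp layout used by the dcmanager/dcorch log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def audit_started_before_sync_done(sync_done, audit_start):
    """Return True if the audit was triggered before the initial sync completed.

    Hypothetical helper for illustration only.
    """
    done = datetime.strptime(sync_done, FMT)
    audit = datetime.strptime(audit_start, FMT)
    return audit < done


# subcloud1: sync completed at 18:54:54.750, audit triggered at 18:54:52.533
subcloud1_race = audit_started_before_sync_done(
    "2020-08-15 18:54:54.750", "2020-08-15 18:54:52.533"
)

# subcloud2: sync completed at 19:34:30.652, audit triggered at 19:34:30.677
subcloud2_race = audit_started_before_sync_done(
    "2020-08-15 19:34:30.652", "2020-08-15 19:34:30.677"
)

print(subcloud1_race, subcloud2_race)
```

For subcloud1 the audit was triggered about 2.2 seconds before the sync finished; for subcloud2 it was triggered about 25 ms after, so that audit only narrowly avoided the same race.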

The solution here may be to delay the patch/load/firmware audits that are triggered when a subcloud is managed, to give the initial sync time to complete.
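
As a rough sketch of that idea (not the actual dcmanager code), the audit could be held back until the initial sync state reads "completed", with a timeout so it is not blocked forever. Note this polls for completion rather than using a fixed delay. Everything here is an assumption for illustration: get_sync_state and audit are hypothetical callables, and the only state value taken from the logs above is "completed".

```python
import time


def wait_for_initial_sync(get_sync_state, timeout=60.0, interval=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until the initial sync is reported as completed, or time out.

    get_sync_state is a hypothetical callable returning the subcloud's
    initial_sync_state as a string. Returns True on completion, False on
    timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_sync_state() == "completed":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def run_audit_after_sync(get_sync_state, audit, **wait_kwargs):
    """Run the (hypothetical) audit callable only once the sync is done.

    Returns True if the audit ran, False if it was skipped on timeout.
    """
    if wait_for_initial_sync(get_sync_state, **wait_kwargs):
        audit()
        return True
    return False
```

If the wait times out, the audit is skipped rather than run against a subcloud whose keystone data may still be changing; it would then have to be picked up by a later audit cycle or a retry.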