Brief Description
-----------------
A previous swact that occurs while a runtime manifest is being applied can lead to rejection of subsequent swact request.
Severity
--------
Critical
Steps to Reproduce
------------------
Perform a series of host-swact
Expected Behavior
------------------
All swacts should succeed
Actual Behavior
----------------
Third swact was rejected with the following reason:
controller-1 target Config 2bbdc151-53b9-427f-92fc-8c4e74df24bc not yet applied. Apply target Config via Lock/Unlock prior to Swact
and yet there was no alarm indicating controller-1 config being out of date.
sysinv.log:
===========
At 02:02:24, the config (8c4e74df24bc) was sent to active controller (controller-1), but the applied id had not be changed to previous config id (dc754340488c). At 02:02:27, controller-1 cleared the system config alarm against the previous config id (dc754340488c) again.
2019-10-16 02:02:24.390 560330 INFO sysinv.conductor.manager [-] Setting config target of host 'controller-1' to '2bbdc151-53b9-427f-92fc-8c4e74df24bc'.
2019-10-16 02:02:24.401 560330 WARNING sysinv.conductor.manager [-] controller-1: iconfig out of date: target 2bbdc151-53b9-427f-92fc-8c4e74df24bc, applied 495555d0-f3b6-473b-adab-dc754340488c
2019-10-16 02:02:24.401 560330 WARNING sysinv.conductor.manager [-] SYS_I Raise system config alarm: host controller-1 config applied: 495555d0-f3b6-473b-adab-dc754340488c vs. target: 2bbdc151-53b9-427f-92fc-8c4e74df24bc.
2019-10-16 02:02:24.458 560330 INFO sysinv.conductor.manager [-] _config_update_hosts config_uuid=2bbdc151-53b9-427f-92fc-8c4e74df24bc
2019-10-16 02:02:24.458 560330 INFO sysinv.conductor.manager [-] applying runtime manifest config_uuid=2bbdc151-53b9-427f-92fc-8c4e74df24bc, classes: ['openstack::keystone::endpoint::runtime', 'platform::firewall::runtime', 'platform::sysinv::runtime']
2019-10-16 02:02:24.469 560330 INFO sysinv.puppet.puppet [-] Updating hiera for host: controller-1 with config_uuid: 2bbdc151-53b9-427f-92fc-8c4e74df24bc
2019-10-16 02:02:27.377 560330 INFO sysinv.helm.manifest_base [-] Delete manifest file /opt/platform/armada/19.10/platform-integ-apps/1.0-8/platform-integ-apps-manifest-del.yaml generated
2019-10-16 02:02:27.378 560330 INFO sysinv.conductor.manager [-] No override change after configuration action, skipping re-apply of platform-integ-apps
2019-10-16 02:02:27.380 560330 INFO sysinv.agent.rpcapi [-] config_apply_runtime_manifest: fanout_cast: sending config 2bbdc151-53b9-427f-92fc-8c4e74df24bc {'classes': ['openstack::keystone::endpoint::runtime', 'platform::firewall::runtime', 'platform::sysinv::runtime'], 'force': False, 'personalities': ['controller'], 'host_uuids': [u'b0b201ef-e92f-4b1d-bdcc-5cfbfe092f89']} to agent
2019-10-16 02:02:27.384 20451 INFO sysinv.agent.manager [req-2bf8b457-d051-46da-b6db-a38ce87bdf99 admin None] config_apply_runtime_manifest: 2bbdc151-53b9-427f-92fc-8c4e74df24bc {u'classes': [u'openstack::keystone::endpoint::runtime', u'platform::firewall::runtime', u'platform::sysinv::runtime'], u'force': False, u'personalities': [u'controller'], u'host_uuids': [u'b0b201ef-e92f-4b1d-bdcc-5cfbfe092f89']} controller
2019-10-16 02:02:27.384 20451 INFO sysinv.agent.manager [req-2bf8b457-d051-46da-b6db-a38ce87bdf99 admin None] controller-active
2019-10-16 02:02:27.384 20451 INFO sysinv.agent.manager [req-2bf8b457-d051-46da-b6db-a38ce87bdf99 admin None] _apply_runtime_manifest with hieradata_path = '/opt/platform/puppet/19.10/hieradata'
2019-10-16 02:02:27.514 560330 INFO sysinv.conductor.manager [-] drbd-overview: pgsql-40.0, platform-9.8, extension-0.96875, patch-vault-0, etcd-4.8, dockerdistribution-16.0
2019-10-16 02:02:27.515 560330 INFO sysinv.conductor.manager [-] lvdisplay: pgsql-40.0, platform-10.0, extension-1.0, patch-vault-0, etcd-5.0, dockerdistribution-16.0
2019-10-16 02:02:27.526 560330 INFO sysinv.conductor.manager [-] SYS_I Clear system config alarm: controller-1 target config 495555d0-f3b6-473b-adab-dc754340488c
2019-10-16 02:02:27.585 560330 INFO sysinv.conductor.manager [-] SYS_I Clear system config alarm: controller-0 target config 2ff321c2-de17-4d97-97a8-b7bca29a868d
The keystone::endpoint, platform::firewall and platform::sysinv runtime manifests were applied around 02:02:28, and the manifests did not complete as a swact occured around 2019-10-16T02:02:51 (from controller-1 to controller-0 ) that interfered with the ongoing manifest apply.
puppet.log
==========
2019-10-16T02:02:28.416 Debug: 2019-10-16 02:02:28 +0000 Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8
....
....
....
2019-10-16T02:03:15.187 Unable to establish connection to http://[fd00:204::2]:5000/v3/auth/tokens: HTTPConnectionPool(host='fd00:204::2', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff01626d6d0>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Retrying for 169 more seconds
2019-10-16T02:03:18.958 Unable to establish connection to http://[fd00:204::2]:5000/v3/auth/tokens: HTTPConnectionPool(host='fd00:204::2', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5fccf206d0>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Retrying for 166 more seconds
2019-10-16T02:03:22.730 Unable to establish connection to http://[fd00:204::2]:5000/v3/auth/tokens: HTTPConnectionPool(host='fd00:204::2', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fadbf16b6d0>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Retrying for 162 more seconds
2019-10-16T02:03:26.464 Unable to establish connection to http://[fd00:204::2]:5000/v3/auth/tokens: HTTPConnectionPool(host='fd00:204::2', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f83da8fb6d0>: Failed to establish a new connection: [Errno 113] No route to host',))'. Retrying for 158 more seconds
2019-10-16T02:03:30.219 Unable to establish connection to http://[fd00:204::2]:5000/v3/auth/tokens: HTTPConnectionPool(host='fd00:204::2', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f702628c6d0>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Retrying for 154 more seconds
2019-10-16T02:03:33.979 Unable to establish connection to http://[fd00:204::2]:5000/v3/auth/tokens: HTTPConnectionPool(host='fd00:204::2', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6fa6e3e6d0>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Retrying for 151 more seconds
2019-10-16T02:04:02.555 Error: 2019-10-16 02:04:02 +0000 Cannot create /opt/platform/sysinv/19.10; parent directory /opt/platform/sysinv does not exist
2019-10-16T02:04:02.669 Error: 2019-10-16 02:04:02 +0000 /Stage[main]/Platform::Sysinv::Runtime/File[/opt/platform/sysinv/19.10]/ensure: change from absent to directory failed: Cannot create /opt/platform/sysinv/19.10; parent directory /opt/platform/sysinv does not exist
Reproducibility
---------------
Seen once.
System Configuration
--------------------
IPv6 standard system
Branch/Pull Time/Commit
-----------------------
Oct. 9th, 2019 build
Last Pass
---------
Swact normally works
Timestamp/Logs
--------------
Test Activity
-------------
System Test
Looks like the runtime manifest was start at 02:02:28 and done at 02:04:28, and the manifest was not successful. sysinv dir not available.
Note that there is a swact occurred from controller-1 -> controller-0 before the manifest completed, at 02:03:09, the swact failed the manifest. For this instance, there were 2 potential breaking points, keystone not being available for a short period of time (from disabled on c-1 to enabled on c-2), and as well the /opt/platform/
A potential solution is to block swact until runtime manifest is completed.