platform-integ-apps apply-failed after lock/unlock controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Bob Church |
Bug Description
Brief Description
-----------------
platform-integ-apps is in apply-failed state after a lock/unlock of controller-0 on a simplex system.
The tiller pod is stuck at MatchNodeSelector.
Severity
--------
Major
Steps to Reproduce
------------------
1. Install and configure a simplex system --> Initial apply was successful
2. lock/unlock controller
TC-name: test_lock_
Expected Behavior
------------------
2. lock/unlock succeeded, system is still healthy after that
Actual Behavior
----------------
2. lock/unlock succeeded, but platform-integ-apps failed, tiller pod stuck at MatchNodeSelector.
Reproducibility
---------------
Happened 2/3 times on simplex systems.
System Configuration
-------
One node system
Lab-name: wcp122
Branch/Pull Time/Commit
-------
2019-10-23_20-00-00
Last Pass
---------
2019-10-21_20-00-00 on same system
Timestamp/Logs
--------------
[2019-10-24 07:04:45,331] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-10-24 07:05:24,844] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-10-24 07:14:26,862] 433 DEBUG MainThread ssh.expect :: Output:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system tiller-
[2019-10-24 07:14:48,208] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
+------
| application | version | manifest name | manifest file | status | progress |
+------
| platform-integ-apps | 1.0-8 | platform-
+------
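When checking for this failure in an automated run, the table printed by `system application-list` can be scanned for the `apply-failed` status. The following is a minimal sketch, assuming the six-column layout shown above. The helper names `parse_application_list` and `failed_apps` are hypothetical, and the cell values marked as placeholders in `SAMPLE` are not taken from the real output, which is truncated in this report.

```python
def parse_application_list(table_text):
    """Turn the ASCII table from `system application-list` into a list of dicts."""
    header = None
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+------" borders and blank lines
        # Drop the empty fields before the first and after the last pipe.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # first pipe-delimited line is the header row
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def failed_apps(rows):
    """Return the names of applications whose status is apply-failed."""
    return [row["application"] for row in rows if row.get("status") == "apply-failed"]


# Illustrative input only: every "placeholder" cell is made up for this sketch.
SAMPLE = """\
+------
| application | version | manifest name | manifest file | status | progress |
+------
| platform-integ-apps | 1.0-8 | placeholder | placeholder | apply-failed | placeholder |
+------
"""

if __name__ == "__main__":
    print(failed_apps(parse_application_list(SAMPLE)))
```

Keying each row by the header text rather than by column position keeps the check working if columns are reordered, at the cost of depending on the exact header names.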
Test Activity
-------------
Sanity
It looks like an endpoint/firewall update may be impacting an application apply that is in progress.
# Initial apply works during provisioning
2019-10-24 06:56:58.365 110188 INFO sysinv.conductor.manager [-] Platform managed application platform-integ-apps: Applying...
2019-10-24 06:56:58.640 110188 INFO sysinv.conductor.kube_app [-] Register the initial abort status of app platform-integ-apps
2019-10-24 06:56:58.940 110188 INFO sysinv.conductor.kube_app [-] Application platform-integ-apps (1.0-8) apply started.
2019-10-24 06:58:09.356 110188 INFO sysinv.conductor.kube_app [-] All docker images for application platform-integ-apps were successfully downloaded in 70 seconds
2019-10-24 06:58:33.243 110188 INFO sysinv.conductor.kube_app [-] Application manifest /manifests/platform-integ-apps/1.0-8/platform-integ-apps-manifest.yaml was successfully applied/re-applied.
2019-10-24 06:58:33.244 110188 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app platform-integ-apps
2019-10-24 06:58:33.549 110188 INFO sysinv.conductor.kube_app [-] Application platform-integ-apps (1.0-8) apply completed.
# An override change has been detected. Not sure why this is the case. Needs investigation…
2019-10-24 07:13:16.722 102497 INFO sysinv.conductor.manager [-] There has been an overrides change, setting up reapply of platform-integ-apps
# Firewall update is triggered
2019-10-24 07:13:16.726 102497 INFO sysinv.agent.rpcapi [-] config_apply_runtime_manifest: fanout_cast: sending config 66c95e55-43a1-4b79-847d-43e6960123d2 {'classes': ['openstack::keystone::endpoint::runtime', 'platform::firewall::runtime', 'platform::sysinv::runtime'], 'force': False, 'personalities': ['controller'], 'host_uuids': [u'4624ddd2-6b83-4e12-ada6-f6862e120509']} to agent
2019-10-24 07:13:16.728 22171 INFO sysinv.agent.manager [req-337b5587-475e-4645-8aee-9b8013fcc669 admin None] config_apply_runtime_manifest: 66c95e55-43a1-4b79-847d-43e6960123d2 {u'classes': [u'openstack::keystone::endpoint::runtime', u'platform::firewall::runtime', u'platform::sysinv::runtime'], u'force': False, u'personalities': [u'controller'], u'host_uuids': [u'4624ddd2-6b83-4e12-ada6-f6862e120509']} controller
2019-10-24 07:13:31.950 102497 INFO sysinv.conductor.manager [req-de8da658-fc3b-423c-b47e-eb2ff9cf9342 admin admin] Updating platform data for host: 4624ddd2-6b83-4e12-ada6-f6862e120509 with: {u'availability': u'services-enabled'}
2019-10-24 07:13:32.171 102497 INFO sysinv.helm.manifest_base [req-de8da658-fc3b-423c-b47e-eb2ff9cf9342 admin admin] Delete manifest file /opt/platform/armada/19.10/platform-integ-apps/1.0-8/platform-integ-apps-manifest-del.yaml generated
2019-10-24 07:13:32.172 102497 INFO sysinv.conductor.manager [req-de8da658-fc3b-423c-b47e-eb2ff9cf9342 admin admin] There has been an overrides change, setting up reapply of platform-integ-apps
# Re-apply occurs due to reapply flag being raised
2019-10-24 07:14:13.743 102497 INFO sysinv.conductor.manager [-] Reapplying platform-integ-apps app
2019-10-24 07:14:13.747 102497 INFO sysinv.conductor.kube_app [-] Register the initial abort status of app platform-integ-apps
2019-10-24 07:14:14.054 102497 INFO sy...
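To judge how closely the firewall update and the re-apply follow each other, the timestamps of the annotated sysinv entries can be compared directly. The following is a minimal sketch, assuming the `timestamp pid LEVEL module message` layout seen in the excerpts above. The names `parse_line` and `seconds_between` are hypothetical helpers, not part of sysinv. The message part is optional so that truncated lines still parse.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts: "<date> <time.ms> <pid> <LEVEL> <module> [message]"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+)(?: (?P<msg>.*))?$"
)


def parse_line(line):
    """Parse one sysinv log line; return None if it does not match the layout."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    return {
        "ts": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
        "pid": int(match.group("pid")),
        "level": match.group("level"),
        "module": match.group("module"),
        "msg": match.group("msg") or "",
    }


def seconds_between(first_line, second_line):
    """Elapsed seconds from the first log line to the second."""
    first, second = parse_line(first_line), parse_line(second_line)
    if first is None or second is None:
        raise ValueError("line does not match the assumed sysinv log layout")
    return (second["ts"] - first["ts"]).total_seconds()


if __name__ == "__main__":
    # Line prefixes copied from this report: firewall update, then re-apply.
    firewall = "2019-10-24 07:13:16.726 102497 INFO sysinv.agent.rpcapi [-] config_"
    reapply = "2019-10-24 07:14:13.743 102497 INFO sysinv."
    print(seconds_between(firewall, reapply))
```

With the two prefixes above, the gap is a little under a minute, which is consistent with the re-apply starting while the runtime manifest from the firewall update may still be in progress.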