stx-openstack in apply-failed after lock/unlock standby controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Daniel Badea |
Bug Description
Brief Description
-----------------
After a lock/unlock of controller-1, stx-openstack ends up in apply-failed status.
Severity
--------
Major
Steps to Reproduce
------------------
- stx-openstack is applied
- system host-lock controller-1
- system host-unlock controller-1
- wait for controller-1 to become enabled/available in system host-list
- watch system application-list for the stx-openstack status (commands sketched below)
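For convenience, the same steps as a shell session; the watch intervals are arbitrary choices, not part of the original report:

# assumes platform credentials are sourced (e.g. /etc/platform/openrc)
# precondition: stx-openstack shows applied in system application-list
system host-lock controller-1
system host-unlock controller-1
# wait for controller-1 to report unlocked/enabled/available
watch -n 30 system host-list
# then monitor the application status; apply-failed appears within a few minutes
watch -n 30 system application-list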
Expected Behavior
------------------
- stx-openstack is automatically reapplied after the unlock and reaches applied status
Actual Behavior
----------------
- stx-openstack goes to apply-failed status after a few minutes
Reproducibility
---------------
Intermittent
System Configuration
--------------------
Dedicated storage
Lab-name: wcp113-121
Branch/Pull Time/Commit
-----------------------
stx master as of 20190720T013000Z
Last Pass
---------
Passed previously on the same system with the same load; the failure is intermittent.
Timestamp/Logs
--------------
# Unlock requested
[2019-07-21 03:12:57,041] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
# controller-1 available and ready
[2019-07-21 03:19:01,711] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | unlocked       | enabled     | available    |
| 3  | compute-1    | worker      | unlocked       | enabled     | available    |
| 4  | compute-2    | worker      | unlocked       | enabled     | available    |
| 5  | compute-3    | worker      | unlocked       | enabled     | available    |
| 6  | compute-4    | worker      | unlocked       | enabled     | available    |
| 7  | controller-1 | controller  | unlocked       | enabled     | available    |
| 8  | storage-0    | storage     | unlocked       | enabled     | available    |
| 9  | storage-1    | storage     | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
# apply-failed
[2019-07-21 03:19:29,459] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
+------
| application | version | manifest name | manifest file | status | progress |
+------
| platform-integ-apps | 1.0-7 | platform-
| stx-openstack | 1.0-17-
+------
# mariadb pod is in CrashLoopBackOff for a few minutes; it is not clear whether this is related to the apply-failed status at all.
[2019-07-21 03:21:34,134] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pods --all-namespaces -o wide | grep -v -e Running -e Completed'
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
openstack mariadb-server-1 0/1 CrashLoopBackOff 2 3m29s 172.16.166.141 controller-1 <none> <none>
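The mariadb container logs are not captured here; the usual next step for a pod in CrashLoopBackOff would be along these lines (pod name taken from the listing above):

# show output from the previously crashed container instance
kubectl logs -n openstack mariadb-server-1 --previous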
[2019-07-21 03:21:34,501] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pods --all-namespaces -o wide | grep -v -e Running -e Completed -e NAMESPACE | awk '{system("kubectl describe pods -n "$1" "$2)}''
....
Tolerations: node.kubernetes
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m25s (x4 over 3m28s) default-scheduler 0/7 nodes are available: 1 node(s) didn't match pod affinity/
Normal Scheduled 2m35s default-scheduler Successfully assigned openstack/
Normal SuccessfulAttac
Normal Pulled 86s kubelet, controller-1 Container image "registry.
Normal Created 86s kubelet, controller-1 Created container
Normal Started 85s kubelet, controller-1 Started container
Normal Pulled 79s kubelet, controller-1 Container image "registry.
Normal Created 79s kubelet, controller-1 Created container
Normal Started 79s kubelet, controller-1 Started container
Normal Pulled 24s (x3 over 70s) kubelet, controller-1 Container image "registry.
Normal Created 23s (x3 over 62s) kubelet, controller-1 Created container
Normal Started 22s (x3 over 62s) kubelet, controller-1 Started container
Warning BackOff 5s (x3 over 39s) kubelet, controller-1 Back-off restarting failed container
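The nested quoting of the describe-all-failing-pods one-liner above is mangled by the test framework's Send wrapper; a working form of the same pipeline is:

# describe every pod that is neither Running nor Completed
kubectl get pods --all-namespaces -o wide \
  | grep -v -e Running -e Completed -e NAMESPACE \
  | awk '{system("kubectl describe pods -n " $1 " " $2)}'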
Test Activity
-------------
Regression Testing
tags: added: stx.regression
Changed in starlingx:
status: Triaged → In Progress
Logs are split into two parts; use the cat command to combine them.
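A minimal sketch with hypothetical part names (the real names are on the bug's attachments):

# concatenate the parts in order to rebuild the original archive
cat logs.tar.part1 logs.tar.part2 > logs.tar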