Brief Description
-----------------
After a reboot or lock/unlock of an AIO-SX, some stx-openstack pods remain in an Unknown or Init state and do not recover.
Severity
--------
Major: System/Feature is usable but degraded
Steps to Reproduce
------------------
Apply the stx-openstack application to an AIO-SX
system host-lock controller-0
system host-unlock controller-0
Expected Behavior
------------------
All pods should recover and be in a ready/running state shortly after the controller recovers.
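For reference, a rough way to check this after the unlock (the namespace and timeout values here are assumptions, not part of the original report): the first command waits for the openstack pods to report Ready, and the second lists anything still outside Running/Completed.
controller-0:~$ kubectl wait --for=condition=Ready pods --all -n openstack --timeout=600s
controller-0:~$ kubectl get pods --all-namespaces -o wide | grep -v -e Running -e Completed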
Actual Behavior
----------------
One or more stx-openstack pods remain in an Unknown or Init state.
Reproducibility
---------------
Intermittent - seen rarely on some labs and 25-50% of the time on other labs.
System Configuration
--------------------
One node system (AIO-SX)
Branch/Pull Time/Commit
-----------------------
Any STX master branch load from an August build
Last Pass
---------
unknown
Timestamp/Logs
--------------
From a fairly recent test, here is an example of the pod states after an AIO-SX lock/unlock:
controller-0:~$ kubectl get pods --all-namespaces -o wide | grep -v -e Running -e Completed
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
monitor mon-elastic-services-8684f65895-nb8mg 0/1 Unknown 0 4d22h <none> controller-0 <none> <none>
monitor mon-elasticsearch-client-0 0/1 Unknown 0 4d22h <none> controller-0 <none> <none>
monitor mon-elasticsearch-data-0 0/1 Unknown 0 4d22h <none> controller-0 <none> <none>
monitor mon-elasticsearch-master-0 0/1 Unknown 0 4d22h <none> controller-0 <none> <none>
monitor mon-filebeat-42vd2 0/1 Init:CrashLoopBackOff 20 4d21h 172.16.192.67 controller-0 <none> <none>
monitor mon-kibana-7f6cfc6bb7-9lgkf 0/1 Unknown 0 4d22h <none> controller-0 <none> <none>
monitor mon-logstash-0 0/1 Unknown 0 4d1h <none> controller-0 <none> <none>
monitor mon-metricbeat-metrics-77fbfc68d6-756rd 0/1 Unknown 0 4d21h <none> controller-0 <none> <none>
monitor mon-metricbeat-vmvvh 0/1 Init:CrashLoopBackOff 25 4d21h 192.168.204.2 controller-0 <none> <none>
openstack cinder-api-5fdf48bf5d-h8rjv 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack cinder-backup-6548b6767-pfwjz 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack cinder-scheduler-65f4b69f66-7s8wx 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack cinder-volume-5d98966645-5d977 0/1 Init:0/4 0 122m 172.16.192.66 controller-0 <none> <none>
openstack cinder-volume-usage-audit-1595861100-k9rn5 0/1 Init:0/1 0 122m 172.16.192.112 controller-0 <none> <none>
openstack glance-api-5bfd4f599c-4s274 0/1 Init:0/3 0 122m 172.16.192.88 controller-0 <none> <none>
openstack heat-api-5b9598987f-8n9fb 0/1 Init:0/1 0 122m 172.16.192.78 controller-0 <none> <none>
openstack heat-cfn-679bc9cbfc-v8mf2 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack heat-engine-78fb44c4c6-gvvqv 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack heat-engine-cleaner-1595861100-kwhhx 0/1 Init:0/1 0 122m 172.16.192.81 controller-0 <none> <none>
openstack horizon-6d6dbcd779-vbrzp 0/1 Init:0/1 0 122m 172.16.192.119 controller-0 <none> <none>
openstack ingress-79d7f888cd-8hl67 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack ingress-error-pages-6554f75d57-ndjqd 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack keystone-api-cc7995bbf-rtwfk 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack libvirt-libvirt-controller-0-937646f6-6g5kg 0/1 StartError 1 4d23h 192.168.204.2 controller-0 <none> <none>
openstack mariadb-ingress-5d6c5b7944-gxzz7 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack mariadb-ingress-error-pages-598984c99f-cxz2n 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack mariadb-server-0 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack networking-avs-avr-agent-controller-0-937646f6-l297w 0/1 Unknown 0 4d23h 192.168.204.2 controller-0 <none> <none>
openstack networking-avs-avs-agent-controller-0-937646f6-7mkqd 0/1 Unknown 0 4d23h 192.168.204.2 controller-0 <none> <none>
openstack neutron-dhcp-agent-controller-0-937646f6-985kq 0/1 Unknown 0 4d23h 192.168.204.2 controller-0 <none> <none>
openstack neutron-metadata-agent-controller-0-937646f6-7q9pp 0/1 Init:0/2 0 122m 192.168.204.2 controller-0 <none> <none>
openstack neutron-server-58cb698cf-56j95 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack neutron-sriov-agent-controller-0-937646f6-54cr2 0/1 Unknown 0 4d23h 192.168.204.2 controller-0 <none> <none>
openstack nova-api-metadata-6545d5dddc-lcsqx 0/1 Unknown 1 4d23h <none> controller-0 <none> <none>
openstack nova-api-osapi-555c5474cd-bwcr4 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack nova-api-proxy-98497fbdf-dfdf5 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack nova-compute-controller-0-937646f6-w8rzf 0/2 Unknown 0 4d23h 192.168.204.2 controller-0 <none> <none>
openstack nova-conductor-6f6d9df696-sfqzx 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack nova-novncproxy-58fd88b78f-szpsq 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack nova-scheduler-69bf6574f7-92sfg 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack nova-service-cleaner-1595862000-8pbdw 0/1 Init:0/1 0 107m 172.16.192.74 controller-0 <none> <none>
openstack osh-openstack-memcached-memcached-85f5694d98-4jph9 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack osh-openstack-rabbitmq-rabbitmq-0 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
openstack placement-api-575f9f9f8c-5sndm 0/1 Unknown 0 4d23h <none> controller-0 <none> <none>
Test Activity
-------------
System Testing
Workaround
----------
Delete the pods stuck in the Unknown state, which causes them to be recreated and start back up.
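A hedged command sketch of that workaround (the awk field filter and the --grace-period=0 --force flags are assumptions; force deletion is commonly needed for pods stuck in Unknown, adjust as required):
controller-0:~$ kubectl get pods --all-namespaces --no-headers | awk '$4 == "Unknown" {print $1, $2}' | while read ns pod; do kubectl delete pod -n "$ns" "$pod" --grace-period=0 --force; done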
Chris Friesen took a look and believes this is similar to https://bugs.launchpad.net/starlingx/+bug/1874858; however, I opened a new LP since this is now impacting stx-openstack pods.