Comment 0 for bug 1855474

Yosief Gebremariam (ygebrema) wrote :

Brief Description
-----------------
Many OpenStack pods failed to recover, or were slow to recover, after a forced reboot of the active controller.

Severity
--------
Major

Steps to Reproduce
------------------
- Install and configure the system, then apply the stx-openstack application
- Run 'sudo reboot -f' on the active controller

Expected Behavior
-----------------
- The system swacts to the standby controller, and all OpenStack pods recover to the Running or Completed state.
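
The expected recovery can be checked by polling the pod list until nothing remains outside Running or Completed. The following is a minimal sketch, not part of the original test run: the function name wait_for_pods, the POD_LIST_CMD override, and the default timeout and interval values are all hypothetical.

```shell
# Hypothetical helper: poll until every pod is Running or Completed.
# Usage: wait_for_pods [timeout_seconds] [poll_interval_seconds]
# Returns 0 once all pods are healthy, 1 if the timeout expires first.
wait_for_pods() {
    timeout_s=${1:-600}     # assumed default, not taken from the report
    interval_s=${2:-10}     # assumed default, not taken from the report
    # POD_LIST_CMD is an assumed override so the listing can be mocked.
    list_cmd=${POD_LIST_CMD:-kubectl get pods --all-namespaces --no-headers}
    deadline=$(( $(date +%s) + timeout_s ))

    while :; do
        # $list_cmd is left unquoted on purpose so it splits into words.
        # A failed listing (for example, the API server is unreachable
        # during the swact) is treated as "not recovered yet".
        if listing=$($list_cmd 2>/dev/null); then
            # STATUS is the 4th column when --all-namespaces is used.
            pending=$(printf '%s\n' "$listing" |
                awk 'NF && $4 != "Running" && $4 != "Completed" { n++ }
                     END { print n + 0 }')
            if [ "$pending" -eq 0 ]; then
                return 0
            fi
        fi
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval_s"
    done
}
```

For example, `wait_for_pods 1800 30` would poll every 30 seconds for up to 30 minutes after the reboot.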

Actual Behavior
---------------
- After the forced reboot of the controller, a number of OpenStack pods were stuck in the Init state. The keystone-api and cinder-volume pods crashed (CrashLoopBackOff).

controller-0:~$ kubectl get pods --all-namespaces | grep -v -e Completed -e Running
NAMESPACE   NAME                                    READY   STATUS                  RESTARTS   AGE
openstack   cinder-api-59fd9c7c6f-86h2d             0/1     Init:0/2                0          3h
openstack   cinder-volume-654bcb6569-lsjxt          0/1     Init:CrashLoopBackOff   22         3h
openstack   fm-rest-api-78f97cc864-fqkhj            0/1     Init:0/1                0          3h
openstack   glance-api-54777c6d45-gxrdc             0/1     Init:0/3                0          3h
openstack   heat-api-69b8487b88-g4tc2               0/1     Init:0/1                0          3h
openstack   heat-cfn-6b4b6b74f8-w7f78               0/1     Init:0/1                0          3h
openstack   heat-engine-8458cf778f-xbbd4            0/1     Init:0/1                0          3h
openstack   heat-engine-cleaner-1575645900-pd697    0/1     Init:0/1                0          178m
openstack   horizon-5545469f58-j4bf6                0/1     Init:0/1                0          175m
openstack   keystone-api-6c45dc9dbb-2v8h5           0/1     CrashLoopBackOff        43         3h39m
openstack   keystone-api-6c45dc9dbb-pch72           0/1     Init:0/1                0          3h
openstack   neutron-server-79c6fdf585-lwpb7         0/1     Init:0/1                0          3h
openstack   nova-api-metadata-855ccf8fc4-mk446      0/1     Init:0/2                0          3h
openstack   nova-api-osapi-58b7ffbf-zjv8l           0/1     Init:0/1                0          3h
openstack   nova-conductor-6bbf89bf4c-7bhvg         0/1     Init:0/1                0          3h
openstack   nova-novncproxy-58779744bd-szx4m        0/1     Init:0/3                0          3h
openstack   nova-scheduler-67c986b5c8-rgt8x         0/1     Init:0/1                0          3h
openstack   nova-service-cleaner-1575648000-kdln5   0/1     Init:0/1                0          143m
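
To make listings like the one above easier to compare between runs, the stuck pods can be grouped by status. This is a minimal sketch, assuming the six-column --all-namespaces layout shown above (STATUS in the fourth field); the function name summarize_unhealthy_pods is hypothetical.

```shell
# Hypothetical helper: count pods per STATUS, skipping healthy pods.
# Reads `kubectl get pods --all-namespaces` output on stdin and prints
# "<count> <status>" lines, highest count first.
summarize_unhealthy_pods() {
    awk '$1 != "NAMESPACE" && NF >= 4 &&
         $4 != "Running" && $4 != "Completed" { count[$4]++ }
         END { for (s in count) print count[s], s }' |
        sort -k1,1nr -k2,2
}

# Example (needs access to the cluster):
#   kubectl get pods --all-namespaces | summarize_unhealthy_pods
```

Matching on the fourth field rather than using grep -v avoids dropping a pod whose name happens to contain "Running" or "Completed".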

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Multi-node system

Branch/Pull Time/Commit
-----------------------
r/stx.3.0 as of 2019-12-05 02:30:00

Timestamp/Logs
--------------
[2019-12-06 15:21:50,338] 181 INFO MainThread host_helper.reboot_hosts:: Rebooting active controller: controller-0
[2019-12-06 15:21:50,338] 311 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'