Comment 0 for bug 1839378

Ming Lei (mlei) wrote : nova and neutron service didn't recover after force unlocking the compute host

Brief Description
-----------------
After force rebooting a host, the neutron and nova services remain in Init status and do not recover.

Severity
--------
Critical

Steps to Reproduce
------------------
1. While the host (e.g. compute-0) is unlocked and available, force reboot it with "sudo reboot -f".
2. Wait long enough for the host to come back, then run "kubectl get pod" to check the pod status.
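
The check in step 2 could be automated roughly as follows. This is only a minimal Python sketch, not the test framework that produced the logs in this report: the kubectl command is the one shown under Timestamp/Logs, while the function names and the timeout and interval defaults are hypothetical.

```python
import subprocess
import time

# Command copied from the Timestamp/Logs section of this report.
KUBECTL_CMD = [
    "kubectl", "get", "pod", "--all-namespaces",
    "--field-selector=status.phase!=Running,status.phase!=Succeeded",
    "-o=wide",
]


def unhealthy_pod_lines(output):
    """Return the data rows of the kubectl output (header and notices dropped)."""
    rows = []
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("NAMESPACE") or stripped.startswith("No resources found"):
            continue
        rows.append(stripped)
    return rows


def _run_kubectl():
    """Run the kubectl query and return its stdout as text."""
    result = subprocess.run(KUBECTL_CMD, capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_pods(timeout_s=600, interval_s=30, run=_run_kubectl):
    """Poll until no unhealthy pods remain or the (assumed) timeout expires.

    Returns the rows that are still unhealthy; an empty list means all pods
    are Running or Succeeded.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        rows = unhealthy_pod_lines(run())
        if not rows or time.monotonic() >= deadline:
            return rows
        time.sleep(interval_s)
```

A non-empty return value from wait_for_pods() corresponds to the failure described under Actual Behavior.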

Expected Behavior
-----------------
All pods are running or completed

Actual Behavior
---------------
NAMESPACE   NAME                                              READY   STATUS     RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
openstack   libvirt-libvirt-default-sdpz2                     0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-dhcp-agent-compute-0-5621f953-jgq5b       0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-l3-agent-compute-0-5621f953-fgcsl         0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-metadata-agent-compute-0-5621f953-j62ts   0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-ovs-agent-compute-0-5621f953-mvwck        0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-sriov-agent-compute-0-5621f953-rbfs8      0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   nova-compute-compute-0-5621f953-6rpfx             0/2     Init:0/6   1          90m   192.168.204.174   compute-0   <none>           <none>
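
For triage, output like the above can be reduced to the pods stuck in an init container. The following Python sketch is illustrative only: PodRow, parse_pod_rows, and stuck_in_init are hypothetical names, and the parser assumes the whitespace-separated "-o wide" column layout shown in this report.

```python
from typing import NamedTuple


class PodRow(NamedTuple):
    namespace: str
    name: str
    ready: str
    status: str
    restarts: int
    node: str


def parse_pod_rows(output):
    """Parse 'kubectl get pod --all-namespaces -o wide' text into PodRow tuples.

    Assumes the column order NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE ...
    and a plain integer in RESTARTS, as in the output quoted in this report.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] == "NAMESPACE":
            continue  # skip the header and anything too short to be a pod row
        rows.append(
            PodRow(fields[0], fields[1], fields[2], fields[3], int(fields[4]), fields[7])
        )
    return rows


def stuck_in_init(rows):
    """Return the pods whose STATUS column shows an unfinished init container."""
    return [row for row in rows if row.status.startswith("Init:")]
```

Applied to the output above, stuck_in_init() would return all seven pods, each with node "compute-0".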

Reproducibility
---------------
100% Reproducible

System Configuration
--------------------
2+2 system or two-node system

Branch/Pull Time/Commit
-----------------------
stx master as of: 20190720T013000Z

Last Pass
---------
20190720T013000Z

Timestamp/Logs
--------------
[2019-08-06 02:38:58,214] 165 INFO MainThread host_helper.reboot_hosts:: Rebooting compute-0
[2019-08-06 02:38:58,214] 301 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'
[2019-08-06 02:38:58,328] 423 DEBUG MainThread ssh.expect :: Output:
Password:
[2019-08-06 02:38:58,329] 301 DEBUG MainThread ssh.send :: Send 'Li69nux*'
[2019-08-06 02:39:08,488] 423 DEBUG MainThread ssh.expect :: Output:
Rebooting.
packet_write_wait: Connection to 192.168.204.174 port 22: Broken pipe
controller-1:~$
[2019-08-06 02:39:38,507] 3619 INFO MainThread system_helper.wait_for_hosts_states:: Waiting for ['compute-0'] to reach state(s): {'availability': ['offline', 'failed']}...
[2019-08-06 02:39:38,508] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 02:39:38,508] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-08-06 02:39:40,047] 423 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | unlocked       | disabled    | offline      |
| 3  | compute-1    | worker      | unlocked       | enabled     | available    |
| 4  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+

[2019-08-06 02:49:45,734] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pod --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded -o=wide'
[2019-08-06 02:49:46,009] 423 DEBUG MainThread ssh.expect :: Output:
NAMESPACE   NAME                                              READY   STATUS     RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
openstack   libvirt-libvirt-default-sdpz2                     0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-dhcp-agent-compute-0-5621f953-jgq5b       0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-l3-agent-compute-0-5621f953-fgcsl         0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-metadata-agent-compute-0-5621f953-j62ts   0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-ovs-agent-compute-0-5621f953-mvwck        0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-sriov-agent-compute-0-5621f953-rbfs8      0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   nova-compute-compute-0-5621f953-6rpfx             0/2     Init:0/6   1          90m   192.168.204.174   compute-0   <none>           <none>

Test Activity
-------------
MTC Regression Testing