2019-08-08 13:55:46 |
Ming Lei |
description |
Brief Description
-----------------
After force-rebooting a host, the neutron and nova services remain in Init status and do not recover.
Severity
--------
Provide the severity of the defect.
Critical
Steps to Reproduce
------------------
1. With the host unlocked and available, force-reboot it with "sudo reboot -f" (e.g. compute-0).
2. Wait long enough for the host to come back, then run "kubectl get pod" to check the pod statuses.
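
Step 2 can be scripted as a polling loop. The following is a minimal sketch and not the test framework's own code: the names `fetch_unhealthy_pods` and `wait_for_pods_healthy`, their parameters, and the default timeout and interval are hypothetical. The kubectl query itself is the one that appears in the logs of this report.

```python
import subprocess
import time
from typing import Callable, List

# Same query as in the logs of this report: list every pod whose phase is
# neither Running nor Succeeded.
KUBECTL_CMD = [
    "kubectl", "get", "pod", "--all-namespaces",
    "--field-selector=status.phase!=Running,status.phase!=Succeeded",
    "-o=wide",
]


def fetch_unhealthy_pods() -> List[str]:
    """Run kubectl and return one output line per unhealthy pod."""
    result = subprocess.run(KUBECTL_CMD, capture_output=True, text=True, check=True)
    # Drop blank lines, the column header, and any "No resources found" notice.
    return [
        line for line in result.stdout.splitlines()
        if line.strip()
        and not line.startswith("NAMESPACE")
        and not line.startswith("No resources")
    ]


def wait_for_pods_healthy(
    fetch: Callable[[], List[str]] = fetch_unhealthy_pods,
    timeout: float = 600.0,   # assumed value, not taken from the report
    interval: float = 10.0,   # assumed value, not taken from the report
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until no unhealthy pods remain or the timeout expires.

    Returns the pods that are still unhealthy (an empty list means success).
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = fetch()
        if not pending or time.monotonic() >= deadline:
            return pending
        sleep(interval)
```

Injecting `fetch` and `sleep` keeps the loop testable without a cluster. On a live system, calling `wait_for_pods_healthy()` with the defaults runs kubectl wherever the script is executed, so it would need to run on the active controller.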
Expected Behavior
------------------
All pods are running or completed
Actual Behavior
----------------
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
openstack libvirt-libvirt-default-sdpz2 0/1 Init:0/3 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-dhcp-agent-compute-0-5621f953-jgq5b 0/1 Init:0/1 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-l3-agent-compute-0-5621f953-fgcsl 0/1 Init:0/1 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-metadata-agent-compute-0-5621f953-j62ts 0/1 Init:0/2 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-ovs-agent-compute-0-5621f953-mvwck 0/1 Init:0/3 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-sriov-agent-compute-0-5621f953-rbfs8 0/1 Init:0/2 1 90m 192.168.204.174 compute-0 <none> <none>
openstack nova-compute-compute-0-5621f953-6rpfx 0/2 Init:0/6 1 90m 192.168.204.174 compute-0 <none> <none>
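
In the STATUS column, `Init:N/M` means that N of the pod's M init containers have completed, so every pod listed above is stuck before its first init container finishes. As an illustration, the sketch below pulls such pods out of `kubectl get pod -o=wide` output. The `StuckPod` type and `find_init_stuck` function are hypothetical helpers, and the parser assumes the whitespace-separated column layout shown in this report.

```python
import re
from typing import List, NamedTuple


class StuckPod(NamedTuple):
    namespace: str
    name: str
    node: str
    done: int   # init containers completed
    total: int  # init containers defined


# Matches statuses such as "Init:0/3"; other init states (for example
# "Init:Error") are deliberately not matched by this sketch.
INIT_STATUS = re.compile(r"^Init:(\d+)/(\d+)$")


def find_init_stuck(output: str) -> List[StuckPod]:
    """Return the pods whose STATUS column has the form Init:N/M."""
    stuck = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed column order: NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE ...
        if len(fields) < 8:
            continue
        match = INIT_STATUS.match(fields[3])
        if match:
            stuck.append(StuckPod(
                namespace=fields[0],
                name=fields[1],
                node=fields[7],
                done=int(match.group(1)),
                total=int(match.group(2)),
            ))
    return stuck
```

The header row is skipped naturally because its fourth field is the literal `STATUS`. Splitting on whitespace would misplace columns if a field contained spaces, so this is only suitable for output shaped like the listing in this report.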
Reproducibility
---------------
100% Reproducible
System Configuration
--------------------
2+2 system or two-node system
Branch/Pull Time/Commit
-----------------------
stx master as of: 20190720T013000Z
Last Pass
---------
20190720T013000Z
Timestamp/Logs
--------------
[2019-08-06 02:38:58,214] 165 INFO MainThread host_helper.reboot_hosts:: Rebooting compute-0
[2019-08-06 02:38:58,214] 301 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'
[2019-08-06 02:38:58,328] 423 DEBUG MainThread ssh.expect :: Output:
Password:
[2019-08-06 02:38:58,329] 301 DEBUG MainThread ssh.send :: Send 'Li69nux*'
[2019-08-06 02:39:08,488] 423 DEBUG MainThread ssh.expect :: Output:
Rebooting.
packet_write_wait: Connection to 192.168.204.174 port 22: Broken pipe
controller-1:~$
[2019-08-06 02:39:38,507] 3619 INFO MainThread system_helper.wait_for_hosts_states:: Waiting for ['compute-0'] to reach state(s): {'availability': ['offline', 'failed']}...
[2019-08-06 02:39:38,508] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 02:39:38,508] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-08-06 02:39:40,047] 423 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | disabled | offline |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[2019-08-06 02:49:45,734] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pod --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded -o=wide'
[2019-08-06 02:49:46,009] 423 DEBUG MainThread ssh.expect :: Output:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
openstack libvirt-libvirt-default-sdpz2 0/1 Init:0/3 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-dhcp-agent-compute-0-5621f953-jgq5b 0/1 Init:0/1 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-l3-agent-compute-0-5621f953-fgcsl 0/1 Init:0/1 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-metadata-agent-compute-0-5621f953-j62ts 0/1 Init:0/2 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-ovs-agent-compute-0-5621f953-mvwck 0/1 Init:0/3 1 90m 192.168.204.174 compute-0 <none> <none>
openstack neutron-sriov-agent-compute-0-5621f953-rbfs8 0/1 Init:0/2 1 90m 192.168.204.174 compute-0 <none> <none>
openstack nova-compute-compute-0-5621f953-6rpfx 0/2 Init:0/6 1 90m 192.168.204.174 compute-0 <none> <none>
[2019-08-06 03:02:08,744] 301 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-08-06 03:02:10,193] 423 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+---------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+---------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+----------+----------------------------+
| 0192e25d-def0-4134-ad62-a64aaf495695 | 200.006 | compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress. | host=compute-0.process=pci-irq-affinity-agent | major | 2019-08-06T02:43:27.705369 |
| 4cb4a0ee-f493-420b-a218-20759a112258 | 250.001 | compute-0 Configuration is out-of-date. | host=compute-0 | major | 2019-08-06T02:41:40.375879 |
| 9ac05c3b-a79e-4544-877f-720c8056ef5f | 270.001 | Host compute-1 compute services failure, failed to disable nova services | host=compute-1.services=compute | critical | 2019-08-06T02:39:52.177900 |
| a2e2ec3c-9490-42fc-9099-bd4427daf5af | 270.001 | Host compute-0 compute services failure, failed to disable nova services | host=compute-0.services=compute | critical | 2019-08-06T02:39:04.766953 |
| 2409cab2-28e3-45ca-b0fe-0712c3134366 | 750.002 | Application Apply Failure | k8s_application=stx-openstack | major | 2019-08-03T17:28:23.838877 |
+--------------------------------------+----------+---------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+----------+----------------------------+
controller-1:~$
[2019-08-06 03:02:10,194] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-06 03:02:10,297] 423 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2019-08-06 03:02:10,297] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_63_66
[2019-08-06 03:02:10,297] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 03:02:10,297] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pod --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded -o=wide'
[2019-08-06 03:02:10,528] 423 DEBUG MainThread ssh.expect :: Output:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
openstack libvirt-libvirt-default-sdpz2 0/1 Init:0/3 1 102m 192.168.204.174 compute-0 <none> <none>
openstack neutron-dhcp-agent-compute-0-5621f953-jgq5b 0/1 Init:0/1 1 102m 192.168.204.174 compute-0 <none> <none>
openstack neutron-l3-agent-compute-0-5621f953-fgcsl 0/1 Init:0/1 1 102m 192.168.204.174 compute-0 <none> <none>
openstack neutron-metadata-agent-compute-0-5621f953-j62ts 0/1 Init:0/2 1 102m 192.168.204.174 compute-0 <none> <none>
openstack neutron-ovs-agent-compute-0-5621f953-mvwck 0/1 Init:0/3 1 102m 192.168.204.174 compute-0 <none> <none>
openstack neutron-sriov-agent-compute-0-5621f953-rbfs8 0/1 Init:0/2 1 102m 192.168.204.174 compute-0 <none> <none>
openstack nova-compute-compute-0-5621f953-6rpfx 0/2 Init:0/6 1 102m 192.168.204.174 compute-0 <none> <none>
openstack nova-service-cleaner-1565060400-kkg26 0/1 Init:0/1 0 2m4s 172.16.166.255 controller-1 <none> <none>
controller-1:~$
[2019-08-06 03:02:10,528] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-06 03:02:10,631] 423 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2019-08-06 03:02:10,632] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_63_66
[2019-08-06 03:02:10,632] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 03:02:10,632] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-08-06 03:02:12,153] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | apply-failed | operation aborted, check logs for detail |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
controller-1:~$
[2019-08-06 03:02:12,153] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-06 03:02:12,256] 423 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2019-08-06 03:02:12,258] 266 DEBUG MainThread conftest.testcase_log::
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Test steps started for: testcases/functional/mtc/test_multi_node_failure_avoidance.py::test_multi_node_failure_avoidance[300-5]
[2019-08-06 03:02:12,258] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_63_66
[2019-08-06 03:02:12,259] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 03:02:12,259] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-08-06 03:02:13,793] 423 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | enabled | degraded |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
Test Activity
-------------
MTC Regression Testing |