Activity log for bug #1839378

Date Who What changed Old value New value Message
2019-08-07 19:45:14 Ming Lei bug added bug
2019-08-07 19:47:04 Ming Lei summary
Old value: nova and neutron service didn't recover after force unlocking the compute host
New value: nova and neutron service didn't recover after force unlocking the host
2019-08-07 19:50:36 Ming Lei attachment added 2+2 node system https://bugs.launchpad.net/starlingx/+bug/1839378/+attachment/5281481/+files/ALL_NODES_20190806.030810.tar
2019-08-07 19:55:09 Ming Lei attachment added 2 node system logs (similar issue) https://bugs.launchpad.net/starlingx/+bug/1839378/+attachment/5281482/+files/ALL_NODES_20190802.225330.tar
2019-08-08 13:55:46 Ming Lei description

Old value:

Brief Description
-----------------
After force rebooting a host, the neutron and nova services stay in Init status and do not recover.

Severity
--------
Provide the severity of the defect.
Critical

Steps to Reproduce
------------------
1. When the host is unlocked and available, use "sudo reboot -f" to reboot the host, e.g. compute-0.
2. Wait long enough, then run "kubectl get pod" to check the pod status.

Expected Behavior
------------------
All pods are running or completed

Actual Behavior
----------------
NAMESPACE   NAME                                              READY   STATUS     RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
openstack   libvirt-libvirt-default-sdpz2                     0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-dhcp-agent-compute-0-5621f953-jgq5b       0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-l3-agent-compute-0-5621f953-fgcsl         0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-metadata-agent-compute-0-5621f953-j62ts   0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-ovs-agent-compute-0-5621f953-mvwck        0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-sriov-agent-compute-0-5621f953-rbfs8      0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   nova-compute-compute-0-5621f953-6rpfx             0/2     Init:0/6   1          90m   192.168.204.174   compute-0   <none>           <none>

Reproducibility
---------------
100% Reproducible

System Configuration
--------------------
2 + 2 system or two node system

Branch/Pull Time/Commit
-----------------------
stx master as of: 20190720T013000Z

Last Pass
---------
20190720T013000Z

Timestamp/Logs
--------------
[2019-08-06 02:38:58,214] 165 INFO MainThread host_helper.reboot_hosts:: Rebooting compute-0
[2019-08-06 02:38:58,214] 301 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'
[2019-08-06 02:38:58,328] 423 DEBUG MainThread ssh.expect :: Output:
Password:
[2019-08-06 02:38:58,329] 301 DEBUG MainThread ssh.send :: Send 'Li69nux*'
[2019-08-06 02:39:08,488] 423 DEBUG MainThread ssh.expect :: Output:
Rebooting.
packet_write_wait: Connection to 192.168.204.174 port 22: Broken pipe
controller-1:~$
[2019-08-06 02:39:38,507] 3619 INFO MainThread system_helper.wait_for_hosts_states:: Waiting for ['compute-0'] to reach state(s): {'availability': ['offline', 'failed']}...
[2019-08-06 02:39:38,508] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 02:39:38,508] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-08-06 02:39:40,047] 423 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | unlocked       | disabled    | offline      |
| 3  | compute-1    | worker      | unlocked       | enabled     | available    |
| 4  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[2019-08-06 02:49:45,734] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pod --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded -o=wide'
[2019-08-06 02:49:46,009] 423 DEBUG MainThread ssh.expect :: Output:
NAMESPACE   NAME                                              READY   STATUS     RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
openstack   libvirt-libvirt-default-sdpz2                     0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-dhcp-agent-compute-0-5621f953-jgq5b       0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-l3-agent-compute-0-5621f953-fgcsl         0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-metadata-agent-compute-0-5621f953-j62ts   0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-ovs-agent-compute-0-5621f953-mvwck        0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-sriov-agent-compute-0-5621f953-rbfs8      0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   nova-compute-compute-0-5621f953-6rpfx             0/2     Init:0/6   1          90m   192.168.204.174   compute-0   <none>           <none>

Test Activity
-------------
MTC Regression Testing

New value:

Brief Description
-----------------
After force rebooting a host, the neutron and nova services stay in Init status and do not recover.

Severity
--------
Provide the severity of the defect.
Critical

Steps to Reproduce
------------------
1. When the host is unlocked and available, use "sudo reboot -f" to reboot the host, e.g. compute-0.
2. Wait long enough, then run "kubectl get pod" to check the pod status.

Expected Behavior
------------------
All pods are running or completed

Actual Behavior
----------------
NAMESPACE   NAME                                              READY   STATUS     RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
openstack   libvirt-libvirt-default-sdpz2                     0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-dhcp-agent-compute-0-5621f953-jgq5b       0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-l3-agent-compute-0-5621f953-fgcsl         0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-metadata-agent-compute-0-5621f953-j62ts   0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-ovs-agent-compute-0-5621f953-mvwck        0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-sriov-agent-compute-0-5621f953-rbfs8      0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   nova-compute-compute-0-5621f953-6rpfx             0/2     Init:0/6   1          90m   192.168.204.174   compute-0   <none>           <none>

Reproducibility
---------------
100% Reproducible

System Configuration
--------------------
2 + 2 system or two node system

Branch/Pull Time/Commit
-----------------------
stx master as of: 20190720T013000Z

Last Pass
---------
20190720T013000Z

Timestamp/Logs
--------------
[2019-08-06 02:38:58,214] 165 INFO MainThread host_helper.reboot_hosts:: Rebooting compute-0
[2019-08-06 02:38:58,214] 301 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'
[2019-08-06 02:38:58,328] 423 DEBUG MainThread ssh.expect :: Output:
Password:
[2019-08-06 02:38:58,329] 301 DEBUG MainThread ssh.send :: Send 'Li69nux*'
[2019-08-06 02:39:08,488] 423 DEBUG MainThread ssh.expect :: Output:
Rebooting.
packet_write_wait: Connection to 192.168.204.174 port 22: Broken pipe
controller-1:~$
[2019-08-06 02:39:38,507] 3619 INFO MainThread system_helper.wait_for_hosts_states:: Waiting for ['compute-0'] to reach state(s): {'availability': ['offline', 'failed']}...
[2019-08-06 02:39:38,508] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 02:39:38,508] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-08-06 02:39:40,047] 423 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | unlocked       | disabled    | offline      |
| 3  | compute-1    | worker      | unlocked       | enabled     | available    |
| 4  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[2019-08-06 02:49:45,734] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pod --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded -o=wide'
[2019-08-06 02:49:46,009] 423 DEBUG MainThread ssh.expect :: Output:
NAMESPACE   NAME                                              READY   STATUS     RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
openstack   libvirt-libvirt-default-sdpz2                     0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-dhcp-agent-compute-0-5621f953-jgq5b       0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-l3-agent-compute-0-5621f953-fgcsl         0/1     Init:0/1   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-metadata-agent-compute-0-5621f953-j62ts   0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-ovs-agent-compute-0-5621f953-mvwck        0/1     Init:0/3   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   neutron-sriov-agent-compute-0-5621f953-rbfs8      0/1     Init:0/2   1          90m   192.168.204.174   compute-0   <none>           <none>
openstack   nova-compute-compute-0-5621f953-6rpfx             0/2     Init:0/6   1          90m   192.168.204.174   compute-0   <none>           <none>
[2019-08-06 03:02:08,744] 301 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-08-06 03:02:10,193] 423 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+---------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+----------+----------------------------+
| UUID                                 | Alarm ID | Reason Text                                                                                                                           | Entity ID                                     | Severity | Time Stamp                 |
+--------------------------------------+----------+---------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+----------+----------------------------+
| 0192e25d-def0-4134-ad62-a64aaf495695 | 200.006  | compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress. | host=compute-0.process=pci-irq-affinity-agent | major    | 2019-08-06T02:43:27.705369 |
| 4cb4a0ee-f493-420b-a218-20759a112258 | 250.001  | compute-0 Configuration is out-of-date.                                                                                               | host=compute-0                                | major    | 2019-08-06T02:41:40.375879 |
| 9ac05c3b-a79e-4544-877f-720c8056ef5f | 270.001  | Host compute-1 compute services failure, failed to disable nova services                                                              | host=compute-1.services=compute               | critical | 2019-08-06T02:39:52.177900 |
| a2e2ec3c-9490-42fc-9099-bd4427daf5af | 270.001  | Host compute-0 compute services failure, failed to disable nova services                                                              | host=compute-0.services=compute               | critical | 2019-08-06T02:39:04.766953 |
| 2409cab2-28e3-45ca-b0fe-0712c3134366 | 750.002  | Application Apply Failure                                                                                                             | k8s_application=stx-openstack                 | major    | 2019-08-03T17:28:23.838877 |
+--------------------------------------+----------+---------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+----------+----------------------------+
controller-1:~$
[2019-08-06 03:02:10,194] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-06 03:02:10,297] 423 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2019-08-06 03:02:10,297] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_63_66
[2019-08-06 03:02:10,297] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 03:02:10,297] 301 DEBUG MainThread ssh.send :: Send 'kubectl get pod --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded -o=wide'
[2019-08-06 03:02:10,528] 423 DEBUG MainThread ssh.expect :: Output:
NAMESPACE   NAME                                              READY   STATUS     RESTARTS   AGE    IP                NODE           NOMINATED NODE   READINESS GATES
openstack   libvirt-libvirt-default-sdpz2                     0/1     Init:0/3   1          102m   192.168.204.174   compute-0      <none>           <none>
openstack   neutron-dhcp-agent-compute-0-5621f953-jgq5b       0/1     Init:0/1   1          102m   192.168.204.174   compute-0      <none>           <none>
openstack   neutron-l3-agent-compute-0-5621f953-fgcsl         0/1     Init:0/1   1          102m   192.168.204.174   compute-0      <none>           <none>
openstack   neutron-metadata-agent-compute-0-5621f953-j62ts   0/1     Init:0/2   1          102m   192.168.204.174   compute-0      <none>           <none>
openstack   neutron-ovs-agent-compute-0-5621f953-mvwck        0/1     Init:0/3   1          102m   192.168.204.174   compute-0      <none>           <none>
openstack   neutron-sriov-agent-compute-0-5621f953-rbfs8      0/1     Init:0/2   1          102m   192.168.204.174   compute-0      <none>           <none>
openstack   nova-compute-compute-0-5621f953-6rpfx             0/2     Init:0/6   1          102m   192.168.204.174   compute-0      <none>           <none>
openstack   nova-service-cleaner-1565060400-kkg26             0/1     Init:0/1   0          2m4s   172.16.166.255    controller-1   <none>           <none>
controller-1:~$
[2019-08-06 03:02:10,528] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-06 03:02:10,631] 423 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2019-08-06 03:02:10,632] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_63_66
[2019-08-06 03:02:10,632] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 03:02:10,632] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-08-06 03:02:12,153] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| application         | version                        | manifest name                 | manifest file      | status       | progress                                 |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| platform-integ-apps | 1.0-7                          | platform-integration-manifest | manifest.yaml      | applied      | completed                                |
| stx-openstack       | 1.0-17-centos-stable-versioned | armada-manifest               | stx-openstack.yaml | apply-failed | operation aborted, check logs for detail |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
controller-1:~$
[2019-08-06 03:02:12,153] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-08-06 03:02:12,256] 423 DEBUG MainThread ssh.expect :: Output:
0
controller-1:~$
[2019-08-06 03:02:12,258] 266 DEBUG MainThread conftest.testcase_log::
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Test steps started for: testcases/functional/mtc/test_multi_node_failure_avoidance.py::test_multi_node_failure_avoidance[300-5]
[2019-08-06 03:02:12,258] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_63_66
[2019-08-06 03:02:12,259] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-06 03:02:12,259] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-08-06 03:02:13,793] 423 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | unlocked       | enabled     | degraded     |
| 3  | compute-1    | worker      | unlocked       | enabled     | available    |
| 4  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+

Test Activity
-------------
MTC Regression Testing
2019-08-08 19:54:38 Frank Miller tags stx.2.0 stx.containers
2019-08-08 19:54:44 Frank Miller starlingx: status New Triaged
2019-08-08 19:54:52 Frank Miller starlingx: importance Undecided Medium
2019-08-08 19:55:08 Frank Miller starlingx: assignee Jim Gauld (jgauld)
2019-08-23 18:47:44 Ghada Khalil tags
Old value: stx.2.0 stx.containers
New value: stx.3.0 stx.containers
2019-09-09 02:33:35 Yang Liu tags
Old value: stx.3.0 stx.containers
New value: stx.3.0 stx.containers stx.retestneeded
2019-12-13 17:56:36 Frank Miller tags
Old value: stx.3.0 stx.containers stx.retestneeded
New value: stx.4.0 stx.containers stx.retestneeded
2020-04-28 13:13:49 Bill Zvonar bug added subscriber Bill Zvonar
2020-05-25 21:40:29 Frank Miller tags
Old value: stx.4.0 stx.containers stx.retestneeded
New value: stx.4.0 stx.distro.openstack stx.retestneeded
2020-06-03 02:09:34 Frank Miller marked as duplicate 1839160