VM failed to evacuate after compute reboot

Bug #1806415 reported by Anujeyan Manokeran
This bug affects 1 person
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | Medium     | Jim Gauld   |

Bug Description

VM failed to evacuate after compute reboot. VM 4feb7863-9879-41bc-a922-e3b450ad2568 was launched on compute-2. When compute-2 was rebooted, the VM was not evacuated to compute-0 or compute-1.

ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne list --all-tenants'
+--------------------------------------+------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
+--------------------------------------+------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
| 4feb7863-9879-41bc-a922-e3b450ad2568 | tenant2-opensuse_12_image-34 | 0e5566abd99c468ab6dfe7b01a551a66 | ACTIVE | - | Running | tenant2-net1=172.18.1.137; tenant2-mgmt-net=192.168.247.39 |
+--------------------------------------+------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
controller-0:~$

[2018-12-01 21:58:13,275] 263 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2018-12-01 21:58:14,760] 389 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
| 3 | compute-0 | compute | unlocked | enabled | available |
| 4 | compute-1 | compute | unlocked | enabled | available |
| 5 | compute-2 | compute | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

Rebooted compute-2:

[2018-12-01 21:58:58,655] 426 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2018-12-01 21:58:58,655] 263 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2018-12-01 21:59:00,083] 389 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
| 3 | compute-0 | compute | unlocked | enabled | available |
| 4 | compute-1 | compute | unlocked | enabled | available |
| 5 | compute-2 | compute | unlocked | disabled | offline |
+----+--------------+-------------+----------------+-------------+--------------+
controller-0:~$
[2018-12-01 21:59:00,084] 263 DEBUG MainThread ssh.send :: Send 'echo $?'

controller-0:~$

The VM was not evacuated and was left in a power-off state.
[2018-12-01 22:10:05,572] 263 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne list --all-tenants'
[2018-12-01 22:10:07,681] 389 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+-------------+------------------------------------------------------------+
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+-------------+------------------------------------------------------------+
| b6d4b21b-7c9f-4592-af2e-36a6ad83764c | tenant2-rhel_6_image-35 | 0e5566abd99c468ab6dfe7b01a551a66 | ACTIVE | powering-off | Running | tenant2-net1=172.18.1.138; tenant2-mgmt-net=192.168.247.35 |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+-------------+------------------------------------------------------------+
controller-0:~$

Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne list --all-tenants'
[2018-12-01 22:01:25,700] 389 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
+--------------------------------------+------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
| 4feb7863-9879-41bc-a922-e3b450ad2568 | tenant2-opensuse_12_image-34 | 0e5566abd99c468ab6dfe7b01a551a66 | ERROR | - | Running | tenant2-net1=172.18.1.137; tenant2-mgmt-net=192.168.247.39 |
+--------------------------------------+------------------------------+----------------------------------+--------+------------+-------------+------------------------------------------------------------+
controller-0:~$

Severity
--------
Major

Steps to Reproduce
------------------
1. Launch VM 4feb7863-9879-41bc-a922-e3b450ad2568 on compute-2.
2. Reboot compute-2.
3. Verify that the VM is evacuated to another compute host.
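Step 3 can be checked against the `nova list --all-tenants` output captured in the logs above. The sketch below is a minimal, hypothetical helper (`parse_table` and `vm_status` are illustrative names, not part of any StarlingX or OpenStack tool), assuming the pipe-delimited table layout shown in those logs; it looks up the Status column for a given VM ID.

```python
def parse_table(text):
    """Return one dict per data row of a pipe-delimited CLI table."""
    # Keep only "| ... |" rows; "+----+" border lines are skipped.
    rows = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if not rows:
        return []
    # The first pipe row is the header; the rest are data rows.
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in row.strip("|").split("|"))))
        for row in rows[1:]
    ]


def vm_status(nova_list_output, vm_id):
    """Return the Status column for vm_id, or None if the VM is not listed."""
    for record in parse_table(nova_list_output):
        if record.get("ID") == vm_id:
            return record.get("Status")
    return None


if __name__ == "__main__":
    # Rows captured after compute-2 went offline (from the logs above).
    captured = """
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
| 4feb7863-9879-41bc-a922-e3b450ad2568 | tenant2-opensuse_12_image-34 | 0e5566abd99c468ab6dfe7b01a551a66 | ERROR | - | Running | tenant2-net1=172.18.1.137; tenant2-mgmt-net=192.168.247.39 |
"""
    print(vm_status(captured, "4feb7863-9879-41bc-a922-e3b450ad2568"))  # ERROR
```

Note that `ACTIVE` alone does not prove evacuation, since the VM was already `ACTIVE` before the reboot; its host (`OS-EXT-SRV-ATTR:host`) also has to change.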

Expected Behavior
------------------
The VM is evacuated to another compute host (compute-0 or compute-1) and no alarms are raised.

Actual Behavior
----------------
As per the description: the VM is not evacuated and ends up in ERROR status.

Reproducibility
---------------
100% reproduced

System Configuration
--------------------
2+2+storage configuration

Branch/Pull Time/Commit
-----------------------
StarlingX_Upstream_build release branch build as of: 2018-11-30_20-21-27

Timestamp/Logs
--------------
[2018-12-01 21:58:11,205]

Ghada Khalil (gkhalil) wrote :

Release gating: introduced by the LVM local storage elimination code changes.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Jim Gauld (jgauld)
tags: added: stx.2019.03 stx.nfv
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Abraham Arce (xe1gyq) wrote :

Are these the right steps to follow? Execution has been done in both a bare-metal and a virtual deployment.

1. Launch VMs

   $ openstack server create --flavor m1.tiny --image cirros --nic net-id=net vm1
   $ openstack server create --flavor m1.tiny --image cirros --nic net-id=net vm2

   controller-0:~$ openstack server show vm1 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-0
   controller-0:~$ openstack server show vm2 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-1

2. Reboot compute-1

   Is running the reboot command on the compute host the right approach?
   $ sudo reboot

3. Verify evacuation.

   A "Rebuilding" message was shown in Horizon; instance vm2 was migrated to compute-0.

   controller-0:~$ openstack server show vm1 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-0
   controller-0:~$ openstack server show vm2 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-0

   Compute-1 is recovering; the UI shows "Graceful Recovery Wait".

   $ system host-list
   | 6 | compute-1 | worker | unlocked | disabled | offline |

The above steps were executed on a bare-metal Dedicated Storage 2+2+2 deployment:

  BUILD_TARGET="Host Installer"
  BUILD_TYPE="Formal"
  BUILD_ID="20190523T013000Z"

Same steps were executed under a Virtual deployment

1. Launch VMs

   $ openstack server create --flavor m1.tiny --image cirros --nic net-id=net vm1
   $ openstack server create --flavor m1.tiny --image cirros --nic net-id=net vm2

   controller-0:~$ openstack server show vm1 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-1
   controller-0:~$ openstack server show vm2 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-0

2. Reboot compute-0

   Is running the reboot command on the compute host the right approach?
   $ sudo reboot

3. Verify evacuation.

   A "Rebuilding" message was shown in Horizon; instance vm2 was migrated to compute-1.

   controller-0:~$ openstack server show vm1 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-1
   controller-0:~$ openstack server show vm2 | grep compute
   | OS-EXT-SRV-ATTR:host | compute-1

The above steps were executed on a virtual Dedicated Storage 2+2+2 deployment:

  BUILD_TARGET="Host Installer"
  BUILD_TYPE="Formal"
  BUILD_ID="20190604T144018Z"
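The host comparison done by hand in the steps above can be sketched as follows. This is a hypothetical helper, assuming the `| OS-EXT-SRV-ATTR:host | <host>` lines printed by `openstack server show ... | grep compute`; `server_host` and `evacuated` are illustrative names, not existing tooling. (With the standard OpenStackClient output options, `openstack server show vm2 -f value -c OS-EXT-SRV-ATTR:host` should print the host name alone, which avoids parsing altogether.)

```python
def server_host(show_output):
    """Return the OS-EXT-SRV-ATTR:host value from `openstack server show` output."""
    for line in show_output.splitlines():
        # Drop the outer pipes, then split "| field | value |" into cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[0] == "OS-EXT-SRV-ATTR:host":
            return cells[1]
    return None


def evacuated(host_before, host_after, rebooted_host):
    """True only if the VM was on the rebooted host and now runs elsewhere."""
    return host_before == rebooted_host and host_after not in (None, rebooted_host)


if __name__ == "__main__":
    # Values from the bare-metal run above: compute-1 was rebooted.
    vm2_before = server_host("| OS-EXT-SRV-ATTR:host | compute-1")
    vm2_after = server_host("| OS-EXT-SRV-ATTR:host | compute-0")
    print(evacuated(vm2_before, vm2_after, "compute-1"))  # True
```

A VM that was never on the rebooted host (vm1 on compute-0 in the bare-metal run) is expected to stay put, so `evacuated("compute-0", "compute-0", "compute-1")` returns `False` by design.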

Frank Miller (sensfan22) wrote :

Based on Abraham's re-test, this issue no longer occurs. The expectation is that a code change made early in 2019 addressed this issue. Marking this LP as Fix Released and requesting that the test team re-test.

Changed in starlingx:
status: Triaged → Fix Released
Anujeyan Manokeran (anujeyan) wrote :

It is now fixed. I do not see the issue in build 20190708T233000Z.

tags: removed: stx.retestneeded