instance remains on the same host after host reboot

Bug #1903968 reported by George Postolache
Affects    Status   Importance  Assigned to  Milestone
StarlingX  Invalid  Low         Unassigned

Bug Description

Brief Description
-----------------
  After rebooting a host to evacuate its instances, the instances recover to active
state, but they sometimes remain on the same host.

Severity
--------
<Major: System/Feature is usable but degraded>

Steps to Reproduce
------------------
1. Create instance(s) on a host
2. Reboot the host (sudo reboot -f)
3. Wait for the host to reach state(s): {'availability': ['offline', 'failed']}
4. Wait for the instances to reach ERROR or REBUILD state
5. Check that the instances are in Active state and have moved to another host
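
The final check in the steps above can be sketched as a comparison of per-instance records taken before and after the reboot. This is only an illustration, not the test suite's actual code: the function name classify_recovery and the result labels are hypothetical, and the records are assumed to be dicts carrying the 'OS-EXT-SRV-ATTR:host' and 'OS-EXT-STS:vm_state' fields that `server show` prints.

```python
HOST_FIELD = "OS-EXT-SRV-ATTR:host"
STATE_FIELD = "OS-EXT-STS:vm_state"


def classify_recovery(before, after, rebooted_host):
    """Label each instance by what happened to it across the host reboot.

    `before` and `after` map server IDs to dicts of `server show` fields.
    The function name and the labels are hypothetical, for illustration only.
    """
    result = {}
    for server_id, old in before.items():
        if old[HOST_FIELD] != rebooted_host:
            # Instance was not on the rebooted host, so it is out of scope.
            result[server_id] = "unaffected"
            continue
        new = after.get(server_id)
        if new is None or new[STATE_FIELD] != "active":
            result[server_id] = "not_active"
        elif new[HOST_FIELD] == rebooted_host:
            # Active again, but still on the host that was rebooted.
            result[server_id] = "recovered_in_place"
        else:
            result[server_id] = "evacuated"
    return result
```

With this labelling, the behavior reported here corresponds to "recovered_in_place", while the test expects "evacuated".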

Expected Behavior
------------------
Instances return to active state and are moved to another host

Actual Behavior
----------------
instances remain on the same host that was rebooted

Reproducibility
---------------
<Intermittent>

System Configuration
--------------------
<Two node system, Multi-node system>

Branch/Pull Time/Commit
-----------------------
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="20.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20201110T000235Z"

JOB="STX_build_layer_flock_master_master"
<email address hidden>"
BUILD_NUMBER="310"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2020-11-10 00:02:35 +0000"

FLOCK_OS="centos"
FLOCK_JOB="STX_build_layer_flock_master_master"
<email address hidden>"
FLOCK_BUILD_NUMBER="310"
FLOCK_BUILD_HOST="starlingx_mirror"
FLOCK_BUILD_DATE="2020-11-10 00:02:35 +0000"

DISTRO_OS="centos"
DISTRO_JOB="STX_build_layer_distro_master_master"
<email address hidden>"
DISTRO_BUILD_NUMBER="311"
DISTRO_BUILD_HOST="starlingx_mirror"
DISTRO_BUILD_DATE="2020-11-05 02:30:58 +0000"

COMPILER_OS="centos"
COMPILER_JOB="STX_build_layer_compiler_master_master"
<email address hidden>"
COMPILER_BUILD_NUMBER="331"
COMPILER_BUILD_HOST="starlingx_mirror"
COMPILER_BUILD_DATE="2020-10-28 01:30:01 +0000"

Test Activity
-------------
[Sanity]

George Postolache (gpostola) wrote :
Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
George Postolache (gpostola) wrote :

Adding some log snippets with timestamps; maybe they will be useful.

server show fa265d0a-f38d-49f0-8b7d-1e0964f4c8df'
[2020-11-11 23:22:02,479] 427 DEBUG MainThread ssh.expect :: Output:
+-------------------------------------+----------------------------------------------------------+
| Field | Value |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | controller-0 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | controller-0 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000005 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2020-11-11T14:02:33.000000 |

server show cab87397-fdc5-4a86-a66a-6597bd8d0768'
[2020-11-11 23:22:04,721] 427 DEBUG MainThread ssh.expect :: Output:
+-------------------------------------+----------------------------------------------------------+
| Field | Value |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | controller-1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | controller-1 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000007 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2020-11-11T14:13:20.000000 |

[2020-11-11 23:22:14,781] 305 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'
[2020-11-11 23:22:14,893] 190 INFO MainThread host_helper.reboot_hosts:: Active controller reboot started.

server list'
[2020-11-11 23:24:49,821] 427 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+------------+--------+-------------------------+----------+-----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------...


zhipeng liu (zhipengs) wrote :

Hi,

Please refer to the earlier LP below for details:
https://bugs.launchpad.net/starlingx/+bug/1857462
"VM failed to recover after VM host reboot."
This is expected behavior for VIM. When the host becomes enabled again after
being offline, VM evacuation from that host is blocked, and the affected VMs
are rebooted on the same host instead.
The test just needs to wait longer after the reboot before checking whether
all VMs have resumed.
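
One way a test could "wait longer" is to poll with a generous timeout rather than check once. The helper below is a minimal sketch under that assumption; wait_for and its parameters are hypothetical names, not part of VIM or of the test framework, and the predicate would be whatever check the test already performs (for example, that every instance reports vm_state active).

```python
import time


def wait_for(predicate, timeout, interval=5.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    `clock` and `sleep` are injectable so the helper can be tested
    without real delays. All names here are hypothetical.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The predicate is evaluated once more when the deadline is reached, so a recovery that completes during the last sleep is still observed.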

Thanks!
Zhipeng

Changed in starlingx:
status: New → Confirmed
Austin Sun (sunausti) wrote :

This is expected behavior. Closing it.

Changed in starlingx:
status: Confirmed → Invalid
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low