Rebuild instance is stuck in rebuilding state when hosting Compute is powered off

Bug #1801714 reported by Hu Zhou
Affects                   Status       Importance  Assigned to   Milestone
OpenStack Compute (nova)  In Progress  Medium      Akhil Gudise

Bug Description

Description
===========
When Compute H, which hosts instance X, is down, an attempt to rebuild the instance leaves it stuck in "REBUILD" status forever.

Steps to reproduce
==================
* Create instance X on Compute H
* Power off Compute H
* Rebuild instance X (with same image)

$ openstack server rebuild X

Expected result
===============
Instance X should fail to rebuild on the same host, with a proper error message.

Actual result
=============
The instance is in "REBUILD" state forever.

Environment
===========
1. Queens release used.

   $ rpm -qa | grep nova-compute
     openstack-nova-compute-17.0.5-1

2. KVM hypervisor
   qemu-kvm-common-ev-2.10.0-21
   libvirt-daemon-kvm-3.9.0-14.1

3. LVM used as storage backend on Compute host.
   lvm2-2.02.177-4

4. Which networking type did you use?
   Neutron with OpenVSwitch

Tags: rebuild
Balazs Gibizer (balazs-gibizer) wrote :

I can reproduce the reported behavior on recent master in a devstack. After stopping the compute service, I also waited until the nova controller detected that the compute was down. The rebuild is still accepted, and the instance stays stuck in REBUILD state even after I started the compute service back up.

A possible workaround is to use nova reset-state --active to push the server back to the ACTIVE state.

When the compute was started up, it logged an ERROR for the stuck instance, so this is definitely a bug. [1]

[1] http://paste.openstack.org/show/793758/

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Akhil Gudise (akhil-g)
Changed in nova:
assignee: nobody → Akhil Gudise (akhil-g)
Akhil Gudise (akhil-g) wrote :

I was also able to reproduce this bug in the latest version. I would like to add my view: since the nova controller can detect that the compute node is down, maybe we can add a method that checks the status of the compute node and is invoked for every request that involves the compute service.
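
A minimal sketch of the kind of status check suggested here, assuming the controller judges a compute service by the age of its last heartbeat. `ServiceRecord`, `is_service_up`, and the 60-second threshold are illustrative assumptions, not nova's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed threshold: how long a service may stay silent before it counts as down.
SERVICE_DOWN_TIME = timedelta(seconds=60)


@dataclass
class ServiceRecord:
    """Hypothetical record of a compute service as seen by the controller."""
    host: str
    last_heartbeat: datetime


def is_service_up(service: ServiceRecord, now: Optional[datetime] = None) -> bool:
    """Return True if the last heartbeat is recent enough."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - service.last_heartbeat) <= SERVICE_DOWN_TIME
```

A check of this shape is cheap enough to run on every request, because it only compares two timestamps and never contacts the compute host.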

Balazs Gibizer (balazs-gibizer) wrote :

As we discussed in the nova weekly meeting, I'm OK with introducing a service up check in the rebuild code path.
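
A service up check in the rebuild path could look roughly like the following. This is a sketch under stated assumptions: `Instance`, `ComputeServiceUnavailable`, `rebuild`, and the `is_host_up` callable are hypothetical stand-ins, not the code from the proposed fix.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ComputeServiceUnavailable(Exception):
    """Hypothetical error raised when the instance's host is down."""


@dataclass
class Instance:
    """Hypothetical, simplified view of a server."""
    name: str
    host: str
    vm_state: str = "active"
    task_state: Optional[str] = None


def rebuild(instance: Instance, is_host_up: Callable[[str], bool]) -> None:
    # Check the host before touching any state, so a rejected request
    # leaves the instance exactly as it was.
    if not is_host_up(instance.host):
        raise ComputeServiceUnavailable(
            f"Cannot rebuild {instance.name}: "
            f"compute host {instance.host} is down"
        )
    # Only now mark the instance as rebuilding; the actual work would be
    # handed to the compute service at this point.
    instance.task_state = "rebuilding"
```

Doing the check before the task state is set is what would keep the instance from getting stuck: the caller gets an error right away, and the server stays ACTIVE.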

Balazs Gibizer (balazs-gibizer) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/749531

Changed in nova:
status: Confirmed → In Progress