Instance remains in scheduling state if Compute server is down

Bug #956960 reported by Unmesh Gurjar
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Low
Assigned to: Unmesh Gurjar

Bug Description

Scenario: If the Compute service is down and a new instance is launched, the instance remains in the 'building' state.
Verify the instance state in the database or dashboard.

Expected Response: The vm_state of the instance must be 'error'.
Actual Response: The vm_state remains 'building'.

Branch: master

The Nova API must check whether the Compute service is up (if there are multiple Compute hosts and one of them is down, the service should still be considered down). If the Compute service is down, the API must update the instance status in the database to 'error'.

Tags: ntt
Changed in nova:
assignee: nobody → Unmesh Gurjar (unmesh-gurjar)
description: updated
Johannes Erdfelt (johannes.erdfelt) wrote :

The compute nodes already update the database with a heartbeat that the scheduler uses to determine if the compute service is running.

How did you run into this situation?
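
The heartbeat check described above can be sketched roughly as follows. This is a minimal illustration, not nova's actual code: the function name `service_is_up`, the `SERVICE_DOWN_TIME` constant, and its 60-second value are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: a service whose last heartbeat is older than
# this is treated as down. The value is an assumption for illustration.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def service_is_up(last_heartbeat, now=None, down_time=SERVICE_DOWN_TIME):
    """Return True if the last recorded heartbeat is recent enough."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_heartbeat) <= down_time


# A service that stopped only 10 seconds ago still looks alive, because
# its last heartbeat is within the threshold.
now = datetime(2012, 3, 16, 12, 0, 0, tzinfo=timezone.utc)
recently_stopped = now - timedelta(seconds=10)
long_dead = now - timedelta(seconds=300)
print(service_is_up(recently_stopped, now))
print(service_is_up(long_dead, now))
```

Under these assumptions, a check of this kind cannot notice a service that died within the threshold window, so a request scheduled during that window may still be routed to a host that is no longer running.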

Unmesh Gurjar (unmesh-gurjar) wrote :

This scenario is reproduced if the Compute node goes down and a Create Server request is issued immediately afterwards. In my case, I manually stopped the Compute node and then sent a Create Server request.

Mandar Vaze (mandarvaze) wrote :

Sean Dague (sdague) wrote :

Any progress so far?

 -your friendly neighborhood bug triager

Changed in nova:
status: New → Confirmed
Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → Low
Unmesh Gurjar (unmesh-gurjar) wrote :

The issue is resolved if at least one other Compute server has the _check_instance_build_time periodic task enabled (it will mark the instance as ERROR). There are two possible scenarios after this:
1. The host Compute server comes up: it picks the request up from RabbitMQ and starts the instance.
2. The user deletes the instance and then the host Compute server comes up: since the instance is marked as deleted, the Compute server does not spawn a new instance and the instance remains in the ERROR state (vm_state=ERROR, task_state=deleting, power_state=0, deleted=1). No resources are consumed by the instance at this point.

Therefore, marking the bug as Invalid now.
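
For readers unfamiliar with the idea behind a build-time check such as _check_instance_build_time, here is a rough sketch of a periodic sweep that moves long-running builds to an error state. It is not the nova implementation: the `Instance` class, the `mark_stuck_builds` function, and the timeout value are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Instance:
    """Hypothetical, simplified stand-in for an instance record."""
    uuid: str
    vm_state: str
    created_at: datetime


def mark_stuck_builds(instances, now, build_timeout):
    """Set vm_state to 'error' for instances that have been 'building'
    for longer than build_timeout. Returns the affected UUIDs."""
    timed_out = []
    for inst in instances:
        if inst.vm_state == "building" and now - inst.created_at > build_timeout:
            inst.vm_state = "error"
            timed_out.append(inst.uuid)
    return timed_out


now = datetime(2012, 3, 16, 12, 0, 0)
instances = [
    Instance("stuck", "building", now - timedelta(minutes=30)),
    Instance("fresh", "building", now - timedelta(seconds=20)),
    Instance("running", "active", now - timedelta(hours=2)),
]
# Assumed timeout of 10 minutes, chosen only for the example.
print(mark_stuck_builds(instances, now, timedelta(minutes=10)))
```

Only instances that are still building and older than the timeout are touched; recent builds and instances in any other state are left alone.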

Changed in nova:
status: Confirmed → Invalid