Instance status is not updated when compute machine is disconnected

Bug #1291311 reported by RITESH SINGH
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Opinion
Importance: Undecided
Assigned to: Unassigned

Bug Description

The status of an instance (VM) is not updated when its compute machine is disconnected from the setup. Deleting the VM is also not possible.

Setup:

2 compute nodes (Compute-1 and Compute-2)

root@controller:~# nova list
+--------------------------------------+-----------+---------+------------+-------------+-----------------------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-----------+---------+------------+-------------+-----------------------------------------------------+
| aa73d8e6-f7e9-47d1-a590-06e3c134d33b | THIRD_VM | SHUTOFF | deleting | Shutdown | INT_NET=50.50.1.2, 172.18.7.18 |
| e809191d-fa66-45f6-84a7-6f1acfe460c8 | UBUNTU_VM | ACTIVE | None | Running | INT_NET=50.50.1.4, 172.18.7.20 |
+--------------------------------------+-----------+---------+------------+-------------+-----------------------------------------------------+

COMPUTE-1 hosting ====== UBUNTU_VM
COMPUTE-2 hosting ====== THIRD_VM

I disconnected COMPUTE-2 from the setup (powered it off) and then tried to delete THIRD_VM, but it shows the following error:

root@controller:~# nova delete aa73d8e6-f7e9-47d1-a590-06e3c134d33b

The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-6931f510-033b-4c1d-a367-390a0a4421eb)
ERROR: Unable to delete any of the specified servers.

I have also restarted the controller machine, but the status of the VM has not changed; it still shows as deleting.

Tags: compute
affects: horizon → nova
Thang Pham (thang-pham) wrote :

Have you tried resetting the state of the instance, i.e. nova reset-state [--active] <server>? That will get it back to a usable state so you can try the delete again. But something should be done so that the instance is left in the ERROR state instead of the DELETING state.
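A minimal Python sketch of what the reset-state workaround does to the instance record, assuming a simplified model with only the vm_state and task_state fields. The Instance class and reset_state function are hypothetical illustrations, not nova code. As far as I know, reset-state only updates the database record: it sets vm_state to error (or to active with --active) and clears the task state, so the instance stops showing "deleting" without the compute host being contacted at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for a nova instance record.
    uuid: str
    vm_state: str
    task_state: Optional[str]


def reset_state(inst: Instance, active: bool = False) -> Instance:
    """Model the effect of `nova reset-state [--active]` on the state fields.

    Without --active the instance goes to the error state; with --active it
    goes to the active state. In both cases the stuck task state is cleared.
    Nothing here talks to a compute host; only the record changes.
    """
    inst.vm_state = "active" if active else "error"
    inst.task_state = None
    return inst


# THIRD_VM from the report: SHUTOFF (vm_state "stopped"), stuck in "deleting".
third_vm = Instance("aa73d8e6-f7e9-47d1-a590-06e3c134d33b", "stopped", "deleting")
reset_state(third_vm)
```

This also fits the reporter's experience: resetting the state makes the record usable again, but a subsequent delete still depends on what happens to the guest on the unreachable host.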

Tracy Jones (tjones-i)
tags: added: compute
RITESH SINGH (ritesh-singh-aricent) wrote : Re: [Bug 1291311] Re: Instance status is not updated when compute machine is disconnected

Hi,

Thanks for the response.

Changing the state is of no use.
I can change the state, but I still cannot delete the VM.

regards
Ritesh Singh

On Wed, Mar 19, 2014 at 10:03 PM, Tracy Jones <email address hidden> wrote:

> ** Tags added: compute

melanie witt (melwitt) wrote :

First, the VM can't be deleted if the compute host it's on is down. Are you saying you powered the compute host back up and then still couldn't delete the VM?

Did you examine the logs on the controller machine to see the detail that results in the 500 being propagated up?

Changed in nova:
status: New → Incomplete
Thang Pham (thang-pham) wrote :

I tried to reproduce this bug using the steps outlined above:

1. Have 2 compute nodes (compute1 and compute2).
2. Create an instance (test2) on compute2.
3. Shut down compute2 (the equivalent would be to just stop all the nova services instead of doing a full shutdown).
4. Delete instance (test2) from the controller.

The task state is stuck at "deleting" and the status eventually becomes "ERROR". The logs show:

ERROR nova.api.openstack [req-5df6b4d6-2fd1-4a26-b74e-d9a470cf147f admin demo] Caught error: Timed out waiting for a reply to message
...
DEBUG nova.api.openstack.wsgi [req-03f0681e-58aa-4199-b865-3d6331ae2df6 admin demo] Returning 500 to user: The server has either erred or is incapable of performing the requested operation. __call__ /opt/stack/nova/nova/api/openstack/wsgi.py:1215

However, if you power the node (compute2) back on and try the delete again, the instance is properly deleted and removed from the database. Note that you do have to power compute2 back on.

This was reproduced using devstack with the master branch.

RITESH SINGH (ritesh-singh-aricent) wrote :

Hi,

I am not sure whether the instance could be deleted on my setup if the compute node were added back.
Actually, my compute node crashed and I need to reformat it.

But I think there should be a provision on the controller to remove such instances and their resources.

regards
Ritesh Singh

melanie witt (melwitt) wrote :

Thanks for the information.

Do you find the quota used by the instance on the crashed compute node isn't returned to the user after the instance delete request?

RITESH SINGH (ritesh-singh-aricent) wrote :

Hi ,

I am not sure about the exact quota being used by the instance.
But yes, I can use the floating IP that was initially assigned to the deleted
instance for any new instance.

regards
Ritesh Singh

On Wed, Mar 26, 2014 at 11:45 PM, Melanie Witt <email address hidden> wrote:

> Thanks for the information.
>
> Do you find the quota used by the instance on the crashed compute node
> isn't returned to the user after the instance delete request?

melanie witt (melwitt) wrote :

Okay. It seems there may not be a bug here, other than the VM being left in the "deleting" task state so that you have to issue a reset-state to take it out of "deleting".

The VM itself can't be deleted unless the compute host comes back, and there is a config setting under which instances that were requested to be deleted are periodically reaped; see:

https://ask.openstack.org/en/question/1905/instance-marked-as-deleted-but-still-present-on-host/
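The reaping mentioned here is, as I understand it, a periodic task that runs on the compute host itself and is controlled by nova's running_deleted_instance_action option, whose documented values are noop, log, shutdown and reap. Because the check runs on the host, nothing is reaped while that host is down. A simplified Python sketch of the decision, assuming a hypothetical GuestRecord model and a hypothetical plan_cleanup helper that only return the planned step instead of touching a hypervisor; timeout_s is a hypothetical parameter loosely mirroring nova's running_deleted_instance_timeout option.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Documented values of nova's running_deleted_instance_action option.
VALID_ACTIONS = ("noop", "log", "shutdown", "reap")


@dataclass
class GuestRecord:
    # Hypothetical view of one guest as seen by its compute host.
    uuid: str
    deleted_in_db: bool          # instance row is marked deleted in the nova DB
    running_on_host: bool        # hypervisor still reports the guest as running
    seconds_since_delete: float  # time since the delete was recorded


def plan_cleanup(guests: List[GuestRecord], action: str,
                 timeout_s: float = 0.0) -> List[Tuple[str, str]]:
    """Return (uuid, step) pairs for guests that are deleted but still running."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    plan: List[Tuple[str, str]] = []
    for g in guests:
        stale = (g.deleted_in_db and g.running_on_host
                 and g.seconds_since_delete > timeout_s)
        if not stale or action == "noop":
            continue
        if action == "log":
            plan.append((g.uuid, "log warning"))
        elif action == "shutdown":
            plan.append((g.uuid, "power off"))
        else:  # "reap"
            plan.append((g.uuid, "power off and destroy"))
    return plan
```

Only guests that are both marked deleted and still running are considered; an ordinary running instance such as UBUNTU_VM never appears in the plan.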

In the instances table of the nova database, I think you should see deleted=1 for the instance, which means it was marked as deleted. When you say the resources aren't removed, what do you mean? Do you still see the instance in 'nova list', or is the quota not returned, or something else?

RITESH SINGH (ritesh-singh-aricent) wrote :

Sorry for the late reply; I was out of town.

Yes, I can see the instance in nova list, though all the resources are free now, including the IPs.

1. After resetting the state of the VM, I see NOSTATE in the output of nova list, but in Horizon the state is "deleting".

2. Since my compute machine crashed, as I mentioned, does that mean Horizon will always show an unwanted VM on the Instances tab?

melanie witt (melwitt) wrote :

Thanks for the added detail. From what I can tell so far, it looks like this is a scenario for operator recovery.

Here is the documentation I found for total compute host failure:

http://docs.openstack.org/trunk/openstack-ops/content/maintenance.html#totle_compute_node_failure

This says that if the VMs are backed by shared storage, you can update the nova database to change where the VMs are hosted and issue reboot commands to bring them up on a new compute host. If you're not using shared storage, you are in a situation where you simply want to delete them, forget the crashed node, and move on.

In this case, I think you would update the nova database to set those instances to deleted=1 so they no longer show in the nova list.

I'm going to see if I can get more comment on this from other developers.

melanie witt (melwitt) wrote :

Since my last update, I've learned more about what should happen when a compute host fails. When you request to delete an instance on a compute host that is down, a "local delete" should go through; i.e., the instance should disappear from 'nova list' and you should get the IP and quota back.

Ritesh, are you still having this issue? Does 'nova list' still show the instance after it has been deleted? You shouldn't need to bring the failed compute host back up. Which OpenStack release are you running?
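Roughly, the compute API chooses between sending the delete to the compute host over RPC and doing a local delete by asking whether that host's nova-compute service is up. With the default database service-group driver, a service counts as up if its last heartbeat is within service_down_time seconds (60 by default). A minimal Python sketch of that decision; service_is_up and choose_delete_path are hypothetical stand-ins, not nova's actual functions.

```python
from datetime import datetime

# Mirrors the default of nova's service_down_time option (seconds).
SERVICE_DOWN_TIME = 60


def service_is_up(last_heartbeat: datetime, now: datetime,
                  down_time: int = SERVICE_DOWN_TIME) -> bool:
    """A service is considered up if it reported in within down_time seconds."""
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= down_time


def choose_delete_path(last_heartbeat: datetime, now: datetime,
                       down_time: int = SERVICE_DOWN_TIME) -> str:
    """Pick how a delete request would be handled for the instance's host."""
    if service_is_up(last_heartbeat, now, down_time):
        return "rpc_to_compute"  # the compute host tears down the guest
    return "local_delete"        # the API cleans up the record, IPs and quota
```

One consequence of this design: right after a host dies, its last heartbeat is still recent, so a delete issued within that window may still be sent to the dead host over RPC.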

Sean Dague (sdague) wrote :

No follow-up on the request for more information (Incomplete status).

Changed in nova:
status: Incomplete → Opinion
RITESH SINGH (ritesh-singh-aricent) wrote :

Apologies; I was without a dedicated setup to test this and provide feedback.

Hi Melwitt,

Yes, I am still facing this issue.
I am currently using Icehouse. I will update this soon for Juno or Kilo.

There is still no handling for such scenarios.

I suggest:

1. There should be handling for deleting VMs when there is no communication with the compute host.
2. A proper message should be displayed.
3. Force delete should be an option.
