vm rebuild fails but confusing state
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Undecided | Stephen Finucane |
Bug Description
Description
===========
If nova-compute crashes on a host and I then try to rebuild an instance on that host, the instance gets stuck in rebuilding, which is reasonable because nothing consumes the message from the MQ. However, once nova-compute on that host is started again, the instance ends up in a confusing state.
I am not sure whether this is by design or a bug.
Steps to reproduce
==================
(1) boot an instance
$nova boot --image cirros --flavor mini --nic net-id=
$nova list
+------
| ID | Name | Host | Status | Task State | Power State | Networks |
+------
| 5c6c8913-
+------
(2) stop nova-compute on compute1
$nova service-list|grep compute1
| 985b4f89-
(3) rebuild the instance
$nova rebuild y cir
$nova list
+------
| ID | Name | Host | Status | Task State | Power State | Networks |
+------
| 5c6c8913-
+------
$ rabbitmqctl list_queues
Listing queues ...
...
compute.compute1 1
(4) start nova-compute on compute1
first:
$ nova list
+------
| ID | Name | Host | Status | Task State | Power State | Networks |
+------
| 5c6c8913-
+------
but shortly afterwards it gets stuck in the following state:
$ nova list
+------
| ID | Name | Host | Status | Task State | Power State | Networks |
+------
| 5c6c8913-
+------
$ nova show y
+------
| Property | Value |
+------
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-
| OS-EXT-STS:vm_state | error |
| OS-SRV-
| OS-SRV-
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2018-01-
| description | - |
| flavor:disk | 1 |
| flavor:ephemeral | 0 |
| flavor:extra_specs | {} |
| flavor:
| flavor:ram | 64 |
| flavor:swap | 0 |
| flavor:vcpus | 1 |
| hostId | a7837285ad98259
| host_status | UP |
| id | 5c6c8913-
| image | cir (14440220-
| key_name | - |
| locked | False |
| metadata | {} |
| name | y |
| os-extended-
| progress | 0 |
| security_groups | default |
| status | REBUILD |
| tags | [] |
| tenant_id | 5673331fd93740f
| updated | 2018-01-
| user_id | 59a94426c0404b5
| vxlan-l network | 100.53.0.8 |
+------
Expected result
===============
The instance status should become error, or the rebuild operation should continue until it finishes.
Actual result
=============
The instance status is stuck in REBUILD and task_state is stuck in rebuilding.
Environment
===========
nova-api 2:16.0.
nova-common 2:16.0.
nova-conductor 2:16.0.
nova-consoleauth 2:16.0.
nova-novncproxy 2:16.0.
nova-placement-api 2:16.0.
nova-scheduler 2:16.0.
python-nova 2:16.0.
python-novaclient 2:9.1.0-
Libvirt + KVM, CEPH, NEUTRON with linuxbridge
Logs
==============
On compute node:
(after nova-compute started)
2018-01-08 11:20:15.685 2352 DEBUG nova.compute.
(nova-compute picks the message from MQ and continues to rebuild)
2018-01-08 11:20:18.900 2352 INFO nova.compute.
2018-01-08 11:20:19.101 2352 DEBUG nova.notificati
2018-01-08 11:20:19.108 2352 DEBUG nova.compute.
2018-01-08 11:20:19.242 2352 DEBUG nova.compute.utils [req-d635de5c-
2018-01-08 11:20:19.274 2352 ERROR nova.compute.
Traceback (most recent call last):
File "/usr/lib/
return getattr(target, method)(*args, **kwargs)
File "/usr/lib/
return fn(self, *args, **kwargs)
File "/usr/lib/
columns_
File "/usr/lib/
expected=
File "/usr/lib/
return f(*args, **kwargs)
File "/usr/lib/
ectxt.value = e.inner_exc
File "/usr/lib/
self.
File "/usr/lib/
six.
File "/usr/lib/
return f(*args, **kwargs)
File "/usr/lib/
return f(context, *args, **kwargs)
File "/usr/lib/
context, instance_uuid, values, expected, original=
File "/usr/lib/
raise exc(**exc_props)
UnexpectedTaskS
description: | updated |
summary: |
- vm rebuild fail but confusing state + vm rebuild fails but confusing state |
Changed in nova: | |
assignee: | xulei (605423512-j) → Stephen Finucane (stephenfinucane) |
The confusing state is caused by nova-compute. When nova-compute restarts, it sets all instances whose task_state is rebuilding to error. nova-compute then handles the queued rebuild RPC and raises this error. I think this is proper, because nobody knows the instance's true state while the DB records it as rebuilding. I will improve the exception handling.
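The sequence described in this comment can be illustrated with a toy model. This is only a sketch and not nova code: `InstanceRecord`, `update_if_task_state`, `on_service_start`, `handle_queued_rebuild` and `TaskStateMismatch` are hypothetical names, and it assumes that the startup pass also clears task_state and that the rebuild handler saves with an expected task state (the comment states neither explicitly).

```python
from dataclasses import dataclass
from typing import Optional


class TaskStateMismatch(Exception):
    """Hypothetical stand-in for the task-state error seen in the traceback."""


@dataclass
class InstanceRecord:
    # Toy version of the DB row; only the two fields relevant here.
    vm_state: str
    task_state: Optional[str]


def update_if_task_state(record: InstanceRecord, expected: Optional[str], **values) -> None:
    """Apply values only if the stored task_state matches what the caller expects."""
    if record.task_state != expected:
        raise TaskStateMismatch(
            f"expected task_state {expected!r}, found {record.task_state!r}"
        )
    for key, value in values.items():
        setattr(record, key, value)


def on_service_start(records: list) -> None:
    """Startup pass: an interrupted rebuild cannot be trusted, so mark it as error.

    Assumption: task_state is cleared here as well.
    """
    for record in records:
        if record.task_state == "rebuilding":
            record.vm_state = "error"
            record.task_state = None


def handle_queued_rebuild(record: InstanceRecord) -> None:
    """Handler for the rebuild message that was waiting in the queue."""
    # The message was sent while task_state was still "rebuilding",
    # so the handler expects that value to still be stored.
    update_if_task_state(record, "rebuilding", vm_state="active", task_state=None)


if __name__ == "__main__":
    instance = InstanceRecord(vm_state="active", task_state="rebuilding")
    on_service_start([instance])       # restart marks the instance as error
    try:
        handle_queued_rebuild(instance)  # the stale message is consumed afterwards
    except TaskStateMismatch as exc:
        print("rebuild aborted:", exc)
```

In this model the startup pass wins the race, so the queued rebuild fails its precondition check. How the handler then cleans up after that failure is what determines the state the user finally sees.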