Instance reaches ERROR status during resize with "Conflict updating instance"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Status tracked in 10.0.x | | |
10.0.x | Fix Committed | Medium | Ivan Udovichenko |
9.x | Fix Released | Medium | Ivan Udovichenko |
Bug Description
Detailed bug description:
While resizing a nova instance, it went to ERROR status with:
Conflict updating instance b07417fc-
Steps to reproduce:
1. Set use_cow_images=0 in the nova config and restart the nova-compute services on all computes
2. Create an ubuntu image from the ubuntu trusty qcow image
3. Create 2 flavors:
name='test-eph'
and
name='test-eph-large'
4. Boot an instance from the image from step 2 with the 'test-eph' flavor
5. Resize the instance to flavor 'test-eph-large' with the nova API ( instance.
6. Wait until the instance reaches VERIFY_RESIZE status
Expected results:
All steps pass.
Actual result:
Instance reaches ERROR status:
Conflict updating instance b07417fc-
  File "/usr/lib/
    return function(self, context, *args, **kwargs)
  File "/usr/lib/
    instance.
  File "/usr/lib/
    ctxt, self, fn.__name__, args, kwargs)
  File "/usr/lib/
    objmethod=
  File "/usr/lib/
    retry=self.retry)
  File "/usr/lib/
    timeout=timeout, retry=retry)
  File "/usr/lib/
    retry=retry)
  File "/usr/lib/
    raise result
Reproducibility:
From time to time
Description of the environment:
- Versions of components: MOS ISO 9.0, builds #424 and #443
- Network model: Neutron VLAN
Changed in mos:
status: Won't Fix → Confirmed
milestone: 9.0 → 9.0-updates
tags: added: on-verification
This is weird: the resize task fails because it finds the instance in an unexpected state, and the reason for that is that the state was explicitly reset by nova-compute on start (as part of the cleanup for unfinished migrations / resizes):
2016-06-06 12:44:07.650 18216 DEBUG nova.compute.manager [req-f182f0f6-29f2-4a91-9f75-44f51af1f989 - - - - -] [instance: b07417fc-da95-4b2f-ae4e-01a916f7066f] Instance in transitional state resize_prep at start-up clearing task state _init_instance /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1028
which effectively means nova-compute was restarted between two events:
1) nova-api receiving a REST request to resize an instance
and
2) nova-compute actually receiving an RPC call to fulfil the request on the compute node.
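The failure mode can be illustrated with a small self-contained sketch. This is a hypothetical simulation, not nova's actual code (the real logic lives in nova's instance object and compute manager): nova-api sets task_state to resize_prep, the restarted nova-compute clears it during its start-up cleanup, and the late-arriving RPC handler then fails the conditional save that asserts the state it expects.

```python
# Hypothetical simulation of nova's expected_task_state check;
# names and behavior are simplified assumptions for illustration.
class UnexpectedTaskStateError(Exception):
    pass

class FakeInstance:
    def __init__(self, uuid):
        self.uuid = uuid
        self.task_state = None

    def save(self, expected_task_state=None):
        # Conditional update: the save only succeeds if the stored
        # task_state still matches what the caller expects.
        if (expected_task_state is not None
                and self.task_state != expected_task_state):
            raise UnexpectedTaskStateError(
                "Conflict updating instance %s" % self.uuid)

inst = FakeInstance("b07417fc-")

# 1) nova-api accepts the resize request and records the task state.
inst.task_state = "resize_prep"

# 2) nova-compute restarts; its start-up cleanup sees the
#    transitional state and clears it.
inst.task_state = None

# 3) The RPC call for the resize finally runs on the compute node and
#    tries to move the instance forward, asserting the state from 1).
try:
    inst.save(expected_task_state="resize_prep")
except UnexpectedTaskStateError as e:
    print(e)  # Conflict updating instance b07417fc-
```

Without step 2) the save succeeds, which is why the bug only shows up when nova-compute restarts in exactly this window.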
Unfortunately the logs say very little about why this happened; we can only see that the process was restarted:
2016-06-06 12:44:07.117 18216 WARNING oslo_reports.guru_meditation_report [-] Guru mediation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
The previous entry suggests it was not terminated properly:
2016-06-06 12:44:03.463 6303 DEBUG oslo_service.periodic_task [req-e7eaf8d3-bc19-4889-a195-e3b9066f373b - - - - -] Running periodic task ComputeManager._check_instance_build_time run_periodic_tasks /usr/lib/python2.7/dist-packages/oslo_service/periodic_task.py:215
as there is no mention of receiving any of the signals we handle, which are SIGHUP, SIGINT and SIGTERM.
There are no upstart logs in the snapshot, so we can't say for sure.
Did you by any chance kill the nova-compute process with SIGKILL? Please ping us on Slack if this reproduces again and you have an environment available for live debugging.
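SIGKILL would explain the silent restart: unlike SIGHUP, SIGINT and SIGTERM, it cannot be caught, so the process gets no chance to log a shutdown message. A quick illustration in plain Python (not nova code; works on POSIX systems):

```python
import signal

def handler(signum, frame):
    print("caught signal", signum)

# Terminating signals that oslo.service-based daemons like
# nova-compute handle (and log) can be intercepted:
signal.signal(signal.SIGTERM, handler)
signal.signal(signal.SIGINT, handler)

# SIGKILL can never be caught, blocked or ignored, so a process
# killed with `kill -9` dies without logging anything:
try:
    signal.signal(signal.SIGKILL, handler)
except (OSError, RuntimeError, ValueError) as e:
    print("cannot install a SIGKILL handler:", e)
```

This is why the absence of any shutdown entry in the nova-compute log is consistent with a `kill -9` (or the OOM killer, which also uses SIGKILL).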