RunInstances fail due to state change of _poll_instance_state()

Bug #744056 reported by Koji Iida
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
justinsb

Bug Description

run_instance sometimes fails.

tested: r894, qemu.

If _poll_instance_state() is called while spawning instance, _poll_instance_state() will update state of spawning instance and the instance will be removed from DB. This will cause failure of run_instances().

I think that following state changes should not be applied to just spawning instances:

2011-03-28 15:18:18,737 INFO nova.compute.manager [-] Found instance 'instance-00000001' in DB but no VM. Setting state to shutoff.
2011-03-28 15:18:18,738 INFO nova.compute.manager [-] DB/VM state mismatch. Changing state from '0' to '5'

Here are nova-compute's log output:

2011-03-28 15:17:33,337 DEBUG nova.utils [-] Running cmd (subprocess): mkdir -p /home/iida/nova/nova/..//instances/instance-00000001/ from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:17:33,375 INFO nova.virt.libvirt_conn [-] instance instance-00000001: Creating image
2011-03-28 15:17:33,438 DEBUG nova.utils [-] Attempting to grab semaphore "481bc4e3" for method "call_if_not_exists"... from (pid=14228) inner /home/iida/nova/nova/utils.py:594
2011-03-28 15:17:33,616 DEBUG nova.utils [-] Running cmd (subprocess): cp /home/iida/nova/nova/..//instances/_base/481bc4e3 /home/iida/nova/nova/..//instances/instance-00000001/kernel from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:17:33,743 DEBUG nova.utils [-] Attempting to grab semaphore "395a3690" for method "call_if_not_exists"... from (pid=14228) inner /home/iida/nova/nova/utils.py:594
2011-03-28 15:17:34,087 DEBUG nova.utils [-] Running cmd (subprocess): cp /home/iida/nova/nova/..//instances/_base/395a3690 /home/iida/nova/nova/..//instances/instance-00000001/ramdisk from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:17:34,201 DEBUG nova.utils [-] Attempting to grab semaphore "6a4902cb_sm" for method "call_if_not_exists"... from (pid=14228) inner /home/iida/nova/nova/utils.py:594
2011-03-28 15:18:18,200 DEBUG nova.utils [-] Running cmd (subprocess): qemu-img create -f qcow2 -o cluster_size=2M,backing_file=/home/iida/nova/nova/..//instances/_base/6a4902cb_sm /home/iida/nova/nova/..//instances/instance-00000001/disk from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:18,737 INFO nova.compute.manager [-] Found instance 'instance-00000001' in DB but no VM. Setting state to shutoff.
2011-03-28 15:18:18,738 INFO nova.compute.manager [-] DB/VM state mismatch. Changing state from '0' to '5'
2011-03-28 15:18:19,167 INFO nova.virt.libvirt_conn [-] instance instance-00000001: injecting key into image 1783169739
2011-03-28 15:18:19,167 INFO nova.virt.libvirt_conn [-] instance instance-00000001: injecting net into image 1783169739
2011-03-28 15:18:19,177 DEBUG nova.utils [-] Running cmd (subprocess): sudo qemu-nbd -c /dev/nbd15 /home/iida/nova/nova/..//instances/instance-00000001/disk from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:20,799 DEBUG nova.utils [-] Running cmd (subprocess): sudo tune2fs -c 0 -i 0 /dev/nbd15 from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:21,964 DEBUG nova.utils [-] Running cmd (subprocess): sudo mount /dev/nbd15 /tmp/tmpkRFDUp from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,181 DEBUG nova.utils [-] Running cmd (subprocess): sudo mkdir -p /tmp/tmpkRFDUp/root/.ssh from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,341 DEBUG nova.utils [-] Running cmd (subprocess): sudo chown root /tmp/tmpkRFDUp/root/.ssh from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,490 DEBUG nova.utils [-] Running cmd (subprocess): sudo chmod 700 /tmp/tmpkRFDUp/root/.ssh from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,590 DEBUG nova.utils [-] Running cmd (subprocess): sudo tee -a /tmp/tmpkRFDUp/root/.ssh/authorized_keys from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,689 DEBUG nova.utils [-] Running cmd (subprocess): sudo mkdir -p /tmp/tmpkRFDUp/etc/network from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,790 DEBUG nova.utils [-] Running cmd (subprocess): sudo chown root:root /tmp/tmpkRFDUp/etc/network from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,873 DEBUG nova.utils [-] Running cmd (subprocess): sudo chmod 755 /tmp/tmpkRFDUp/etc/network from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:22,950 DEBUG nova.utils [-] Running cmd (subprocess): sudo tee /tmp/tmpkRFDUp/etc/network/interfaces from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:23,060 DEBUG nova.utils [-] Running cmd (subprocess): sudo umount /dev/nbd15 from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:24,190 DEBUG nova.utils [-] Running cmd (subprocess): rmdir /tmp/tmpkRFDUp from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:24,258 DEBUG nova.utils [-] Running cmd (subprocess): sudo qemu-nbd -d /dev/nbd15 from (pid=14228) execute /home/iida/nova/nova/utils.py:150
2011-03-28 15:18:28,116 DEBUG nova.virt.libvirt_conn [-] instance instance-00000001: is running from (pid=14228) spawn /home/iida/nova/nova/virt/libvirt_conn.py:503
2011-03-28 15:18:28,243 ERROR nova.compute.manager [AMJJLSBS8AK32S9IE45S admin admin] Instance '1' failed to spawn. Is virtualization enabled in the BIOS?
(nova.compute.manager): TRACE: Traceback (most recent call last):
(nova.compute.manager): TRACE: File "/home/iida/nova/nova/compute/manager.py", line 244, in run_instance
(nova.compute.manager): TRACE: {'launched_at': now})
(nova.compute.manager): TRACE: File "/home/iida/nova/nova/db/api.py", line 474, in instance_update
(nova.compute.manager): TRACE: return IMPL.instance_update(context, instance_id, values)
(nova.compute.manager): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova.compute.manager): TRACE: return f(*args, **kwargs)
(nova.compute.manager): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 994, in instance_update
(nova.compute.manager): TRACE: instance_ref = instance_get(context, instance_id, session=session)
(nova.compute.manager): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova.compute.manager): TRACE: return f(*args, **kwargs)
(nova.compute.manager): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 848, in instance_get
(nova.compute.manager): TRACE: instance_id)
(nova.compute.manager): TRACE: InstanceNotFound: Instance 1 not found
(nova.compute.manager): TRACE:
2011-03-28 15:18:28,345 ERROR nova [-] Exception during message handling
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "/home/iida/nova/nova/rpc.py", line 183, in _receive
(nova): TRACE: rval = node_func(context=ctxt, **node_args)
(nova): TRACE: File "/home/iida/nova/nova/exception.py", line 120, in _wrap
(nova): TRACE: return f(*args, **kw)
(nova): TRACE: File "/home/iida/nova/nova/compute/manager.py", line 251, in run_instance
(nova): TRACE: power_state.SHUTDOWN)
(nova): TRACE: File "/home/iida/nova/nova/db/api.py", line 465, in instance_set_state
(nova): TRACE: return IMPL.instance_set_state(context, instance_id, state, description)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 98, in wrapper
(nova): TRACE: return f(*args, **kwargs)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 987, in instance_set_state
(nova): TRACE: 'state_description': description})
(nova): TRACE: File "/home/iida/nova/nova/db/api.py", line 474, in instance_update
(nova): TRACE: return IMPL.instance_update(context, instance_id, values)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova): TRACE: return f(*args, **kwargs)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 994, in instance_update
(nova): TRACE: instance_ref = instance_get(context, instance_id, session=session)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova): TRACE: return f(*args, **kwargs)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 848, in instance_get
(nova): TRACE: instance_id)
(nova): TRACE: InstanceNotFound: Instance 1 not found
(nova): TRACE:
2011-03-28 15:18:28,613 ERROR nova.virt.libvirt_conn [-] instance instance-00000001: failed to boot
(nova.virt.libvirt_conn): TRACE: Traceback (most recent call last):
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/virt/libvirt_conn.py", line 512, in _wait_for_boot
(nova.virt.libvirt_conn): TRACE: instance['id'], state)
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/api.py", line 465, in instance_set_state
(nova.virt.libvirt_conn): TRACE: return IMPL.instance_set_state(context, instance_id, state, description)
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 98, in wrapper
(nova.virt.libvirt_conn): TRACE: return f(*args, **kwargs)
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 987, in instance_set_state
(nova.virt.libvirt_conn): TRACE: 'state_description': description})
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/api.py", line 474, in instance_update
(nova.virt.libvirt_conn): TRACE: return IMPL.instance_update(context, instance_id, values)
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova.virt.libvirt_conn): TRACE: return f(*args, **kwargs)
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 994, in instance_update
(nova.virt.libvirt_conn): TRACE: instance_ref = instance_get(context, instance_id, session=session)
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova.virt.libvirt_conn): TRACE: return f(*args, **kwargs)
(nova.virt.libvirt_conn): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 848, in instance_get
(nova.virt.libvirt_conn): TRACE: instance_id)
(nova.virt.libvirt_conn): TRACE: InstanceNotFound: Instance 1 not found
(nova.virt.libvirt_conn): TRACE:
2011-03-28 15:18:28,700 ERROR nova [-] in looping call
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "/home/iida/nova/nova/utils.py", line 465, in _inner
(nova): TRACE: self.f(*self.args, **self.kw)
(nova): TRACE: File "/home/iida/nova/nova/virt/libvirt_conn.py", line 521, in _wait_for_boot
(nova): TRACE: power_state.SHUTDOWN)
(nova): TRACE: File "/home/iida/nova/nova/db/api.py", line 465, in instance_set_state
(nova): TRACE: return IMPL.instance_set_state(context, instance_id, state, description)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 98, in wrapper
(nova): TRACE: return f(*args, **kwargs)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 987, in instance_set_state
(nova): TRACE: 'state_description': description})
(nova): TRACE: File "/home/iida/nova/nova/db/api.py", line 474, in instance_update
(nova): TRACE: return IMPL.instance_update(context, instance_id, values)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova): TRACE: return f(*args, **kwargs)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 994, in instance_update
(nova): TRACE: instance_ref = instance_get(context, instance_id, session=session)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 109, in wrapper
(nova): TRACE: return f(*args, **kwargs)
(nova): TRACE: File "/home/iida/nova/nova/db/sqlalchemy/api.py", line 848, in instance_get
(nova): TRACE: instance_id)
(nova): TRACE: InstanceNotFound: Instance 1 not found
(nova): TRACE:

Tags: regression

Related branches

Revision history for this message
justinsb (justin-fathomdb) wrote :

Thanks for the great bug report. This sounds like a bug I introduced, so I will work on fixing it.

Changed in nova:
assignee: nobody → justinsb (justin-fathomdb)
Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
status: New → In Progress
tags: added: regression
Thierry Carrez (ttx)
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.