While bringing up my second compute machine, I noticed that nova-compute keeps crashing on startup if an instance previously failed to launch because of RPC communication timeouts.
When the compute service restarts and tries to process this instance in init_host, it crashes with this log:
2012-11-08 11:25:44 AUDIT nova.service [-] Starting compute node (version 2012.2-LOCALBRANCH:LOCALREVISION)
2012-11-08 11:25:44 DEBUG nova.virt.libvirt.driver [-] Connecting to libvirt: qemu:///system from (pid=1638) _get_connection /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py:340
2012-11-08 11:25:45 DEBUG nova.utils [req-bdd9dcc1-9849-4b4f-a7f5-ba8fd84cc843 None None] backend <module 'nova.db.sqlalchemy.api' from '/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.pyc'> from (pid=1638) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:494
2012-11-08 11:25:47 DEBUG nova.compute.manager [req-bdd9dcc1-9849-4b4f-a7f5-ba8fd84cc843 None None] [instance: 2544284a-8abf-4d71-ba5e-2366ad3369c4] Checking state from (pid=1638) _get_power_state /usr/lib/python2.7/dist-packages/nova/compute/manager.py:334
2012-11-08 11:25:47 DEBUG nova.compute.manager [req-bdd9dcc1-9849-4b4f-a7f5-ba8fd84cc843 None None] [instance: 2544284a-8abf-4d71-ba5e-2366ad3369c4] Current state is 0, state in DB is 0. from (pid=1638) init_host /usr/lib/python2.7/dist-packages/nova/compute/manager.py:288
2012-11-08 11:25:47 CRITICAL nova [-] list index out of range
2012-11-08 11:25:47 TRACE nova Traceback (most recent call last):
2012-11-08 11:25:47 TRACE nova File "/usr/bin/nova-compute", line 48, in <module>
2012-11-08 11:25:47 TRACE nova service.wait()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 659, in wait
2012-11-08 11:25:47 TRACE nova _launcher.wait()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 192, in wait
2012-11-08 11:25:47 TRACE nova super(ServiceLauncher, self).wait()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 162, in wait
2012-11-08 11:25:47 TRACE nova service.wait()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
2012-11-08 11:25:47 TRACE nova return self._exit_event.wait()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2012-11-08 11:25:47 TRACE nova return hubs.get_hub().switch()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
2012-11-08 11:25:47 TRACE nova return self.greenlet.switch()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
2012-11-08 11:25:47 TRACE nova result = function(*args, **kwargs)
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 132, in run_server
2012-11-08 11:25:47 TRACE nova server.start()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 398, in start
2012-11-08 11:25:47 TRACE nova self.manager.init_host()
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 296, in init_host
2012-11-08 11:25:47 TRACE nova self.driver.plug_vifs(instance, legacy_net_info)
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 437, in plug_vifs
2012-11-08 11:25:47 TRACE nova self.vif_driver.plug(instance, (network, mapping))
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 111, in plug
2012-11-08 11:25:47 TRACE nova return self._get_configurations(instance, network, mapping)
2012-11-08 11:25:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 68, in _get_configurations
2012-11-08 11:25:47 TRACE nova conf.add_filter_param("IP", mapping['ips'][0]['ip'])
2012-11-08 11:25:47 TRACE nova IndexError: list index out of range
2012-11-08 11:25:47 TRACE nova
Terminating the failed instance and then restarting nova-compute works as a workaround.
This is Ubuntu 12.04 running Folsom.
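
The last frame of the traceback indexes mapping['ips'][0] without checking whether the list is empty. The sketch below reproduces that failure in isolation and shows one possible guard. It is only an illustration: the shape of the mapping (an empty 'ips' list for an instance that never got its network set up) is an assumption inferred from the traceback, and first_ip is a hypothetical helper, not a function in nova.

```python
def first_ip(mapping):
    """Return the first IP in a VIF mapping, or None if it has none.

    Hypothetical helper for illustration only; not part of nova.
    """
    ips = mapping.get('ips') or []
    return ips[0]['ip'] if ips else None


# Assumed shape of the mapping for an instance whose launch failed
# before any fixed IP was allocated (inferred from the traceback).
failed_mapping = {'ips': []}
healthy_mapping = {'ips': [{'ip': '10.0.0.2'}]}

# The unguarded lookup raises the same IndexError seen in the log.
try:
    failed_mapping['ips'][0]['ip']
except IndexError as exc:
    print("unguarded lookup fails: %s" % exc)

# The guarded lookup returns None instead of raising.
print(first_ip(failed_mapping))
print(first_ip(healthy_mapping))
```

Whether skipping such an instance during init_host is the right behaviour is a separate question; the sketch only shows that an empty list is enough to trigger the crash.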