Comment 18 for bug 856764

Chris Friesen (cbf123) wrote :

Is there any update on this issue? I've just hit a problem that I think might be related. We have active/standby controllers (using pacemaker) and multiple compute nodes.

If a controller is killed uncleanly, all the services come up on the other controller, but it takes about 9 minutes before I can boot a new instance. After that time I see "nova.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed" on the compute nodes; they then reconnect to the AMQP server and I can boot an instance.

Unfortunately, any instances I tried to boot during those 9 minutes stay in the "BUILD/scheduling" state forever.
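
A possible explanation, not confirmed anywhere in this report, is that the compute nodes keep a half-open TCP connection to the dead controller and only notice once the kernel gives up on it. The sketch below is a generic illustration of shortening that detection window with TCP keepalive on a plain Python socket. It is not nova's code or configuration: `enable_keepalive` and its timing values are hypothetical names and numbers chosen for the example, and the `TCP_KEEP*` options are only applied where the platform exposes them (as Linux does).

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive for sock (hypothetical helper, illustration only).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection

    Returns the approximate worst-case time, in seconds, before a dead
    peer is noticed on an otherwise idle connection.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # These per-socket tuning options are platform-specific, so guard
    # them; without them the system-wide keepalive defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

    return idle + interval * count


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("dead peer noticed after roughly", enable_keepalive(s), "seconds")
    s.close()
```

With the example values, an idle connection to a vanished peer would be dropped after roughly 60 + 10 * 5 seconds rather than waiting on the much longer system defaults. Whether this applies to the AMQP connections here is an assumption that would need to be checked against the actual client library in use.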