Evacuate fails during rebuild of the VM on the target host with RPC timeout

Bug #1297642 reported by Sangeeta Singh
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When using 'nova evacuate' to evacuate a VM with no shared storage to a target host, the command fails during the rebuild step, leaving the VM in the rebuilding state on the target host.

The VM is evacuated from the failed host, but the rebuild on the target host fails with an RPC timeout error.

Here are steps to recreate the issue:

1) create a vm on a host
nova boot --flavor m1.small --image my_image test-vm

2) disable the compute host of the VM and stop the nova-compute process on it

3) nova evacuate test-vm target-host
     The VM is evacuated from the failed host and starts rebuilding on the target host.

4) Check test-vm:

nova show test-vm

The command reports server error 500 with an RPC timeout, and the VM is stuck in the rebuilding state on the target host.

Tags: compute
Tracy Jones (tjones-i)
tags: added: compute
Sangeeta Singh (singhs) wrote :

The fault listed in the 'nova show' output is:

fault: {u'message': u'Timeout while waiting on RPC response - topic: "network", RPC method: "setup_networks_on_host" info: "<unknown>"', u'code': 500, u'details': u'
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 258, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2037, in rebuild_instance
    context, instance, self.host)
  File "/usr/lib/python2.6/site-packages/nova/network/api.py", line 93, in wrapped
    return func(self, context, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/network/api.py", line 508, in setup_networks_on_host
    self.network_rpcapi.setup_networks_on_host(context, **args)
  File "/usr/lib/python2.6/site-packages/nova/network/rpcapi.py", line 275, in setup_networks_on_host
    teardown=teardown)
  File "/usr/lib/python2.6/site-packages/nova/rpcclient.py", line 85, in cal...

Tiago Mello (timello) wrote :

I was testing with Python 2.7 and wasn't able to reproduce the problem. Let me try with 2.6.

Changed in nova:
status: New → Incomplete
status: Incomplete → New
Tiago Mello (timello) wrote :

Could you provide more info about the environment in which you are seeing the problem? Does it happen with Python 2.7?

Sangeeta Singh (singhs) wrote :

No, I have not tried with Python 2.7.

This is with the Havana stable codebase.

The kombu and amqp versions on my system are:

kombu==3.0.5
amqp==1.3.3
amqplib==0.6.1
nova==2013.2

What else do you want to know?

Sangeeta Singh (singhs) wrote :

One more data point: my setup uses nova-network, not Neutron.

Sangeeta Singh (singhs) wrote :

The issue is that during the rebuild/recreate, the target host makes a call to nova-network to set up the network for the instance on that host.

This is an rpc.call. When nova-network receives it, since it is running in multi-host mode, it makes another rpc.call on the reply queue of the first call.

Once the target host receives the response on the reply queue, it just consumes it but does not forward the call to the network manager for processing, so nova-network keeps waiting for the response and finally times out.

This seems like a bug, since it should first complete the initial call and then create a new RPC call for the target host to set up the network.

I am using the FlatManager class.
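
The nested blocking call described above can be modelled with a toy in-memory bus. This is only a sketch, not nova's RPC code: ToyBus, RpcTimeout, rebuild and make_multi_host_handler are hypothetical names, and the per-host topic "network.<host>" is an assumption made for illustration. It shows how an outer call times out when the inner, forwarded call lands on a topic that nothing is consuming.

```python
import queue
import threading


class RpcTimeout(Exception):
    """Raised when no reply arrives within the timeout."""


class ToyBus:
    """In-memory stand-in for a message bus: one queue per topic."""

    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def _queue(self, topic):
        with self._lock:
            return self._topics.setdefault(topic, queue.Queue())

    def call(self, topic, method, timeout):
        """Blocking call: publish a request, then wait for the reply."""
        reply = queue.Queue()
        self._queue(topic).put((method, reply))
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            raise RpcTimeout('topic: "%s", RPC method: "%s"' % (topic, method))

    def serve(self, topic, handler):
        """Start a daemon thread that answers requests on a topic."""
        def loop():
            q = self._queue(topic)
            while True:
                method, reply = q.get()
                try:
                    reply.put(handler(method))
                except RpcTimeout:
                    pass  # nested call failed, so the caller gets no reply

        threading.Thread(target=loop, daemon=True).start()


def make_multi_host_handler(bus, timeout=0.2):
    """Hypothetical multi-host network service: forward to the VM's host."""
    def handler(method):
        host = method.split(":", 1)[1]
        # Assumed per-host topic; blocks until that host's service replies.
        return bus.call("network." + host, method, timeout)
    return handler


def rebuild(bus, host, timeout=0.4):
    """Hypothetical rebuild step: ask the network service to set up `host`."""
    try:
        bus.call("network", "setup_networks_on_host:" + host, timeout)
        return "ACTIVE"
    except RpcTimeout:
        return "REBUILDING (stuck)"


bus = ToyBus()
bus.serve("network", make_multi_host_handler(bus))

# Nothing consumes "network.target-host" yet, so both calls time out.
print(rebuild(bus, "target-host"))

# Once a consumer exists on the per-host topic, the same call completes.
bus.serve("network.target-host", lambda method: "ok")
print(rebuild(bus, "target-host"))
```

In this model the first rebuild stays stuck because the inner call never gets an answer, and the second succeeds once something is listening on the per-host topic.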

melanie witt (melwitt) wrote :

Closing this as Invalid as this appears to be an incorrect use of the multi host mode in nova-network. That is, the network has been configured as multi host but the deployment isn't running nova-network on every compute host.

The newest documentation (Grizzly) about the nova-network multi host feature I could find says:

"The multi_host option must be in place when you create the network and nova-network must be run on every compute host. These created multi hosts networks will send all network related commands to the host that the specific VM is on." [1]

[1] http://docs.openstack.org/grizzly/openstack-compute/admin/content/existing-ha-networking-options.html#d6e9503
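
The requirement quoted above can be checked before evacuating. The following is a minimal sketch, assuming the list of services has already been collected as (binary, host) pairs, for example by hand from a service listing; the function name, the input shape, and the host names are all hypothetical.

```python
def hosts_missing_nova_network(services):
    """Return compute hosts with no nova-network service, sorted by name.

    `services` is an iterable of (binary, host) pairs; this input shape
    is an assumption made for the sketch.
    """
    compute_hosts = {host for binary, host in services
                     if binary == "nova-compute"}
    network_hosts = {host for binary, host in services
                     if binary == "nova-network"}
    return sorted(compute_hosts - network_hosts)


# Hypothetical deployment: target-host runs nova-compute only.
services = [
    ("nova-compute", "failed-host"),
    ("nova-network", "failed-host"),
    ("nova-compute", "target-host"),
    ("nova-network", "controller"),
]
print(hosts_missing_nova_network(services))
```

Any host this returns would be an unsuitable evacuation target for a multi-host network, since network commands for a VM placed there would have no local nova-network to handle them.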

Please re-open if needed.

Changed in nova:
status: New → Invalid
Sangeeta Singh (singhs) wrote :

Thanks. I was going to close it once I was sure and had tested completely.
