Sporadic 'Failed to get metadata for ip:'
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | Medium | Unassigned |
Bug Description
I see this error in nova-api.log (the one on the compute node) when running a stress test that starts and kills VMs rapidly. This is on a diablo-stable cluster with one controller and two compute nodes, using multi-host with nova-network and nova-api running on each compute node. It happens in maybe 20% of the runs. Is it possible that there is a race condition between tearing down a VM and removing its fixed IP from the database?
I am working on getting the stress tests checked into Tempest as soon as possible.
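2012-01-25 17:10:56,282 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,400 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'project_id': 'testproject'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,553 INFO nova.api [-] 0.271808s 10.0.0.6 GET /2009-04-04/meta-data/local-hostname None:None 200 [Python-urllib/2.7] text/plain text/html
2012-01-25 17:10:56,556 DEBUG nova.compute.api [0fef399b-6d27-4e20-a745-badad830ac9c None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,634 ERROR nova.api.ec2.metadata [-] Failed to get metadata for ip: 10.0.0.6
2012-01-25 17:10:56,634 INFO nova.api [-] 0.78120s 10.0.0.6 GET /2009-04-04/meta-data/placement/ None:None 404 [Python-urllib/2.7] text/plain text/plain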
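To put a number on how often this happens across stress runs, the log can be scanned for the error message from the bug title. The following is a minimal sketch, not part of nova or Tempest: the function name `failed_metadata_ips` is hypothetical, and it assumes the message text "Failed to get metadata for ip: <address>" appears on a single log line.

```python
import re
from collections import Counter

# Matches the error message from the bug title and captures the IP address.
# Assumption: the message and the address are on the same log line.
FAILED_RE = re.compile(r"Failed to get metadata for ip: (\S+)")


def failed_metadata_ips(lines):
    """Count failed metadata lookups per IP address in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2012-01-25 17:10:56,634 ERROR nova.api.ec2.metadata [-] "
        "Failed to get metadata for ip: 10.0.0.6",
        "2012-01-25 17:10:56,634 INFO nova.api [-] unrelated request line",
    ]
    for ip, count in failed_metadata_ips(sample).items():
        print(ip, count)
```

To scan a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.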
Changed in nova:
status: Incomplete → Invalid
Yep, we call deallocate_for_instance before calling driver.destroy (see below).
Perhaps this needs to be reversed, although we will have to do some testing to verify that it doesn't break anything.
    def _shutdown_instance(self, context, instance, action_str):
        """Shutdown an instance on this host."""
        context = context.elevated()
        instance_id = instance['id']
        instance_uuid = instance['uuid']
        LOG.audit(_("%(action_str)s instance %(instance_uuid)s") %
                  {'action_str': action_str, 'instance_uuid': instance_uuid},
                  context=context)

        network_info = self._get_instance_nw_info(context, instance)
        if not FLAGS.stub_network:
            self.network_api.deallocate_for_instance(context, instance)

        if instance['power_state'] == power_state.SHUTOFF:
            self.db.instance_destroy(context, instance_id)
            raise exception.Error(_('trying to destroy already destroyed'
                                    ' instance: %s') % instance_uuid)
        # NOTE(vish) get bdms before destroying the instance
        bdms = self._get_instance_volume_bdms(context, instance_id)
        block_device_info = self._get_instance_volume_block_device_info(
            context, instance_id)
        self.driver.destroy(instance, network_info, block_device_info)
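
The ordering issue can be illustrated with a toy model. This is only a sketch of the hypothesis discussed here, not nova code: `FakeCloud`, `shutdown`, and the 200/404 return values are hypothetical stand-ins for the fixed IP table, the hypervisor, and the metadata lookup by fixed IP.

```python
class FakeCloud:
    """Toy stand-in for the fixed IP table and the hypervisor (not nova code)."""

    def __init__(self):
        self.fixed_ips = {}   # ip -> instance id
        self.running = set()  # instance ids still running on the hypervisor

    def boot(self, instance_id, ip):
        self.fixed_ips[ip] = instance_id
        self.running.add(instance_id)

    def deallocate(self, instance_id):
        # Remove every fixed IP that belongs to the instance.
        self.fixed_ips = {ip: owner for ip, owner in self.fixed_ips.items()
                          if owner != instance_id}

    def destroy(self, instance_id):
        self.running.discard(instance_id)

    def guest_metadata_request(self, instance_id, ip):
        """Status a guest would get; None if the guest is gone and cannot ask."""
        if instance_id not in self.running:
            return None
        return 200 if ip in self.fixed_ips else 404


def shutdown(cloud, instance_id, ip, deallocate_first):
    """Run both teardown steps and report what a request between them sees."""
    steps = [cloud.deallocate, cloud.destroy]
    if not deallocate_first:
        steps.reverse()
    steps[0](instance_id)
    between = cloud.guest_metadata_request(instance_id, ip)
    steps[1](instance_id)
    return between


if __name__ == "__main__":
    for deallocate_first in (True, False):
        cloud = FakeCloud()
        cloud.boot("i-1", "10.0.0.6")
        print(deallocate_first, shutdown(cloud, "i-1", "10.0.0.6", deallocate_first))
```

With deallocation first, a guest that is still running can hit the metadata service after its fixed IP is gone, which corresponds to the 404 in the log. With the order reversed, the guest is already gone by the time the IP is removed. Whether reversing the order is safe in nova itself is exactly what would still need testing.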
Vish
On Jan 25, 2012, at 2:26 PM, David Kranz wrote:
> Public bug reported:
>
> I see this error in the nova-api.log (the one from the compute node) when running a stress test that starts/kills vms rapidly. This is from a diablo-stable cluster with one controller and two compute nodes, using multi-host with nova-network and nova-api running on each compute node. It happens maybe 20% of the runs. Is it possible there is a race condition involving tearing down a vm and removing its fixed ip from this database?
> I am working on getting the stress tests checked into Tempest as soon as possible.
>
> 2012-01-25 17:10:56,282 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
> 2012-01-25 17:10:56,400 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'project_id': 'testproject'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
> 2012-01-25 17:10:56,553 INFO nova.api [-] 0.271808s 10.0.0.6 GET /2009-04-04/meta-data/local-hostname None:None 200 [Python-urllib/2.7] text/plain text/html
> 2012-01-25 17:10:56,556 DEBUG nova.compute.api [0fef399b-6d27-4e20-a745-badad830ac9c None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
> 2012-01-25 17:10:56,634 ERROR nova.api.ec2.metadata [-] Failed to get metadata for ip: 10.0.0.6
> 2012-01-25 17:10:56,634 INFO nova.api [-] 0.78120s 10.0.0.6 GET /2009-04-04/meta-data/placement/ None:None 404 [Python-urllib/2.7] text/plain text/plain
>
> ** Affects: nova
> Importance:...