Sporadic 'Failed to get metadata for ip:'

Bug #921858 reported by David Kranz
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

I see this error in the nova-api.log (the one from the compute node) when running a stress test that starts and kills VMs rapidly. This is from a diablo-stable cluster with one controller and two compute nodes, using multi-host with nova-network and nova-api running on each compute node. It happens in maybe 20% of the runs. Is it possible there is a race condition involving tearing down a VM and removing its fixed IP from the database?
I am working on getting the stress tests checked into Tempest as soon as possible.

2012-01-25 17:10:56,282 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,400 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'project_id': 'testproject'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,553 INFO nova.api [-] 0.271808s 10.0.0.6 GET /2009-04-04/meta-data/local-hostname None:None 200 [Python-urllib/2.7] text/plain text/html
2012-01-25 17:10:56,556 DEBUG nova.compute.api [0fef399b-6d27-4e20-a745-badad830ac9c None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,634 ERROR nova.api.ec2.metadata [-] Failed to get metadata for ip: 10.0.0.6
2012-01-25 17:10:56,634 INFO nova.api [-] 0.78120s 10.0.0.6 GET /2009-04-04/meta-data/placement/ None:None 404 [Python-urllib/2.7] text/plain text/plain
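
For illustration only (this is not nova code; the names and data are made up), the failure mode comes down to the metadata service's fixed-IP-to-instance lookup losing the race against instance teardown:

# Simplified sketch of the race seen above, not actual nova code.
FIXED_IPS = {'10.0.0.6': {'hostname': 'vm-1'}}   # fixed IP -> instance

def handle_metadata_request(remote_address):
    # The metadata service maps the requester's source IP to an instance,
    # as in the "Searching by: {'fixed_ip': ...}" lines above.
    instance = FIXED_IPS.get(remote_address)
    if instance is None:
        # Teardown already removed the fixed-IP association, so the lookup
        # fails even though the guest is still asking for metadata.
        print('ERROR nova.api.ec2.metadata Failed to get metadata for ip: %s'
              % remote_address)
        return 404
    return 200

handle_metadata_request('10.0.0.6')   # 200: fixed IP still mapped
del FIXED_IPS['10.0.0.6']             # teardown deallocates the fixed IP
handle_metadata_request('10.0.0.6')   # 404: guest still running, mapping gone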

Revision history for this message
Vish Ishaya (vishvananda) wrote : Re: [Bug 921858] [NEW] Sporadic 'Failed to get metadata for ip:'

Yep, we call deallocate_for_instance before calling driver.destroy (see below).

Perhaps this needs to be reversed, although we will have to do some testing to verify that doesn't break anything.

def _shutdown_instance(self, context, instance, action_str):
    """Shutdown an instance on this host."""
    context = context.elevated()
    instance_id = instance['id']
    instance_uuid = instance['uuid']
    LOG.audit(_("%(action_str)s instance %(instance_uuid)s") %
              {'action_str': action_str, 'instance_uuid': instance_uuid},
              context=context)

    network_info = self._get_instance_nw_info(context, instance)
    if not FLAGS.stub_network:
        self.network_api.deallocate_for_instance(context, instance)

    if instance['power_state'] == power_state.SHUTOFF:
        self.db.instance_destroy(context, instance_id)
        raise exception.Error(_('trying to destroy already destroyed'
                                ' instance: %s') % instance_uuid)
    # NOTE(vish) get bdms before destroying the instance
    bdms = self._get_instance_volume_bdms(context, instance_id)
    block_device_info = self._get_instance_volume_block_device_info(
        context, instance_id)
    self.driver.destroy(instance, network_info, block_device_info)
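
A minimal sketch of that reordering (illustrative and untested; it just rearranges the excerpt above so the fixed IP is released only after the domain has been destroyed):

def _shutdown_instance(self, context, instance, action_str):
    """Shutdown an instance on this host (teardown reordered)."""
    context = context.elevated()
    instance_id = instance['id']

    network_info = self._get_instance_nw_info(context, instance)

    # Destroy the domain first, while the fixed IP is still associated,
    # so in-flight metadata requests from the guest can still resolve...
    bdms = self._get_instance_volume_bdms(context, instance_id)
    block_device_info = self._get_instance_volume_block_device_info(
        context, instance_id)
    self.driver.destroy(instance, network_info, block_device_info)

    # ...and only then release the instance's networking.
    if not FLAGS.stub_network:
        self.network_api.deallocate_for_instance(context, instance)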

Vish


Revision history for this message
Brian Waldon (bcwaldon) wrote :

This could be the cause of the other '404 on list after instance delete' bug (that I can't find right now).

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

This bug looks super old, but I am seeing this in the check queue right now in the nova metadata API service:

http://logs.openstack.org/63/84363/2/check/check-tempest-dsvm-postgres-full/eb1d11d/logs/screen-n-api-meta.txt.gz

http://goo.gl/eBmE7Z

Revision history for this message
Matt Riedemann (mriedem) wrote :

Note that when this shows up in the check queue, the job still has a 95% success rate and the tests are passing; it's just noisy logs. So I guess we should clean up the logging in Juno:

https://github.com/openstack/nova-specs/blob/master/specs/juno/clean-logs.rst
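
A sketch of the kind of cleanup meant here (illustrative only; the exact module and message in nova may differ): stop emitting an ERROR for what is effectively an expected lookup miss.

import logging

LOG = logging.getLogger('nova.api.ec2.metadata')

def log_metadata_miss(remote_address):
    # Before (roughly what produces the noisy gate logs):
    #     LOG.error('Failed to get metadata for ip: %s', remote_address)
    # After: demote the message, since the request is simply answered with
    # a 404 and the tempest jobs still pass.
    LOG.warning('Failed to get metadata for ip: %s', remote_address)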

tags: added: api network
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Dolph Mathews (dolph) wrote :

A search for the query above returns 143 hits; all failures seem to be in postgres jobs. stable/icehouse also appears to be affected.

build_name
  81% check-tempest-dsvm-postgres-full
  17% gate-tempest-dsvm-postgres-full
  0% check-tempest-dsvm-postgres-full-icehouse

tags: added: icehouse-backport-potential
Revision history for this message
Joe Gordon (jogo) wrote :

No hits in elastic-recheck any more; looks like this was resolved.

Changed in nova:
status: Confirmed → Incomplete
Sean Dague (sdague)
Changed in nova:
status: Incomplete → Invalid