Sporadic 'Failed to get metadata for ip:'

Bug #921858 reported by David Kranz
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

I see this error in the nova-api.log (the one from the compute node) when running a stress test that starts and kills VMs rapidly. This is from a diablo-stable cluster with one controller and two compute nodes, using multi-host with nova-network and nova-api running on each compute node. It happens in maybe 20% of the runs. Is it possible there is a race condition involving tearing down a VM and removing its fixed IP from the database?
I am working on getting the stress tests checked into Tempest as soon as possible.

2012-01-25 17:10:56,282 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,400 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'project_id': 'testproject'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,553 INFO nova.api [-] 0.271808s 10.0.0.6 GET /2009-04-04/meta-data/local-hostname None:None 200 [Python-urllib/2.7] text/plain text/html
2012-01-25 17:10:56,556 DEBUG nova.compute.api [0fef399b-6d27-4e20-a745-badad830ac9c None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
2012-01-25 17:10:56,634 ERROR nova.api.ec2.metadata [-] Failed to get metadata for ip: 10.0.0.6
2012-01-25 17:10:56,634 INFO nova.api [-] 0.78120s 10.0.0.6 GET /2009-04-04/meta-data/placement/ None:None 404 [Python-urllib/2.7] text/plain text/plain
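
For illustration only (this is not nova code; the names and data are made up), the failure mode comes down to the metadata service's fixed-IP-to-instance lookup losing the race against instance teardown:

# Simplified sketch of the race seen above, not actual nova code.
FIXED_IPS = {'10.0.0.6': {'hostname': 'vm-1'}}   # fixed IP -> instance

def handle_metadata_request(remote_address):
    # The metadata service maps the requester's source IP to an instance,
    # as in the "Searching by: {'fixed_ip': ...}" lines above.
    instance = FIXED_IPS.get(remote_address)
    if instance is None:
        # Teardown already removed the fixed-IP association, so the lookup
        # fails even though the guest is still asking for metadata.
        print('ERROR nova.api.ec2.metadata Failed to get metadata for ip: %s'
              % remote_address)
        return 404
    return 200

handle_metadata_request('10.0.0.6')   # 200: fixed IP still mapped
del FIXED_IPS['10.0.0.6']             # teardown deallocates the fixed IP
handle_metadata_request('10.0.0.6')   # 404: guest still running, mapping gone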

Revision history for this message
Vish Ishaya (vishvananda) wrote : Re: [Bug 921858] [NEW] Sporadic 'Failed to get metadata for ip:'

Yep, we call deallocate_for_instance before calling driver.destroy (see below).

Perhaps this needs to be reversed, although we will have to do some testing to verify that doesn't break anything.

def _shutdown_instance(self, context, instance, action_str):
    """Shutdown an instance on this host."""
    context = context.elevated()
    instance_id = instance['id']
    instance_uuid = instance['uuid']
    LOG.audit(_("%(action_str)s instance %(instance_uuid)s") %
              {'action_str': action_str, 'instance_uuid': instance_uuid},
              context=context)

    network_info = self._get_instance_nw_info(context, instance)
    if not FLAGS.stub_network:
        self.network_api.deallocate_for_instance(context, instance)

    if instance['power_state'] == power_state.SHUTOFF:
        self.db.instance_destroy(context, instance_id)
        raise exception.Error(_('trying to destroy already destroyed'
                                ' instance: %s') % instance_uuid)
    # NOTE(vish) get bdms before destroying the instance
    bdms = self._get_instance_volume_bdms(context, instance_id)
    block_device_info = self._get_instance_volume_block_device_info(
        context, instance_id)
    self.driver.destroy(instance, network_info, block_device_info)
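
A minimal sketch of that reordering (illustrative and untested; it just rearranges the excerpt above so the fixed IP is released only after the domain has been destroyed):

def _shutdown_instance(self, context, instance, action_str):
    """Shutdown an instance on this host (teardown reordered)."""
    context = context.elevated()
    instance_id = instance['id']

    network_info = self._get_instance_nw_info(context, instance)

    # Destroy the domain first, while the fixed IP is still associated,
    # so in-flight metadata requests from the guest can still resolve...
    bdms = self._get_instance_volume_bdms(context, instance_id)
    block_device_info = self._get_instance_volume_block_device_info(
        context, instance_id)
    self.driver.destroy(instance, network_info, block_device_info)

    # ...and only then release the instance's networking.
    if not FLAGS.stub_network:
        self.network_api.deallocate_for_instance(context, instance)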

Vish


Revision history for this message
Brian Waldon (bcwaldon) wrote :

This could be the cause of the other '404 on list after instance delete' bug (that I can't find right now).

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

This bug looks super old, but I am seeing this in the check queue right now in the nova metadata API service:

http://logs.openstack.org/63/84363/2/check/check-tempest-dsvm-postgres-full/eb1d11d/logs/screen-n-api-meta.txt.gz

http://goo.gl/eBmE7Z

Revision history for this message
Matt Riedemann (mriedem) wrote :

Note that when this shows up in the check queue, the job still has a 95% success rate and the tests are passing; it's just noisy logs. So I guess we should clean up the logging in Juno:

https://github.com/openstack/nova-specs/blob/master/specs/juno/clean-logs.rst
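
A sketch of the kind of cleanup meant here (illustrative only; the exact module and message in nova may differ): stop emitting an ERROR for what is effectively an expected lookup miss.

import logging

LOG = logging.getLogger('nova.api.ec2.metadata')

def log_metadata_miss(remote_address):
    # Before (roughly what produces the noisy gate logs):
    #     LOG.error('Failed to get metadata for ip: %s', remote_address)
    # After: demote the message, since the request is simply answered with
    # a 404 and the tempest jobs still pass.
    LOG.warning('Failed to get metadata for ip: %s', remote_address)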

tags: added: api network
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Dolph Mathews (dolph) wrote :

A search for the query above returns 143 hits; all failures seem to be in postgres jobs. stable/icehouse also appears to be affected.

build_name
  81% check-tempest-dsvm-postgres-full
  17% gate-tempest-dsvm-postgres-full
  0% check-tempest-dsvm-postgres-full-icehouse

tags: added: icehouse-backport-potential
Revision history for this message
Joe Gordon (jogo) wrote :

No hits in elastic-recheck any more; looks like this was resolved.

Changed in nova:
status: Confirmed → Incomplete
Sean Dague (sdague)
Changed in nova:
status: Incomplete → Invalid