Yep, we call deallocate_for_instance before calling driver.destroy (see below). Perhaps this ordering needs to be reversed, although we will have to do some testing to verify that it doesn't break anything.
    def _shutdown_instance(self, context, instance, action_str):
        """Shutdown an instance on this host."""
        context = context.elevated()
        instance_id = instance['id']
        instance_uuid = instance['uuid']
        LOG.audit(_("%(action_str)s instance %(instance_uuid)s") %
                  {'action_str': action_str, 'instance_uuid': instance_uuid},
                  context=context)

        network_info = self._get_instance_nw_info(context, instance)
        if not FLAGS.stub_network:
            self.network_api.deallocate_for_instance(context, instance)

        if instance['power_state'] == power_state.SHUTOFF:
            self.db.instance_destroy(context, instance_id)
            raise exception.Error(_('trying to destroy already destroyed'
                                    ' instance: %s') % instance_uuid)
        # NOTE(vish) get bdms before destroying the instance
        bdms = self._get_instance_volume_bdms(context, instance_id)
        block_device_info = self._get_instance_volume_block_device_info(
            context, instance_id)
        self.driver.destroy(instance, network_info, block_device_info)
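For illustration, here is roughly what the reversed ordering would look like. This is only a sketch, untested, and it assumes nothing in driver.destroy depends on the network already having been deallocated:

    def _shutdown_instance(self, context, instance, action_str):
        """Shutdown an instance on this host."""
        context = context.elevated()
        instance_id = instance['id']
        instance_uuid = instance['uuid']
        LOG.audit(_("%(action_str)s instance %(instance_uuid)s") %
                  {'action_str': action_str, 'instance_uuid': instance_uuid},
                  context=context)

        # Grab the network info while the allocation still exists.
        network_info = self._get_instance_nw_info(context, instance)

        if instance['power_state'] == power_state.SHUTOFF:
            self.db.instance_destroy(context, instance_id)
            raise exception.Error(_('trying to destroy already destroyed'
                                    ' instance: %s') % instance_uuid)
        # NOTE(vish) get bdms before destroying the instance
        bdms = self._get_instance_volume_bdms(context, instance_id)
        block_device_info = self._get_instance_volume_block_device_info(
            context, instance_id)

        # Tear down the guest first, so it can no longer issue metadata
        # requests against its fixed IP...
        self.driver.destroy(instance, network_info, block_device_info)
        # ...and only then release the IP. This would close the window in
        # which a still-running guest's fixed_ip lookup 404s in the
        # metadata service.
        if not FLAGS.stub_network:
            self.network_api.deallocate_for_instance(context, instance)

The open question is whether any driver or network backend relies on deallocation happening before destroy, which is what the testing mentioned above would need to cover.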
Vish
On Jan 25, 2012, at 2:26 PM, David Kranz wrote:
> Public bug reported:
>
> I see this error in the nova-api.log (the one from the compute node) when running a stress test that starts/kills VMs rapidly. This is from a diablo-stable cluster with one controller and two compute nodes, using multi-host with nova-network and nova-api running on each compute node. It happens in maybe 20% of the runs. Is it possible there is a race condition between tearing down a VM and removing its fixed IP from the database?
> I am working on getting the stress tests checked into Tempest as soon as possible.
>
> 2012-01-25 17:10:56,282 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
> 2012-01-25 17:10:56,400 DEBUG nova.compute.api [58fba9aa-f844-4edb-84f4-4d5877450762 None None] Searching by: {'project_id': 'testproject'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
> 2012-01-25 17:10:56,553 INFO nova.api [-] 0.271808s 10.0.0.6 GET /2009-04-04/meta-data/local-hostname None:None 200 [Python-urllib/2.7] text/plain text/html
> 2012-01-25 17:10:56,556 DEBUG nova.compute.api [0fef399b-6d27-4e20-a745-badad830ac9c None None] Searching by: {'fixed_ip': '10.0.0.6'} from (pid=1093) get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:863
> 2012-01-25 17:10:56,634 ERROR nova.api.ec2.metadata [-] Failed to get metadata for ip: 10.0.0.6
> 2012-01-25 17:10:56,634 INFO nova.api [-] 0.78120s 10.0.0.6 GET /2009-04-04/meta-data/placement/ None:None 404 [Python-urllib/2.7] text/plain text/plain
>
> ** Affects: nova
> Importance: Undecided
> Status: New
>
> --
> You received this bug notification because you are subscribed to
> OpenStack Compute (nova).
> https://bugs.launchpad.net/bugs/921858
>
> Title:
> Sporadic 'Failed to get metadata for ip:'
>
> Status in OpenStack Compute (Nova):
> New