Comment 6 for bug 1357055

Alex Xu (xuhj) wrote :

In the compute manager, instance 6ce21bd9-e6a6-41a6-81ec-72ce368cb38e disappears while building:

2014-09-20 16:56:09.439 DEBUG nova.compute.manager [req-d9ef4ec9-3904-4c19-8e38-f06c76a5f1ee InstanceRunTest-1019975326 InstanceRunTest-564222607] [instance: 6ce21bd9-e6a6-41a6-81ec-72ce368cb38e] Instance disappeared during build. do_build_and_run_instance /opt/stack/new/nova/nova/compute/manager.py:2033

Checking the nova-api log, the instance is deleted on the nova-api side. This is because it is deleted before the compute manager assigns a host to the instance:

2014-09-20 16:56:08.963 INFO nova.ec2.wsgi.server [req-d9ef4ec9-3904-4c19-8e38-f06c76a5f1ee InstanceRunTest-1019975326 InstanceRunTest-564222607] 127.0.0.1 "POST /services/Cloud/ HTTP/1.1" status: 200 len: 1207 time: 2.3221920
2014-09-20 16:56:09.022 DEBUG nova.api.ec2 [req-4093d683-e99d-4816-b427-dd95f7c64987 InstanceRunTest-1019975326 InstanceRunTest-564222607] action: TerminateInstances __call__ /opt/stack/new/nova/nova/api/ec2/__init__.py:325
2014-09-20 16:56:09.023 DEBUG nova.api.ec2 [req-4093d683-e99d-4816-b427-dd95f7c64987 InstanceRunTest-1019975326 InstanceRunTest-564222607] arg: InstanceId.1 val: i-000000bd __call__ /opt/stack/new/nova/nova/api/ec2/__init__.py:328
2014-09-20 16:56:09.213 DEBUG nova.api.ec2.cloud [req-4093d683-e99d-4816-b427-dd95f7c64987 InstanceRunTest-1019975326 InstanceRunTest-564222607] Going to start terminating instances terminate_instances /opt/stack/new/nova/nova/api/ec2/cloud.py:1462
2014-09-20 16:56:09.214 DEBUG nova.compute.api [req-4093d683-e99d-4816-b427-dd95f7c64987 InstanceRunTest-1019975326 InstanceRunTest-564222607] [instance: 6ce21bd9-e6a6-41a6-81ec-72ce368cb38e] Going to try to terminate instance delete /opt/stack/new/nova/nova/compute/api.py:1791
2014-09-20 16:56:09.304 DEBUG nova.quota [req-4093d683-e99d-4816-b427-dd95f7c64987 InstanceRunTest-1019975326 InstanceRunTest-564222607] Created reservations ['fdae1ad5-be89-4dd5-b54e-29f4d24c19dc', '86b41f79-7b8a-446d-b545-77719c3dd891', 'aef891e1-a066-4e2a-9978-27967759b0c2'] reserve /opt/stack/new/nova/nova/quota.py:1315
2014-09-20 16:56:09.393 DEBUG nova.quota [req-4093d683-e99d-4816-b427-dd95f7c64987 InstanceRunTest-1019975326 InstanceRunTest-564222607] Committed reservations [u'fdae1ad5-be89-4dd5-b54e-29f4d24c19dc', u'86b41f79-7b8a-446d-b545-77719c3dd891', u'aef891e1-a066-4e2a-9978-27967759b0c2'] commit /opt/stack/new/nova/nova/quota.py:1340
2014-09-20 16:56:09.432 DEBUG nova.api.ec2.apirequest [req-4093d683-e99d-4816-b427-dd95f7c64987 InstanceRunTest-1019975326 InstanceRunTest-564222607] <TerminateInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2014-05-01/">

So the compute manager tries to clean up the network: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2044

But the cleanup function finds no ports:
2014-09-20 16:56:09.459 DEBUG neutronclient.client [req-d9ef4ec9-3904-4c19-8e38-f06c76a5f1ee InstanceRunTest-1019975326 InstanceRunTest-564222607] RESP:200 CaseInsensitiveDict({'date': 'Sat, 20 Sep 2014 16:56:09 GMT', 'content-length': '13', 'content-type': 'application/json; charset=UTF-8', 'x-openstack-request-id': 'req-e677c6b1-9316-4680-abe5-c3896bd71ee9'}) {"ports": []}

At the same time, '_allocate_network_async' is still running. After the cleanup function finishes, '_allocate_network_async' creates a new port for instance 6ce21bd9-e6a6-41a6-81ec-72ce368cb38e:

2014-09-20 16:56:09.865 1894 DEBUG nova.network.neutronv2.api [-] [instance: 6ce21bd9-e6a6-41a6-81ec-72ce368cb38e] Successfully created port: 89ff93ba-b020-43c7-98b3-7ba9afebcd1f _create_port /opt/stack/new/nova/nova/network/neutronv2/api.py:215

Checking the log, there is no API call to neutron to delete this port 89ff93ba-b020-43c7-98b3-7ba9afebcd1f, so it is left behind.
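
To make the interleaving concrete, here is a minimal, self-contained sketch (plain Python with threading, not Nova code). FakeNeutron, cleanup_networks and allocate_network_async are hypothetical stand-ins that only reproduce the ordering visible in the logs above:

import threading

class FakeNeutron:
    """Stands in for the neutron API: ports keyed by device_id."""
    def __init__(self):
        self.ports = {}
        self.lock = threading.Lock()

    def create_port(self, device_id, port_id):
        with self.lock:
            self.ports[port_id] = device_id

    def list_ports(self, device_id):
        with self.lock:
            return [p for p, d in self.ports.items() if d == device_id]

    def delete_port(self, port_id):
        with self.lock:
            self.ports.pop(port_id, None)

neutron = FakeNeutron()
cleanup_done = threading.Event()
uuid = '6ce21bd9-e6a6-41a6-81ec-72ce368cb38e'

def cleanup_networks(instance_uuid):
    # Like the compute manager cleanup: delete whatever ports exist *now*.
    for port_id in neutron.list_ports(instance_uuid):
        neutron.delete_port(port_id)
    cleanup_done.set()

def allocate_network_async(instance_uuid):
    # Like _allocate_network_async still running in the background; the port
    # creation is forced to happen only after cleanup has finished, which is
    # the ordering shown in the logs.
    cleanup_done.wait()
    neutron.create_port(instance_uuid, '89ff93ba-b020-43c7-98b3-7ba9afebcd1f')

alloc = threading.Thread(target=allocate_network_async, args=(uuid,))
alloc.start()
cleanup_networks(uuid)            # sees {"ports": []}, nothing to delete
alloc.join()
print(neutron.list_ports(uuid))   # the late port is still there

Running this prints the port id: cleanup saw an empty port list, the allocation landed afterwards, and nothing ever deletes it.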

Finally, '_allocate_network_async' also stops because the instance is not found:
http://logs.openstack.org/85/122785/1/check/check-tempest-dsvm-neutron-pg-full/61429eb/logs/screen-n-cpu.txt.gz#_2014-09-20_16_56_10_014

So the instance is deleted before the 'yield resources' in '_build_resources'
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2216

Before 'yield resources', the exception handling for InstanceNotFound does not wait for the network_info sync to finish; a minimal sketch of this is below.
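
Again a plain-Python sketch, not Nova code: build_resources and allocate_async are hypothetical stand-ins for _build_resources and _allocate_network_async. When the instance is already gone, the exception fires before the generator ever yields, and the except path cleans up without waiting for the background allocation:

import contextlib
import threading
import time

created_ports = []

def allocate_async():
    time.sleep(0.1)               # allocation still in flight...
    created_ports.append('port')  # ...and lands after cleanup has run

alloc_thread = threading.Thread(target=allocate_async)

@contextlib.contextmanager
def build_resources():
    alloc_thread.start()
    try:
        # The instance is re-read here; if it was already deleted this
        # raises (InstanceNotFound in Nova) before we ever reach 'yield'.
        raise LookupError('instance disappeared during build')
        yield 'resources'         # never reached, as in the bug
    except LookupError:
        # Cleanup path: removes the ports that exist *right now*, but does
        # not wait for allocate_async, so the late port survives.
        created_ports.clear()
        raise

try:
    with build_resources():
        pass
except LookupError:
    pass

alloc_thread.join()    # only so the demo output is deterministic
print(created_ports)   # ['port'] -- the allocation landed after cleanup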

So the race happens between https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2044 and '_allocate_network_async'.