InstanceInfoCache is not always updated with concurrent instance creation

Bug #1227143 reported by Jordan Pittier
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,
I tried to launch 45 instances in parallel:
nova boot --image ${image_id} --flavor ${flavor_id} --nic net-id=${net_id} --num-instances 45 jordan-internet

Sometimes, nova fails to get the IP address of one or two instances:
>nova list
+--------------------------------------+------------------------------------------------------+--------+------------+-------------+------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------------------------------------------------------+--------+------------+-------------+------------------------+
| 31d9c78b-1d03-46a8-98eb-47143679e1c8 | jordan-internet-31d9c78b-1d03-46a8-98eb-47143679e1c8 | ACTIVE | None | Running | internet=194.2.202.177 |
| 3301a06c-60c0-48a7-97b1-4ee292ccf294 | jordan-internet-3301a06c-60c0-48a7-97b1-4ee292ccf294 | ACTIVE | None | Running | internet=194.2.202.193 |
| 357b3366-4918-4d09-8c52-4ce5a1c02156 | jordan-internet-357b3366-4918-4d09-8c52-4ce5a1c02156 | ACTIVE | None | Running | |
| 35af1894-cb5d-4797-a0aa-88df32436351 | jordan-internet-35af1894-cb5d-4797-a0aa-88df32436351 | ACTIVE | None | Running | internet=194.2.202.171 |
| 2a5afa62-68c1-4343-ba40-7fb0021d2a23 | jordan-internet-2a5afa62-68c1-4343-ba40-7fb0021d2a23 | ACTIVE | None | Running | internet=194.2.202.153 |
| 30d14347-f260-4f6f-a55a-d2e5549f84e1 | jordan-internet-30d14347-f260-4f6f-a55a-d2e5549f84e1 | ACTIVE | None | Running | internet=194.2.202.188 |
| 318a9c90-3573-4ba2-a73a-d65a3d79a2b2 | jordan-internet-318a9c90-3573-4ba2-a73a-d65a3d79a2b2 | ACTIVE | None | Running | internet=194.2.202.176 |
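With 45 rows, the affected instances are easy to miss by eye. Here is a rough Python sketch that picks them out of `nova list` output. The function name is hypothetical, and the parsing assumes the pipe-delimited ASCII layout shown above, with "ID" and "Networks" columns:

```python
def ids_without_networks(table_text):
    """Return the IDs of `nova list` rows whose Networks cell is empty."""
    header = None
    missing = []
    for raw in table_text.splitlines():
        line = raw.strip()
        # Header and data rows start and end with '|'; border lines start with '+'.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first pipe-delimited row holds the column names
            continue
        row = dict(zip(header, cells))
        if not row.get("Networks"):
            missing.append(row["ID"])
    return missing


if __name__ == "__main__":
    SAMPLE = """\
+------+------+
| ID | Name | Status | Task State | Power State | Networks |
+------+------+
| 3301a06c-60c0-48a7-97b1-4ee292ccf294 | jordan-internet-3301a06c | ACTIVE | None | Running | internet=194.2.202.193 |
| 357b3366-4918-4d09-8c52-4ce5a1c02156 | jordan-internet-357b3366 | ACTIVE | None | Running | |
"""
    print(ids_without_networks(SAMPLE))  # prints ['357b3366-4918-4d09-8c52-4ce5a1c02156']
```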

>nova show 357b3366-4918-4d09-8c52-4ce5a1c02156
+-------------------------------------+-------------------------------------------------------------------+
| Property | Value |
+-------------------------------------+-------------------------------------------------------------------+
| status | ACTIVE |
| updated | 2013-09-18T13:21:49Z |
| OS-EXT-STS:task_state | None |
| OS-EXT-SRV-ATTR:host | d-ocnclc-0001 |
| key_name | jordan |
| image | Ubuntu precise 12.04.2 LTS (cecc8e4c-e689-40f9-80de-9a43219c9c0b) |
| hostId | bf54cba4f3fdc7b0b8c1b1872de977c770fb332c2d395e7acd370a4b |
| OS-EXT-STS:vm_state | active |
| OS-EXT-SRV-ATTR:instance_name | instance-00004ab6 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | d-ocnclc-0001.adm.dev1.val.cw-labs.net |
| flavor | m1.small (6) |
| id | 357b3366-4918-4d09-8c52-4ce5a1c02156 |
| security_groups | [{u'name': u'default'}] |
| user_id | bbdcffe9d3944c35bcf6a875b6d3d235 |
| name | jordan-internet-357b3366-4918-4d09-8c52-4ce5a1c02156 |
| created | 2013-09-18T13:18:22Z |
| tenant_id | 462d7cbd479f499fa75ad9f14571d553 |
| OS-DCF:diskConfig | MANUAL |
| metadata | {} |
| accessIPv4 | |
| accessIPv6 | |
| progress | 0 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-AZ:availability_zone | alpha1.val |
| config_drive | |
+-------------------------------------+-------------------------------------------------------------------+

What's odd is that nova interface-list is "correct":
jordan@jpi-octo:~$ nova interface-list 357b3366-4918-4d09-8c52-4ce5a1c02156
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+
| Port State | Port ID | Net ID | IP addresses | MAC Addr |
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+
| ACTIVE | d5b85cae-623e-4e77-a09a-3c057af6b4c7 | 7b07d545-53da-4897-abbd-dfe142db0fc2 | 194.2.202.181 | fa:16:3e:2f:9b:51 |
+------------+--------------------------------------+--------------------------------------+---------------+-------------------+

And quantum port-show is also 'correct':
jordan@jpi-octo:~$ quantum port-show d5b85cae-623e-4e77-a09a-3c057af6b4c7
+----------------------+--------------------------------------------------------------------------------------+
| Field | Value |
+----------------------+--------------------------------------------------------------------------------------+
| admin_state_up | True |
| binding:capabilities | {"port_filter": true} |
| binding:vif_type | ovs |
| device_id | 357b3366-4918-4d09-8c52-4ce5a1c02156 |
| device_owner | compute:None |
| fixed_ips | {"subnet_id": "32ce548d-dd26-48f9-b7a0-78ea69f4750c", "ip_address": "194.2.202.181"} |
| id | d5b85cae-623e-4e77-a09a-3c057af6b4c7 |
| mac_address | fa:16:3e:2f:9b:51 |
| name | |
| network_id | 7b07d545-53da-4897-abbd-dfe142db0fc2 |
| security_groups | 865ca78f-d712-4828-8ec8-b53f7c3f5523 |
| status | ACTIVE |
| tenant_id | 462d7cbd479f499fa75ad9f14571d553 |
+----------------------+--------------------------------------------------------------------------------------+
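To check the mismatch programmatically, one could compare the fixed IPs Neutron reports for the instance's ports with the IPs listed in nova's cached network_info. A minimal sketch follows. It assumes port dicts shaped like the Neutron v2 API (a `fixed_ips` list of `{"subnet_id", "ip_address"}` entries) and a simplified network_info model (a list of VIFs, each with `network` → `subnets` → `ips` → `address`). The helper names are hypothetical and not part of nova:

```python
import json


def cached_ips(network_info_json):
    """Collect fixed IPs from a cached network_info blob (simplified VIF model)."""
    ips = set()
    for vif in json.loads(network_info_json):
        network = vif.get("network") or {}
        for subnet in network.get("subnets", []):
            for ip in subnet.get("ips", []):
                ips.add(ip["address"])
    return ips


def port_ips(ports):
    """Collect fixed IPs from Neutron port dicts."""
    return {fixed["ip_address"] for port in ports for fixed in port.get("fixed_ips", [])}


def missing_from_cache(ports, network_info_json):
    """IPs Neutron has allocated to the instance that the nova cache does not list."""
    return port_ips(ports) - cached_ips(network_info_json)


if __name__ == "__main__":
    port = {
        "id": "d5b85cae-623e-4e77-a09a-3c057af6b4c7",
        "device_id": "357b3366-4918-4d09-8c52-4ce5a1c02156",
        "fixed_ips": [{"subnet_id": "32ce548d-dd26-48f9-b7a0-78ea69f4750c",
                       "ip_address": "194.2.202.181"}],
    }
    # The problematic instance's cache is the empty JSON list.
    print(missing_from_cache([port], "[]"))  # prints {'194.2.202.181'}
```

An empty result means the cache agrees with Neutron. A non-empty result is exactly the situation above: the port is ACTIVE with an address, but `nova list` and `nova show` have nothing to display.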

So the problem lies in the SQL column 'network_info' of the table nova.instance_info_caches. For a problematic instance, this column remains set to '[]', whereas for a 'correct' instance it contains JSON describing the instance's network interfaces and their IP addresses.
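A sketch of how the affected rows could be listed straight from the database follows. The query is the useful part. To keep it runnable, it uses an in-memory SQLite stand-in for nova's `instance_info_caches` table (the InstanceInfoCache of the title). Only the columns the check needs are modeled; the real table and database backend (typically MySQL) have more:

```python
import sqlite3

# Simplified stand-in for nova's instance_info_caches table.
# Assumption: only the columns needed for this check are modeled.
SCHEMA = """
CREATE TABLE instance_info_caches (
    id INTEGER PRIMARY KEY,
    instance_uuid TEXT NOT NULL,
    network_info TEXT,
    deleted INTEGER NOT NULL DEFAULT 0
)
"""

# Non-deleted instances whose cached network_info was never populated.
EMPTY_CACHE_QUERY = """
SELECT instance_uuid FROM instance_info_caches
WHERE deleted = 0 AND (network_info IS NULL OR network_info = '[]')
ORDER BY instance_uuid
"""


def find_empty_caches(conn):
    """Return the UUIDs of instances with an empty network_info cache."""
    return [row[0] for row in conn.execute(EMPTY_CACHE_QUERY)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO instance_info_caches (instance_uuid, network_info) VALUES (?, ?)",
        [
            ("357b3366-4918-4d09-8c52-4ce5a1c02156", "[]"),
            ("31d9c78b-1d03-46a8-98eb-47143679e1c8", '[{"network": {"label": "internet"}}]'),
        ],
    )
    print(find_empty_caches(conn))  # prints ['357b3366-4918-4d09-8c52-4ce5a1c02156']
```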

I cannot reproduce this issue when I launch, say, 5 instances in parallel, or when I launch 45 instances one after another.

Any help would be very much appreciated. I don't fully understand what compute/utils.py::get_nw_info_for_instance(), network/quantumv2/api.py::_build_network_info_model() or network/quantumv2/api.py::allocate_for_instance() do :(

Thanks,
Jordan

Setup :
Grizzly 2013.1.3
Nova-api with osapi_compute_workers=10
Quantum with openvswitch-plugin

tags: added: network
Michael H Wilson (geekinutah) wrote :

All nova does to generate the initial blob of JSON with "good" network info in it is call out to the Neutron API. It makes several different calls, and if any one of them fails, the whole process fails. Two additional pieces of information would be helpful:

1. nova-compute logs from the host the instance launched on, covering the time of creation
2. Logs from the neutron server for the same time period
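The all-or-nothing behaviour described above can be illustrated with a small sketch. This is not nova's code. `FakeNeutron`, its methods, and `build_network_info` are hypothetical names whose signatures deliberately do not match python-neutronclient, and the output is a simplified network_info list. The point is only that the build makes several dependent calls and an exception from any of them aborts it, so no partial result is produced:

```python
class FakeNeutron:
    """Hypothetical in-memory stand-in for the Neutron calls made during the build."""

    def __init__(self, ports, subnets, fail_on=None):
        self.ports = ports
        self.subnets = subnets
        self.fail_on = fail_on  # name of a call that should raise, to simulate an error

    def list_ports(self, device_id):
        if self.fail_on == "list_ports":
            raise RuntimeError("list_ports failed")
        return [p for p in self.ports if p["device_id"] == device_id]

    def show_subnet(self, subnet_id):
        if self.fail_on == "show_subnet":
            raise RuntimeError("show_subnet failed")
        return self.subnets[subnet_id]


def build_network_info(client, instance_uuid):
    """Assemble a simplified network_info list; any failing call aborts the whole build."""
    vifs = []
    for port in client.list_ports(instance_uuid):
        subnets = []
        for fixed in port["fixed_ips"]:
            subnet = client.show_subnet(fixed["subnet_id"])  # may raise
            subnets.append({"cidr": subnet["cidr"],
                            "ips": [{"address": fixed["ip_address"]}]})
        vifs.append({"id": port["id"],
                     "address": port["mac_address"],
                     "network": {"id": port["network_id"], "subnets": subnets}})
    return vifs
```

This is why the nova-compute and neutron-server logs matter: an error on any one of these calls would explain an instance that ends up with no cached network info, even though its port was created correctly.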

Changed in nova:
status: New → Incomplete
kexiaodong (kexiaodong) wrote :

I think this bug is the same as https://bugs.launchpad.net/nova/+bug/1254320

Magesh GV (magesh-gv)
Changed in nova:
status: Incomplete → Confirmed
Brent Eagles (beagles)
tags: added: neutron
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
status: Confirmed → Expired