Losing network info cache sometimes

Bug #1323475 reported by Tiantian Gao
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Medium
Assigned to: Tiantian Gao

Bug Description

We are using stable/havana.

For some inexplicable reason, some instances lost network information. The result looks like:

$ nova list
| a8f8a437-d203-4265-aca2-7bd35539c5d1 | test | ACTIVE | - | Running |

$ neutron port-list --device-id a8f8a437-d203-4265-aca2-7bd35539c5d1
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| 6b042778-76bb-45ca-86a8-abfdb1ba1a62 | | fa:16:3e:67:9a:88 | {"subnet_id": "90b338d3-7711-48fd-a0f6-11a27388cb42", "ip_address": "10.162.82.2"} |
| 9800fd03-5e07-4a54-8568-28d501073c5f | | fa:16:3e:d0:86:4a | {"subnet_id": "9a1fc59d-aec1-4e3a-bd88-99ea558e8b29", "ip_address": "192.168.0.5"} |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+

Neutron reports two ports bound to the instance, but Nova reports that the instance has no ports.
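The mismatch above can be checked mechanically. The following is a minimal sketch, assuming the Neutron port IDs and the Nova cached port IDs have already been collected into plain dictionaries keyed by instance (device) ID. The function name and both dictionaries are hypothetical and are not part of Nova or Neutron.

```python
def instances_with_lost_cache(ports_by_device, cached_port_ids):
    """Return device IDs that have Neutron ports but an empty Nova cache.

    Both arguments are hypothetical, pre-collected mappings:
      ports_by_device: device_id -> list of Neutron port IDs
      cached_port_ids: device_id -> list of port IDs in Nova's info_cache
    """
    lost = []
    for device_id, port_ids in ports_by_device.items():
        # Neutron knows at least one port, but Nova's cache has none.
        if port_ids and not cached_port_ids.get(device_id):
            lost.append(device_id)
    return sorted(lost)


# Data taken from the output above.
neutron_ports = {
    "a8f8a437-d203-4265-aca2-7bd35539c5d1": [
        "6b042778-76bb-45ca-86a8-abfdb1ba1a62",
        "9800fd03-5e07-4a54-8568-28d501073c5f",
    ],
}
nova_cache = {"a8f8a437-d203-4265-aca2-7bd35539c5d1": []}

print(instances_with_lost_cache(neutron_ports, nova_cache))
```

An instance missing from the cache mapping entirely is treated the same as one with an empty list.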

We dug through the logs and found that something went wrong after heal_instance_info_cache ran. One log line shows the instance info_cache as [], although the previous log line showed it filled. From that point on, the info_cache is lost and cannot heal itself.

A short log excerpt is pasted below; the full log is here: http://paste.openstack.org/show/81605/

....
2014-05-26 03:47:13.258 14884 DEBUG nova.network.api [-] Updating cache with info: [VIF({'ovs_interfaceid': u'5953e098-e131-48eb-b53c-5eb095f3bfee', 'network': Network({'bridge': 'br-int', 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': 'fixed', 'floating_ips': [], 'address': u'10.162.81.4'})], 'version': 4, 'meta': {'dhcp_server': u'10.162.81.3'}, 'dns': [], 'routes': [], 'cidr': u'10.162.81.0/28', 'gateway': IP({'meta': {}, 'version': None, 'type': 'gateway', 'address': None})})], 'meta': {'injected': False, 'tenant_id': u'c10373fb5d234e31af4d5d56527994fc'}, 'id': u'b0bb08c1-dc05-4e17-a021-f3b850a823ba', 'label': u'idc_c10373fb5d234e31af4d5d56527994fc'}), 'devname': u'tap5953e098-e1', 'qbh_params': None, 'meta': {}, 'address': u'fa:16:3e:40:34:4c', 'type': u'ovs', 'id': u'5953e098-e131-48eb-b53c-5eb095f3bfee', 'qbg_params': None})] update_instance_cache_with_nw_info /usr/lib/python2.7/dist-packages/nova/network/api.py:71
2014-05-26 03:47:13.263 14884 DEBUG nova.compute.manager [-] [instance: 49a806a9-986e-4ce3-ae9f-d3c4317255a3] Updated the info_cache for instance _heal_instance_info_cache /usr/lib/python2.7/dist-packages/nova/compute/manager.py:5146
.....
2014-05-26 03:52:14.255 14884 DEBUG nova.network.api [-] Updating cache with info: [] update_instance_cache_with_nw_info /usr/lib/python2.7/dist-packages/nova/network/api.py:71
.....

I tried hard but could not reproduce the bug manually. The key question is why the info_cache ends up empty. Either way, Nova should be able to self-heal in this case.
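To find when the cache was emptied, the debug log can be scanned for the empty update shown in the excerpt above. This is a rough sketch that assumes the log line layout seen in this report (timestamp first, then the message text). The regular expression and the function name are hypothetical helpers, not part of Nova.

```python
import re

# Matches a debug line whose cache payload is exactly the empty list.
EMPTY_CACHE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Updating cache with info: \[\] update_instance_cache_with_nw_info"
)


def empty_cache_timestamps(lines):
    """Return the timestamps of log lines that wrote an empty info_cache."""
    timestamps = []
    for line in lines:
        match = EMPTY_CACHE_RE.match(line)
        if match:
            timestamps.append(match.group("ts"))
    return timestamps


sample = [
    "2014-05-26 03:47:13.258 14884 DEBUG nova.network.api [-] "
    "Updating cache with info: [VIF({'type': u'ovs'})] "
    "update_instance_cache_with_nw_info",
    "2014-05-26 03:52:14.255 14884 DEBUG nova.network.api [-] "
    "Updating cache with info: [] update_instance_cache_with_nw_info "
    "/usr/lib/python2.7/dist-packages/nova/network/api.py:71",
]

print(empty_cache_timestamps(sample))
```

Note that the nova.network.api line in the excerpt does not carry an instance ID, so the timestamp has to be correlated with neighbouring nova.compute.manager lines by hand.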

Tracy Jones (tjones-i)
tags: added: network
Sean Dague (sdague)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Changed in nova:
assignee: nobody → Tiantian Gao (gtt116)
status: Confirmed → In Progress
egon (egon-p) wrote :

Same behavior seen in grizzly as well.

egon (egon-p)
summary: - Losting network info_cache sometimes
+ Losing network info cache sometimes
egon (egon-p) wrote :

In this state, no action (changing ports, deleting interfaces, rebooting, etc.) causes the info_cache to be rebuilt.

egon (egon-p) wrote :

Inserting a record, or updating network_info to have at least [{"network": {"id": "NETWORK_ID"}}] allows the auto_heal process to work.
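A minimal sketch of building that placeholder value as JSON is shown below. It only produces the string; how and where it is written back (for example, which table or column holds the info_cache) is not shown in this report and is left out here. The function name is hypothetical.

```python
import json


def minimal_network_info(network_id):
    """Build the smallest network_info value described in the comment above.

    The structure [{"network": {"id": ...}}] comes from the comment;
    the function itself is a hypothetical helper, not a Nova API.
    """
    return json.dumps([{"network": {"id": network_id}}])


# NETWORK_ID is a placeholder; substitute the instance's Neutron network ID.
print(minimal_network_info("NETWORK_ID"))
```

One entry per network attached to the instance would presumably be needed, but that is an assumption; the comment only states the minimum for a single network.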

egon (egon-p) wrote :
Brent Eagles (beagles)
tags: added: neutron
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Tiantian Gao on branch: master
Review: https://review.openstack.org/98068
Reason: Havana is too old, so abandon

Joe Gordon (jogo) wrote :

Havana is not supported anymore

Changed in nova:
status: In Progress → Won't Fix