[SRU] trusty/icehouse neutron-plugin-openvswitch-agent: lvm.tun_ofports.remove crashes with KeyError

Bug #1531963 reported by JuanJo Ciarlante on 2016-01-07
This bug affects 5 people
Affects            Status  Importance  Assigned to   Milestone
neutron (Ubuntu)           High        Corey Bryant
  Trusty                   High        Unassigned

Bug Description

[Impact]
Neutron OVS breaks with unhandled exceptions on compute nodes.

[Test Case]
For reproduction see the original bug description below.

[Regression Potential]

The backported patch is very straightforward, with a few minor conflicts noted in the patch.

-------------

Original bug description:

Filing this on ubuntu/neutron package, as neutron itself is EOL'd for Icehouse.

FYI this is a nonHA icehouse/trusty deploy using serverteam's juju charms.

On one of our production environments with a rather high rate of API calls (esp. for transient VMs from CI), we frequently get neutron OVS breakage on compute nodes¹, which we've been able to more or less correlate with errors like the following in /var/log/neutron/openvswitch-agent.log:

2016-01-07 06:33:48.917 18357 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 399, in _del_fdb_flow
2016-01-07 06:33:48.917 18357 TRACE neutron.openstack.common.rpc.amqp lvm.tun_ofports.remove(ofport)
2016-01-07 06:33:48.917 18357 TRACE neutron.openstack.common.rpc.amqp KeyError: '13'

Detailed log: http://paste.ubuntu.com/14431656/ - note that the errors occur at the same time on the 3 different compute nodes shown there.

¹ What we then observe are missing tun_ids in the output of
  ovs-ofctl dump-flows br-tun
i.e. the provider:segmentation_id is not present on the compute node for a VM attached to a neutron network that has it.
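
A rough sketch of how the check in the footnote above could be scripted, assuming the flow matches appear as tun_id=<value> in the dump-flows text. The helper names (tun_ids_in_flows, missing_segmentation_ids) and the sample line in the usage note are illustrative only and are not taken from this deployment.

```python
import re

# Assumption: tunnel matches show up as "tun_id=0x3e9" (hex) or "tun_id=1001"
# (decimal) in the text printed by "ovs-ofctl dump-flows br-tun".
TUN_ID_RE = re.compile(r"\btun_id=(0x[0-9a-fA-F]+|\d+)")


def tun_ids_in_flows(dump_flows_output):
    """Return the set of tunnel IDs (as ints) matched by any flow line."""
    ids = set()
    for value in TUN_ID_RE.findall(dump_flows_output):
        base = 16 if value.lower().startswith("0x") else 10
        ids.add(int(value, base))
    return ids


def missing_segmentation_ids(dump_flows_output, expected_ids):
    """Return the expected segmentation IDs that have no tun_id match."""
    return set(expected_ids) - tun_ids_in_flows(dump_flows_output)
```

The text to feed in would be the captured output of ovs-ofctl dump-flows br-tun on the compute node, and expected_ids would be the provider:segmentation_id values of the networks with VMs on that node. For a made-up line containing tun_id=0x3e9 and expected IDs 1001 and 1002, only 1002 would be reported as missing.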

As far as I can see, this was fixed upstream (lp#1421105):
https://git.openstack.org/cgit/openstack/neutron/commit/?id=841b2f58f375df53b380cf5796bb31c82cd09260
Please consider backporting it to Icehouse; it's a pretty trivial fix.
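
To illustrate the failure mode, here is a minimal sketch, assuming tun_ofports is a plain Python set (which the KeyError in the trace suggests). The class and function names below are hypothetical stand-ins, not the agent's code and not the upstream patch; they only show why a second removal of the same ofport raises, and one way to make the removal idempotent.

```python
class LocalVLANMapping(object):
    """Hypothetical stand-in for the agent's per-network mapping object."""

    def __init__(self):
        # Assumption: ofports are tracked in a set, keyed by strings like '13'.
        self.tun_ofports = set()


def remove_ofport_strict(lvm, ofport):
    """Mirror the failing call: set.remove() raises KeyError if absent."""
    lvm.tun_ofports.remove(ofport)


def remove_ofport_idempotent(lvm, ofport):
    """Remove ofport if present; return True only when something was removed."""
    if ofport in lvm.tun_ofports:
        lvm.tun_ofports.remove(ofport)
        return True
    return False


if __name__ == "__main__":
    lvm = LocalVLANMapping()
    lvm.tun_ofports.add('13')
    remove_ofport_idempotent(lvm, '13')  # removes the port
    remove_ofport_idempotent(lvm, '13')  # duplicate request, no exception
    try:
        remove_ofport_strict(lvm, '13')  # same duplicate, strict variant
    except KeyError as exc:
        print("KeyError: %s" % exc)
```

Returning a flag rather than silently discarding lets a caller skip any follow-up flow changes when the port was already gone, which is the kind of duplicate removal the trace points at.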

JuanJo Ciarlante (jjo) on 2016-01-07
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in neutron (Ubuntu):
status: New → Confirmed
James Troup (elmo) on 2016-01-14
tags: added: openstack sts
Antonio Rosales (arosales) wrote :

From chatting with Anthony and Tom: this hasn't been tested on OpenStack releases later than Icehouse, but the fix (https://git.openstack.org/cgit/openstack/neutron/commit/?id=841b2f58f375df53b380cf5796bb31c82cd09260) has been suggested as the resolution for the issue they are seeing.

Next action: triage https://git.openstack.org/cgit/openstack/neutron/commit/?id=841b2f58f375df53b380cf5796bb31c82cd09260 and confirm that it can be backported to Icehouse.

Changed in neutron (Ubuntu):
status: Confirmed → Triaged
Corey Bryant (corey.bryant) wrote :

Seems like a fairly straightforward backport for icehouse (assuming you don't need the ofagent bits).

Corey Bryant (corey.bryant) wrote :

Note, fix is already in juno so we're ready to go straight to trusty/icehouse.

Changed in neutron (Ubuntu):
status: Triaged → Fix Committed
assignee: nobody → Corey Bryant (corey.bryant)
importance: Undecided → High
summary: - trusty/icehouse neutron-plugin-openvswitch-agent: lvm.tun_ofports.remove
- crashes with KeyError
+ [SRU] trusty/icehouse neutron-plugin-openvswitch-agent:
+ lvm.tun_ofports.remove crashes with KeyError
description: updated
James Page (james-page) on 2016-02-11
Changed in neutron (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → High
Changed in neutron (Ubuntu):
status: Fix Committed → Fix Released

Hello JuanJo, or anyone else affected,

Accepted neutron into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/1:2014.1.5-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in neutron (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
JuanJo Ciarlante (jjo) wrote :

Thanks Chris for the updates - FYI we've upgraded all of our compute nodes to
1:2014.1.5-0ubuntu3 from proposed, with no (extra) issues so far after some hours.
This stack has ~30 nodes and ~1k+ active instances.

We expect this change to (obviously) stop those KeyError messages in the log,
and likely also stop nodes from missing tun_ids. We regularly get alerted
for the latter (several times a week); I'll add an update next week on how
it went.

JuanJo Ciarlante (jjo) wrote :

Chris: confirming this bug is most likely fixed by
1:2014.1.5-0ubuntu3, as there have been no further alerts about missing
tun_ids since it was installed 1 week ago (recall we had been getting
several of those per week).

Thanks! :) --J

Corey Bryant (corey.bryant) wrote :

Thanks JuanJo. Regression tests for neutron have also passed successfully. Marking this as verification-done.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 1:2014.1.5-0ubuntu3

---------------
neutron (1:2014.1.5-0ubuntu3) trusty; urgency=medium

  [ Corey Bryant ]
  * d/p/make_del_fdb_flow_idempotent.patch: Cherry pick from Juno
    to prevent KeyError on duplicate port removal in del_fdb_flow()
    (LP: #1531963).
  * d/tests/*-plugin: Fix race between service restart and pidof test.

  [ James Page ]
  * d/p/ovs-restart.patch: Ensure that tunnels are fully reset on ovs
    restart (LP: #1460164).

 -- Corey Bryant <email address hidden> Wed, 10 Feb 2016 14:52:04 -0500

Changed in neutron (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
