neutron_dynamic_routing Bgp floatingip_update KeyError: 'last_known_router_id'

Bug #1795816 reported by Pawel Dudczak
This bug affects 6 people
Affects   Status         Importance   Assigned to     Milestone
neutron   Fix Released   Medium       Benoît Knecht

Bug Description

Hi, every time I run the tempest test:
tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops

a Python exception appears in the neutron server.log:

2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager [req-3fe9d833-84d2-41ed-bae3-90f02b1425f4 c3f8dd04c65b44adbda2389ef5aa8f87 38908f51f8c740b18e71fc62352076fb - default default] Error during notification for neutron_dynamic_routing.services.bgp.bgp_plugin.BgpPlugin.floatingip_update_callback-13778 floatingip, after_update: KeyError: 'last_known_router_id'
2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager Traceback (most recent call last):
2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager   File "/usr/lib/python2.7/site-packages/neutron_lib/callbacks/manager.py", line 177, in _notify_loop
2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager     callback(resource, event, trigger, **kwargs)
2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/bgp_plugin.py", line 236, in floatingip_update_callback
2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager     last_router_id = kwargs['last_known_router_id']
2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager KeyError: 'last_known_router_id'
2018-10-02 16:13:01.380 11754 ERROR neutron_lib.callbacks.manager

It seems that replacing the line:
last_router_id = kwargs['last_known_router_id']

in /usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/bgp_plugin.py with the line last_router_id = kwargs.get('last_known_router_id') solves the problem.
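
For illustration, here is a minimal standalone sketch of the difference between the two lookups. It is not the plugin code; the contents of the `kwargs` dict are made up and only stand in for a notification payload that lacks the key.

```python
# Hypothetical payload: shaped like a notification that has no
# 'last_known_router_id' entry (values are invented for the example).
kwargs = {"router_id": None, "floating_ip_address": "203.0.113.10"}

# Subscript access raises KeyError when the key is absent.
missing_key = None
try:
    last_router_id = kwargs["last_known_router_id"]
except KeyError as exc:
    missing_key = exc.args[0]

# dict.get() returns None (or a supplied default) instead of raising.
last_router_id = kwargs.get("last_known_router_id")

print(missing_key)
print(last_router_id)
```

With `.get()`, the rest of the callback has to cope with `last_router_id` being `None`.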

This problem occurs in the Pike and Queens releases of OpenStack.

Slawek Kaplonski (slaweq) wrote :

Do you have a link to any CI job results showing this error?

Pawel Dudczak (pdudczak) wrote :

Unfortunately no, I don't run CI jobs. Tempest triggers this problem in a production environment.

Slawek Kaplonski (slaweq) wrote :

Can you provide the exact versions of neutron, neutron-dynamic-routing and neutron-lib in use when this issue happens?

Pawel Dudczak (pdudczak) wrote :

python-neutron-12.0.3-1
openstack-neutron-bgp-dragent-12.0.1-1
openstack-neutron-dynamic-routing-common-12.0.1-1
python2-neutron-lib-1.13.0-1

Changed in neutron:
importance: Undecided → Medium
Ryan Tidwell (ryan-tidwell) wrote :

I think we can definitely be more defensive about this sort of thing. This makes me want to look a little deeper at the notifications neutron-dynamic-routing consumes. Is this blocking you? I assume new and updated FIP announcements would end up delayed because of this, but within 30 seconds the new route should be announced. Are you seeing any other problematic behavior, or is this just noise in the logs for you?

Changed in neutron:
assignee: nobody → Ryan Tidwell (ryan-tidwell)
Ryan Tidwell (ryan-tidwell) wrote :

I'm unable to reproduce this behavior on master. This would be encountered whenever a new floating IP that has no previous router association is associated, disassociated, or migrated. I am surprised that last_known_router_id isn't included in the notification payload https://github.com/openstack/neutron/blame/f88cb5276dc8d31b484772f31160fc1071a28541/neutron/db/l3_db.py#L1294. I'll keep digging and go back to something older than master.

Ryan Tidwell (ryan-tidwell) wrote :

I spoke too soon, I was able to reproduce this on master. This occurs when a VM with an associated FIP is destroyed before the FIP is disassociated. The FIP is disassociated here [1], which results in a subsequent notification whose payload does not contain last_known_router_id [2]. In practice this shouldn't affect BGP announcements, but it isn't very clean. This can be fixed as described in the report by simply being more defensive in the plugin, or the notification from the L3 plugin could use a more well-defined payload.

[1] https://github.com/openstack/neutron/blob/8c05cd31effb00a508b0052a9dcedc5fa6084ba8/neutron/plugins/ml2/plugin.py#L1606
[2] https://github.com/openstack/neutron/blob/f88cb5276dc8d31b484772f31160fc1071a28541/neutron/db/l3_db.py#L1619
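
A rough sketch of what "being more defensive in the plugin" could look like. Everything here is hypothetical: `routes_to_update` is an invented helper, not the real `floatingip_update_callback`, and the withdraw/advertise decision logic is an assumption made for the example.

```python
def routes_to_update(**kwargs):
    """Return (withdraw, advertise) flags for a floating IP update.

    Hypothetical helper for illustration; not the real BgpPlugin code.
    """
    new_router_id = kwargs.get("router_id")
    # A missing key is treated the same as "no previous router".
    last_router_id = kwargs.get("last_known_router_id")

    # Assumed logic: withdraw the old route only if a previous router is
    # known and differs; advertise only if there is a new, different router.
    withdraw = last_router_id is not None and last_router_id != new_router_id
    advertise = new_router_id is not None and new_router_id != last_router_id
    return withdraw, advertise


# Payload shaped like the one described above: no 'last_known_router_id'.
print(routes_to_update(router_id=None, floating_ip_address="203.0.113.10"))
# A payload that does carry both keys.
print(routes_to_update(router_id="r2", last_known_router_id="r1"))
```

Under these assumptions, a payload without the key yields no route changes instead of raising a KeyError.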

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-dynamic-routing (master)

Fix proposed to branch: master
Review: https://review.opendev.org/665328

Changed in neutron:
assignee: Ryan Tidwell (ryan-tidwell) → Benoît Knecht (benoit-knecht)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-dynamic-routing (master)

Reviewed: https://review.opendev.org/665328
Committed: https://git.openstack.org/cgit/openstack/neutron-dynamic-routing/commit/?id=4780fe548b86eba7f64a57ccf2d000958c238253
Submitter: Zuul
Branch: master

commit 4780fe548b86eba7f64a57ccf2d000958c238253
Author: Benoît Knecht <email address hidden>
Date: Fri Jun 14 10:12:18 2019 +0200

    bgp: Gracefully handle missing last_known_router_id

    When a server is deleted before its floating IP has been disassociated,
    the notification doesn't contain a `last_known_router_id` key, which
    results in a `KeyError` exception being thrown.

    This commit gracefully handles this situation by setting
    `last_router_id` to `None` when `last_known_router_id` is missing.

    Change-Id: If127a33cec7ce6c4d264a191df37c30decab4daa
    Closes-Bug: #1795816

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron-dynamic-routing 16.0.0.0b1

This issue was fixed in the openstack/neutron-dynamic-routing 16.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-dynamic-routing (stable/train)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-dynamic-routing (stable/train)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/841270
Reason: CI is broken and this stable version is going to be considered EOM.
