KeyError in L3 agent when running in dvr prevents correct setup of namespaces

Bug #1369012 reported by Mike Smith
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: Armando Migliaccio
Milestone: 2014.2

Bug Description

There has been a recent regression in functionality. When a FIP namespace is scheduled to an L3 agent (either dvr or dvr_snat), the gw_port_host field should contain the host binding where the FIP namespace is to be hosted. This field is currently missing when the router info is sent to a dvr node with an ex_gw_port.

This is a regression caused by:

https://review.openstack.org/#/c/64553/

The following changes also need to be reinstated to rectify the behavior of DVR:

- https://review.openstack.org/#/c/118707/
- https://review.openstack.org/#/c/114410/

tags: added: l3-dvr-backlog
Changed in neutron:
assignee: nobody → Mike Smith (michael-smith6)
Changed in neutron:
status: New → Confirmed
Armando Migliaccio (armando-migliaccio) wrote :

f80ed8a514ea22a9d53032c7b3b6e7708cc39ec2 is the first bad commit; it corresponds to this change:

https://review.openstack.org/#/c/118706/

Since this change was split out of a more complex patch, I wonder if this is a temporary regression, i.e. an effect of not all the fixes having landed yet.

I'll let Carl chime in, as he would have a much clearer picture.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Hi folks, I investigated this problem further.

The reason "gw_port_host" is not populated is that "_build_routers_list", defined in "l3_dvr_db.py", is not being called.

That function is the only place where the binding is verified and the host is added.
Something in the recent refactoring has altered this call flow.

Since the router info does not carry this information, the agent crashes.
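To make the failure mode concrete, here is a minimal sketch of what the agent side runs into when the server skips the DVR-specific "_build_routers_list". The payload shape, the "gw_port" key and the helper "fip_namespace_host" are simplified assumptions for illustration, not the actual neutron agent code.

```python
from typing import Any, Dict

# Hypothetical, heavily simplified router payload; real neutron payloads
# carry many more fields. Only the keys relevant to this bug are shown.
Router = Dict[str, Any]


def fip_namespace_host(router: Router) -> str:
    """Return the host that should carry the FIP namespace (hypothetical helper).

    Indexing the dict directly mirrors the failure in this report: it raises
    KeyError when the server never populated 'gw_port_host'.
    """
    return router["gw_port_host"]


# Payload as built when the DVR _build_routers_list runs.
healthy: Router = {"id": "r1", "gw_port": {"id": "p1"}, "gw_port_host": "compute-1"}
# Payload as built when that method is skipped: the gateway port is present,
# but no host binding was added.
broken: Router = {"id": "r1", "gw_port": {"id": "p1"}}

print(fip_namespace_host(healthy))  # compute-1
try:
    fip_namespace_host(broken)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'gw_port_host'
```

The agent only sees the symptom; the missing key originates on the server, which is why the rest of the investigation focuses on the plugin hierarchy.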

Armando Migliaccio (armando-migliaccio) wrote :

Swami,

I think the issue is twofold.

- the chain for the snat fixes is incomplete
- the hierarchy has been broken by change https://review.openstack.org/#/c/64553/

If I do the following:

- Pull from master
- Pull the remaining snat fixes (https://review.openstack.org/#/c/118707/ and https://review.openstack.org/#/c/114410/)
- Revert the HA patch

Everything goes back to normal.

So we need to figure out why we're observing the stacktrace with KeyError: 'gw_port_host', as shown here:

http://logs.openstack.org/07/118707/3/experimental/check-tempest-dsvm-neutron-dvr/5023568/logs/screen-q-vpn.txt.gz?level=TRACE

A.

summary: - FIP namespace not created for dvr
+ KeyError in L3 agent when running in dvr prevents correct setup of namespaces
description: updated
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/121290

Changed in neutron:
assignee: Mike Smith (michael-smith6) → Armando Migliaccio (armando-migliaccio)
status: Confirmed → In Progress
Kyle Mestery (mestery)
Changed in neutron:
milestone: none → juno-rc1
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/121290
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4da6c130ea08f464ffccc769e7b97992469c103a
Submitter: Jenkins
Branch: master

commit 4da6c130ea08f464ffccc769e7b97992469c103a
Author: armando-migliaccio <email address hidden>
Date: Fri Sep 12 23:25:03 2014 -0700

    Fix KeyError on missing gw_port_host for L3 agent in DVR mode

    The order of Mixin imports broke the MRO, which caused some methods
    in the L3 hierarchy to be ignored. In particular, _build_routers_list
    for DVR was no longer called, which led to the stacktrace observed on
    the L3 agent side.

    Closes-bug: 1369012

    Change-Id: I23cd9813fb9b67e9029d222d5f72733ecec3febb
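The commit message attributes the bug to mixin ordering breaking the MRO. Below is a minimal, self-contained sketch of that general mechanism, assuming a sibling mixin that overrides "_build_routers_list" without calling super(). The class names (L3BaseMixin, DvrMixin, OtherMixin, GoodPlugin, BadPlugin) and the hard-coded host are hypothetical; they are not neutron's actual class hierarchy.

```python
from typing import Any, Dict, List

Router = Dict[str, Any]


class L3BaseMixin:
    """Hypothetical base: builds the router list sent to agents."""

    def _build_routers_list(self, routers: List[Router]) -> List[Router]:
        return routers


class DvrMixin(L3BaseMixin):
    """Hypothetical DVR mixin: cooperatively adds gw_port_host."""

    def _build_routers_list(self, routers: List[Router]) -> List[Router]:
        routers = super()._build_routers_list(routers)
        for router in routers:
            if "gw_port" in router:
                # Stand-in for the real port-binding lookup.
                router["gw_port_host"] = "compute-1"
        return routers


class OtherMixin(L3BaseMixin):
    """Hypothetical sibling mixin that overrides WITHOUT calling super()."""

    def _build_routers_list(self, routers: List[Router]) -> List[Router]:
        # Returns copies and never delegates further down the MRO.
        return [dict(r) for r in routers]


class GoodPlugin(DvrMixin, OtherMixin):
    pass


class BadPlugin(OtherMixin, DvrMixin):  # same mixins, opposite order
    pass


def names(cls: type) -> List[str]:
    """List the class names in the method resolution order."""
    return [c.__name__ for c in cls.__mro__]


print(names(GoodPlugin))  # ['GoodPlugin', 'DvrMixin', 'OtherMixin', 'L3BaseMixin', 'object']
print(names(BadPlugin))   # ['BadPlugin', 'OtherMixin', 'DvrMixin', 'L3BaseMixin', 'object']

routers = [{"id": "r1", "gw_port": {"id": "p1"}}]
print(GoodPlugin()._build_routers_list(routers)[0].get("gw_port_host"))  # compute-1
print(BadPlugin()._build_routers_list(routers)[0].get("gw_port_host"))   # None
```

With the DVR mixin listed first, its override runs and delegates via super(). Listing the other mixin first lets its non-cooperative override shadow the DVR one, so "gw_port_host" is silently never added and nothing fails until the agent reads the key.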

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-rc1 → 2014.2