KeyError: 'gw_port_host' seen for DVR router removal

Bug #1394043 reported by Mike Smith
This bug affects 2 people
Affects   Status         Importance   Assigned to   Milestone
neutron   Fix Released   High         Stephen Ma
Juno      Fix Released   Undecided    Unassigned

Bug Description

In some multi-node setups, a qrouter namespace can exist on a node that hosts only a DHCP port (no VMs, no SNAT).

When the router is removed from the DB, the host that has only the qrouter and DHCP namespaces keeps its qrouter namespace, while other hosts with the same qrouter remove theirs. The following KeyError is seen on the host where the namespace remains:

2014-11-18 17:18:43.334 ERROR neutron.agent.l3_agent [-] 'gw_port_host'
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent Traceback (most recent call last):
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/common/utils.py", line 341, in call
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent return func(*args, **kwargs)
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 958, in process_router
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent self.external_gateway_removed(ri, ri.ex_gw_port, interface_name)
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 1429, in external_gateway_removed
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent ri.router['gw_port_host'] == self.host):
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent KeyError: 'gw_port_host'
2014-11-18 17:18:43.334 TRACE neutron.agent.l3_agent
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
    func(*args, **kwargs)
  File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 1842, in _process_router_update
    self._process_router_if_compatible(router)
  File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 1817, in _process_router_if_compatible
    self.process_router(ri)
  File "/opt/stack/neutron/neutron/common/utils.py", line 344, in call
    self.logger(e)
  File "/opt/stack/neutron/neutron/openstack/common/excutils.py", line 82, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/neutron/neutron/common/utils.py", line 341, in call
    return func(*args, **kwargs)
  File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 958, in process_router
    self.external_gateway_removed(ri, ri.ex_gw_port, interface_name)
  File "/opt/stack/neutron/neutron/agent/l3_agent.py", line 1429, in external_gateway_removed
    ri.router['gw_port_host'] == self.host):
KeyError: 'gw_port_host'

For the issue to be seen, the router in question must previously have had a gateway set (router-gateway-set).

Changed in neutron:
assignee: nobody → Mike Smith (michael-smith6)
tags: added: l3-dvr-backlog
Changed in neutron:
importance: Undecided → High
status: New → Confirmed
tags: added: l3-ipam-dhcp
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/138562

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/138562
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5908a60ab98886a807044da56f51bceafd15edc9
Submitter: Jenkins
Branch: master

commit 5908a60ab98886a807044da56f51bceafd15edc9
Author: Michael Smith <email address hidden>
Date: Tue Dec 2 14:22:04 2014 -0800

    Fix for KeyError: 'gw_port_host' on l3_agent

    The dictionary field 'gw_port_host' was added for
    DVR routers and is used by the scheduler and l3_agent
    to schedule where the SNAT port for a DVR router
    will be hosted. In some code flows on the l3_agent,
    this field is checked to determine what the agent
    should do if the host matches its own or not.

    Recently it has been seen that the router data sent
    from the scheduler is missing this field in some cases.
    This causes the agent to throw a KeyError and not function
    properly. This patch will make the l3_agent more robust
    and less fragile by calling 'get' instead of assuming the
    field will be there.

    More work may be needed on the scheduler side to see why
    this field is missing. That is why I am marking this as a
    partial-fix for now. But this patch will make the l3_agent
    less prone to errors and therefore an improvement.

    Change-Id: Ib26ccfa7b945cb4e8f2ec4adc5e6ae91cbaae02e
    Partial-Bug: #1394043
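
To illustrate the change the commit message describes, here is a minimal sketch of indexing versus calling 'get' on the router dictionary. The function names and the sample router dicts are hypothetical stand-ins, not the actual neutron l3_agent code; only the 'gw_port_host' key and the comparison against the agent's host come from the traceback above.

```python
# Hypothetical stand-ins for the check seen in external_gateway_removed;
# this is NOT the real neutron code, only the lookup pattern.


def is_snat_host_strict(router, host):
    # Direct indexing: raises KeyError when 'gw_port_host' is absent,
    # which is the failure shown in the traceback.
    return router['gw_port_host'] == host


def is_snat_host_tolerant(router, host):
    # dict.get returns None for a missing key, so the comparison is
    # simply False and the agent can carry on.
    return router.get('gw_port_host') == host


# Sample router data (made up for illustration).
router_without_field = {'id': 'r1', 'distributed': True}
router_with_field = {'id': 'r2', 'distributed': True,
                     'gw_port_host': 'node-1'}

if __name__ == '__main__':
    print(is_snat_host_tolerant(router_without_field, 'node-1'))
    print(is_snat_host_tolerant(router_with_field, 'node-1'))
    try:
        is_snat_host_strict(router_without_field, 'node-1')
    except KeyError as exc:
        print('KeyError: %s' % exc)
```

With the tolerant lookup, a missing field is treated the same as "this host is not the SNAT host". That keeps the agent running, but it does not explain why the field is missing, which is why the commit is marked as a partial fix.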

Mike Smith (michael-smith6) wrote :

Update: the condition where gw_port_host is missing from the router data sent to the agent can be reproduced in a simple setup of one router, 2 nova VMs, 2 networks/subnets, plus one external network. I ran a script using this basic setup and saw the agent log "gw_port_host missing from router" appear (it took fewer than 10 runs to reproduce).

A possible cause: by the time the agent requests a sync_router RPC, gw_port_host may already be gone.

Stephen Ma (stephen-ma)
tags: added: juno-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/147695

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/148758

Changed in neutron:
assignee: Mike Smith (michael-smith6) → Stephen Ma (stephen-ma)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/147695
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=42630f9b9e62d6a2199b1ac4f7f4aa61bd636066
Submitter: Jenkins
Branch: stable/juno

commit 42630f9b9e62d6a2199b1ac4f7f4aa61bd636066
Author: Michael Smith <email address hidden>
Date: Tue Dec 2 14:22:04 2014 -0800

    Fix for KeyError: 'gw_port_host' on l3_agent

    The dictionary field 'gw_port_host' was added for
    DVR routers and is used by the scheduler and l3_agent
    to schedule where the SNAT port for a DVR router
    will be hosted. In some code flows on the l3_agent,
    this field is checked to determine what the agent
    should do if the host matches its own or not.

    Recently it has been seen that the router data sent
    from the scheduler is missing this field in some cases.
    This causes the agent to throw a KeyError and not function
    properly. This patch will make the l3_agent more robust
    and less fragile by calling 'get' instead of assuming the
    field will be there.

    More work may be needed on the scheduler side to see why
    this field is missing. That is why I am marking this as a
    partial-fix for now. But this patch will make the l3_agent
    less prone to errors and therefore an improvement.

    (cherry-picked from 5908a60ab98886a807044da56f51bceafd15edc9)
    Partial-Bug: #1394043
    Change-Id: Ib26ccfa7b945cb4e8f2ec4adc5e6ae91cbaae02e

tags: added: in-stable-juno
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by stephen-ma (<email address hidden>) on branch: master
Review: https://review.openstack.org/148758

Stephen Ma (stephen-ma)
Changed in neutron:
status: In Progress → Fix Released
Armando Migliaccio (armando-migliaccio) wrote :

Kilo is not released yet.

Changed in neutron:
status: Fix Released → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0