key error in del l3_agent.pd.routers[router['id']]['subnets']

Bug #1489576 reported by Adolfo Duarte
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Assaf Muller

Bug Description

currently the following erorr is seen in the q-l3 or q-vpn agents when deleting a router: (steps to reproduce below):

Exit code: 0
Stdin:
Stdout: lo
Stderr: execute /opt/stack/neutron/neutron/agent/linux/utils.py:150
2015-08-27 19:08:38.007 12027 DEBUG neutron.agent.linux.utils [-]
Command: ['ip', 'netns', 'delete', u'qrouter-f53025cc-5159-480e-a7f3-19122aa330a3']
Exit code: 0
Stdin:
Stdout:
Stderr: execute /opt/stack/neutron/neutron/agent/linux/utils.py:150
2015-08-27 19:08:38.009 12027 DEBUG neutron.callbacks.manager [-] Notify callbacks for router, after_delete _notify_loop /opt/stack/neutron/neutron/callbacks/manager.py:133
2015-08-27 19:08:38.009 12027 DEBUG neutron.callbacks.manager [-] Calling callback neutron.agent.linux.pd.remove_router _notify_loop /opt/stack/neutron/neutron/callbacks/manager.py:140
2015-08-27 19:08:38.010 12027 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "remove_router" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:251
2015-08-27 19:08:38.010 12027 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "remove_router" :: held 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager [-] Error during notification for neutron.agent.linux.pd.remove_router router, after_delete
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager Traceback (most recent call last):
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager File "/opt/stack/neutron/neutron/callbacks/manager.py", line 141, in _notify_loop
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager callback(resource, event, trigger, **kwargs)
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager return f(*args, **kwargs)
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager File "/opt/stack/neutron/neutron/agent/linux/pd.py", line 307, in remove_router
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager del l3_agent.pd.routers[router['id']]['subnets']
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager KeyError: 'id'
2015-08-27 19:08:38.012 12027 ERROR neutron.callbacks.manager
2015-08-27 19:08:38.015 12027 DEBUG neutron.callbacks.manager [-] Calling callback neutron_vpnaas.services.vpn.vpn_service.router_removed_actions _notify_loop /opt/stack/neutron/neutron/callbacks/manager.py:140
2015-08-27 19:08:47.697 12027 DEBUG oslo_service.loopingcall [-] Fixed interval looping call 'neutron_vpnaas.services.vpn.agent.VPNAgent._report_state' sleeping for 30.00 seconds _run_loop /usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py:121
2015-08-27 19:08:47.756 12027 DEBUG oslo_service.loopingcall [-] Fixed interval looping call 'neutron.service.Service.report_state' sleeping for 30.00 seconds _run_loop /usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py:121

STEPS TO REPRODUCE:
create a stack with a neutron server (q-vpn) node, "network" server (q-l3), and a compute node.
Then execut the following:

neutron net-create public --router:external
neutron subnet-create public 123.0.0.0/24 --disable-dhcp
for id in $(neutron security-group-list | grep -v " id " | grep -v "\-\-" | awk '{print $2}'); do neutron security-group-delete $id; done
neutron security-group-rule-create --protocol icmp --direction ingress default
neutron security-group-rule-create --protocol tcp --port-range-min 1 --port-range-max 65535 --direction ingress default
neutron security-group-rule-create --protocol udp --port-range-min 1 --port-range-max 65535 --direction ingress default
neutron net-create private
neutron subnet-create private103.0.0.0/24 --name private

neutron router-create r1 --distributed=False --ha=False
neutron router-gateway-set r1 public

then execute:
neutron router-delete r1

you will see the error above.

Changed in neutron:
importance: Undecided → High
Revision history for this message
Carl Baldwin (carl-baldwin) wrote :

Any way we could get a better formatted stack trace in pastebin?

tags: added: l3-ipam-dhcp
Revision history for this message
Adolfo Duarte (adolfo-duarte) wrote :

Yeap. here we go: (removed timestamp to make it fit better :) )
Exit code: 0
Stdin:
Stdout: lo
Stderr: execute /opt/stack/neutron/neutron/agent/linux/utils.py:150
2015-08-27 19:08:38.007 12027 DEBUG neutron.agent.linux.utils [-]
Command: ['ip', 'netns', 'delete', u'qrouter-f53025cc-5159-480e-a7f3-19122aa330a3']
Exit code: 0
Stdin:
Stdout:
Stderr: execute /opt/stack/neutron/neutron/agent/linux/utils.py:150
12027 DEBUG neutron.callbacks.manager [-] Notify callbacks for router, after_delete _notify_loop /opt/stack/neutron/neutron/callbacks/manager.py:133
12027 DEBUG neutron.callbacks.manager [-] Calling callback neutron.agent.linux.pd.remove_router _notify_loop /opt/stack/neutron/neutron/callbacks/manager.py:140
12027 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" acquired by "remove_router" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:251
12027 DEBUG oslo_concurrency.lockutils [-] Lock "l3-agent-pd" released by "remove_router" :: held 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
12027 ERROR neutron.callbacks.manager [-] Error during notification for neutron.agent.linux.pd.remove_router router, after_delete
 12027 ERROR neutron.callbacks.manager Traceback (most recent call last):
12027 ERROR neutron.callbacks.manager File "/opt/stack/neutron/neutron/callbacks/manager.py", line 141, in _notify_loop
ERROR neutron.callbacks.manager callback(resource, event, trigger, **kwargs)
 ERROR neutron.callbacks.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
12027 ERROR neutron.callbacks.manager return f(*args, **kwargs)
12027 ERROR neutron.callbacks.manager File "/opt/stack/neutron/neutron/agent/linux/pd.py", line 307, in remove_router
12027 ERROR neutron.callbacks.manager del l3_agent.pd.routers[router['id']]['subnets']
12027 ERROR neutron.callbacks.manager KeyError: 'id'
12027 ERROR neutron.callbacks.manager

Revision history for this message
Adolfo Duarte (adolfo-duarte) wrote :

sorry.. It looked better in the window before I submitted. here is the condense message for the backtrace:
Error during notification for neutron.agent.linux.pd.remove_router router, after_delete
 Traceback (most recent call last):
File "/opt/stack/neutron/neutron/callbacks/manager.py", line 141, in _notify_loop
callback(resource, event, trigger, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
return f(*args, **kwargs)
 File "/opt/stack/neutron/neutron/agent/linux/pd.py", line 307, in remove_router
del l3_agent.pd.routers[router['id']]['subnets']
KeyError: 'id'

It looks like by the time the call in line 307 to del l3_agent.pd.routers[router['id']]['subnets'] is made the "router" dict has no id key. I did some debugging and the dict at that poin only has a router id key/value pair. a few lines before there is a call to delete the router (ri.delete(self) ). An easy fix would be to check for the key in the router dict, but i am not sure what the purpose of the call is to begin with so it's hard to say if not having an "id" is expected.

Revision history for this message
Assaf Muller (amuller) wrote :

Caused by https://review.openstack.org/#/c/185977/.
Reproduced by running neutron.tests.functional.agent.test_l3_agent.L3AgentTestCase.test_legacy_router_lifecycle.
Formatted traceback: http://paste.openstack.org/show/430075/

The impact is that the l3_agent.pd.routers[router_id] entry is not deleted, so that any other actions that look up that key will see the old router value. I don't know if there's a valid flow where routers[router_id] is looked up after the router is deleted. The fix looks very straight forward: Either store the router_id in that dict, or better yet use the router_id value passed down to that function. Patch coming up.

Changed in neutron:
assignee: nobody → Assaf Muller (amuller)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/217927

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/217927
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a158d62f30d056ac2ab6bb30efd7fafc2b7a3153
Submitter: Jenkins
Branch: master

commit a158d62f30d056ac2ab6bb30efd7fafc2b7a3153
Author: Assaf Muller <email address hidden>
Date: Thu Aug 27 17:44:50 2015 -0400

    Fix Prefix delegation router deletion key error

    Change-Id: I1b7008bc6a9dd7d23a651023fd540eec3e46f072
    Closes-Bug: #1489576

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/218710

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)
Download full text (155.6 KiB)

Reviewed: https://review.openstack.org/218710
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2c5f44e1b3bd4ed8a0b7232fd293b576cc8c1c87
Submitter: Jenkins
Branch: feature/pecan

commit f35d1c5c50dccbef1a2e079f967b82f0df0e22e9
Author: Adelina Tuvenie <email address hidden>
Date: Thu Aug 27 02:27:28 2015 -0700

    Fixes wrong neutron Hyper-V Agent name in constants

    Change Id03fb147e11541be309c1cd22ce27e70fadc28b5 moved the
    AGENT_TYPE_HYPERV constant from common.constants to
    plugins.ml2.drivers.hyperv.constants but change the value of the
    constant from 'HyperV agent' to 'hyperv'. This patch changes
    the name back to 'HyperV agent'

    Change-Id: If74b4b2a84811e266c8b12e70bf6bfe74ed4ea21
    Partial-Bug: #1487598

commit de604de334854e2eb6b4312ff57920564cbd4459
Author: OpenStack Proposal Bot <email address hidden>
Date: Sun Aug 30 01:39:06 2015 +0000

    Updated from global requirements

    Change-Id: Ie52aa3b59784722806726e4046bd07f4a4d97328

commit f0415ac20eaf5ab4abb9bd4839bf6d04ceee85d0
Author: armando-migliaccio <email address hidden>
Date: Fri Aug 28 13:53:04 2015 -0700

    Revert "Add support for unaddressed port"

    This implementation may expose a vulnerability where a malicious
    user can sieze the opportunity of a time window where a port
    may land unaddressed on a shared network, thus allowing him/her
    to suck up all the tenant traffic he/she wants....oh the shivers.

    This reverts commit d4c52b7f5a36a103a92bf9dcda7f371959112292.

    Change-Id: I7ebdaa8d3defa80eab90e460fde541a5bdd8864c

commit 013fdcd2a6d45dbe4de5d6e7077e5e9b60985ef9
Author: Assaf Muller <email address hidden>
Date: Fri Aug 28 16:41:07 2015 -0400

    Improve logging upon failure in iptables functional tests

    This will help us nail down a more accurate and efficient logstash
    query.

    Change-Id: Iee4238e358f7b056e373c7be8d6aa3202117a680
    Related-Bug: #1478847

commit 622dea818d851224a43d5276a81d5ce8a6eebb76
Author: Ivar Lazzaro <email address hidden>
Date: Mon Aug 17 17:17:42 2015 -0700

    handle gw_info outside of the db transaction on router creation

    Move the gateway interface creation outside the DB transaction
    to avoid lock timeout.

    Change-Id: I5a78d7f32e8ca912016978105221d5f34618af19
    Closes-bug: 1485809

commit 5b27d290a0a95f6247fc5a0fe6da1e7d905e6b2d
Author: Assaf Muller <email address hidden>
Date: Wed Aug 26 10:07:03 2015 -0400

    Remove ml2 resource extension success logging

    This is the cause of a tremendous amount of logs, for no
    perceivable gain. A normal dvr run in the gate shows this debug
    message around 120K times, which is way too much.

    Closes-Bug: #1489952

    Change-Id: I26fca8515d866a7cc1638d07fa33bc04479ae221

commit 8d3faf549cba2f58c872ef4121b2481e73464010
Author: huangpengtao <email address hidden>
Date: Fri Aug 28 23:20:46 2015 +0800

    Replace "prt" variable by "port"

    the local variable prt is meaningless,
    and port is used popular.

    Change-Id: I20849102cf5b4d84433c46791b4b1e2a22dc4739

commit ee374e7a5f4dea538fcd942f5...

tags: added: in-feature-pecan
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-3 → 7.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.