[Backport 1566291] L3 agent: at some point an agent becomes unable to handle new routers

Bug #1566689 reported by Oleg Bondarev
This bug affects 1 person
Affects: Mirantis OpenStack (status tracked in 10.0.x)
Series: 10.0.x
Status: Invalid
Importance: High
Assigned to: Oleg Bondarev

Bug Description

Upstream: https://bugs.launchpad.net/neutron/+bug/1566291

The following is seen in the L3 agent logs:

2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent [-] Failed to process compatible router 'e341e0e2-5089-46e9-91f9-2099a156b27f'
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 497, in _process_router_update
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 434, in _process_router_if_compatible
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._process_added_router(router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 439, in _process_added_router
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent self._router_added(router['id'], router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 340, in _router_added
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent ri = self._create_router(router_id, router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 337, in _create_router
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent return legacy_router.LegacyRouter(*args, **kwargs)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 61, in __init__
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent DEFAULT_ADDRESS_SCOPE: ADDRESS_SCOPE_MARK_IDS.pop()}
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent KeyError: 'pop from an empty set'
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent

As a result, the agent is constantly resyncing (causing load on the neutron server) and is unable to handle new routers.

I believe that the set "ADDRESS_SCOPE_MARK_IDS = set(range(1024, 2048))" in router_info.py should not be agent-global; each router should have its own ADDRESS_SCOPE_MARK_IDS. At the very least, values need to be returned to the set when a router is deleted.
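
To illustrate the difference, here is a minimal sketch in plain Python. It is a simplified model and not the actual neutron code: the class names GlobalPoolRouter and PerRouterPoolRouter, the helper count_routers_until_failure, and the constant names are hypothetical. Only the 1024..2047 range and the set.pop() call come from the report above.

```python
# Simplified model of the problem; names are illustrative, not neutron's.
MARK_ID_MIN = 1024
MARK_ID_MAX = 2048

# Agent-global pool: shared by every router object the agent ever creates.
GLOBAL_MARK_IDS = set(range(MARK_ID_MIN, MARK_ID_MAX))


class GlobalPoolRouter:
    """Hypothetical router that takes its default mark from the shared pool."""

    def __init__(self, router_id):
        self.router_id = router_id
        # Nothing ever puts an ID back, so this raises
        # KeyError('pop from an empty set') after 1024 routers.
        self.default_mark = GLOBAL_MARK_IDS.pop()


class PerRouterPoolRouter:
    """Hypothetical router that owns a private pool of mark IDs."""

    def __init__(self, router_id):
        self.router_id = router_id
        # A fresh pool per router cannot be exhausted by other routers.
        self.available_mark_ids = set(range(MARK_ID_MIN, MARK_ID_MAX))
        self.default_mark = self.available_mark_ids.pop()


def count_routers_until_failure(router_cls, attempts):
    """Create up to `attempts` routers; return how many succeeded."""
    created = 0
    for i in range(attempts):
        try:
            router_cls("router-%d" % i)
        except KeyError:
            break
        created += 1
    return created


global_created = count_routers_until_failure(GlobalPoolRouter, 1100)
per_router_created = count_routers_until_failure(PerRouterPoolRouter, 1100)
print(global_created, per_router_created)
```

In this model the global pool stops after 1024 routers regardless of how many were deleted in the meantime, while the per-router pool has no such agent-wide limit.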

Bug Checker Bot (bug-checker) wrote : Autochecker

(This check was performed automatically.)
Please make sure that the bug description contains the following sections, filled in with the appropriate data for the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
tags: added: wait-for-stable
Changed in mos:
status: Confirmed → In Progress
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/19866
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: ed1ca7dcfcc89fc19384dbbf2174ae0f6795289b
Author: Jenkins <email address hidden>
Date: Wed Apr 20 08:40:40 2016

Merge the tip of origin/stable/mitaka into origin/9.0/mitaka

643b443 Imported Translations from Zanata
1ffea42 Updated from global requirements
b970ed5 Clear DVR MAC on last agent deletion from host
eee9e58 Add an option for WSGI pool size
93795a4 Fix deprecation warning for external_network_bridge
36305c0 Add ALLOCATING state to routers
07fa372 ADDRESS_SCOPE_MARK_IDS should not be global for L3 agent
9c58ae6 Wrap all update/delete l3_rpc handlers with retries
ece192b Use new DB context when checking if agent is online during rescheduling
2e2d75c ovsfw: Load vlan tag from other_config
5853af9 Iptables firewall prevent IP spoofed DHCP requests
9679285 Return oslo_config Opts to config generator
e2676ae DVR: rebind port if ofport changes

Closes-Bug: #1566689
Closes-Bug: #1496723
Closes-Bug: #1523479
Closes-Bug: #1561509

Change-Id: Id18fd3ba2fa15369748828c462e8e888ccecc0de

Elena Ezhova (eezhova)
Changed in mos:
status: In Progress → Fix Committed
tags: added: on-verification
Kristina Berezovskaia (kkuznetsova) wrote :

Verified on 9.0
cat /etc/fuel_build_id:
 389
cat /etc/fuel_build_number:
 389
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6346.noarch
 fuel-bootstrap-cli-9.0.0-1.mos282.noarch
 fuel-migrate-9.0.0-1.mos8378.noarch
 rubygem-astute-9.0.0-1.mos745.noarch
 fuel-misc-9.0.0-1.mos8378.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-mirror-9.0.0-1.mos136.noarch
 fuel-openstack-metadata-9.0.0-1.mos8693.noarch
 fuel-notify-9.0.0-1.mos8378.noarch
 nailgun-mcagents-9.0.0-1.mos745.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8693.noarch
 python-fuelclient-9.0.0-1.mos315.noarch
 fuelmenu-9.0.0-1.mos270.noarch
 fuel-9.0.0-1.mos6346.noarch
 fuel-utils-9.0.0-1.mos8378.noarch
 fuel-setup-9.0.0-1.mos6346.noarch
 fuel-library9.0-9.0.0-1.mos8378.noarch
 shotgun-9.0.0-1.mos88.noarch
 fuel-agent-9.0.0-1.mos282.noarch
 fuel-ui-9.0.0-1.mos2696.noarch
 fuel-ostf-9.0.0-1.mos934.noarch
 python-packetary-9.0.0-1.mos136.noarch
 fuel-nailgun-9.0.0-1.mos8693.noarch
(neutron+vxlan+l2+dvr, 1 controller, 2 compute)

Reproduced on 9.0, ISO 134.

Steps:
1) Create and delete 1024 routers on one agent (I have only one controller, so all routers land on the same agent):
for i in {1..1024}
do
    # Create a router with its own network and subnet, then attach the subnet
    neutron router-create router-$i
    NET_ID=$(neutron net-create net$i | grep id | awk -F "|" '{print $3}' | head -1 | tr -d " ")
    neutron subnet-create --name subnet$i net$i 10.0.1.0/24
    neutron router-interface-add router-$i subnet$i
    sleep 5
    # Tear everything down again so the agent processes a router removal
    neutron router-gateway-clear router-$i admin_floating_net
    neutron router-interface-delete router-$i subnet$i
    neutron router-delete router-$i
    neutron net-delete net$i
    sleep 5
done
2) Create one more router, network and subnet, set the router gateway, and add the subnet as an interface to the router
3) Check /var/log/neutron/l3-agent.log for errors
On ISO 134 we can see these errors: https://paste.mirantis.net/show/2309/
On ISO 389 everything works correctly, without errors
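
The steps above can also be modeled without a deployment. The following is a toy simulation, assuming an agent-wide pool of 1024 mark IDs as described in the bug; ToyAgent, run_scenario and the release_on_delete flag are hypothetical names and do not correspond to neutron code. It covers the alternative mentioned in the description, where IDs are returned to the pool when a router is deleted.

```python
# Toy model of the reproduction steps; not neutron code.
class ToyAgent:
    """Hypothetical L3 agent holding an agent-wide pool of mark IDs."""

    def __init__(self, release_on_delete):
        self.mark_ids = set(range(1024, 2048))
        self.release_on_delete = release_on_delete
        self.routers = {}

    def router_added(self, router_id):
        # set.pop() raises KeyError when the pool is empty.
        self.routers[router_id] = self.mark_ids.pop()

    def router_removed(self, router_id):
        mark = self.routers.pop(router_id)
        if self.release_on_delete:
            # Return the ID so a later router can reuse it.
            self.mark_ids.add(mark)


def run_scenario(release_on_delete):
    """Create and delete 1024 routers, then try to create one more."""
    agent = ToyAgent(release_on_delete)
    for i in range(1, 1025):
        agent.router_added("router-%d" % i)
        agent.router_removed("router-%d" % i)
    try:
        agent.router_added("router-1025")
    except KeyError:
        return "failed"
    return "ok"


leaky_result = run_scenario(release_on_delete=False)
fixed_result = run_scenario(release_on_delete=True)
print(leaky_result, fixed_result)
```

With release_on_delete=False the model fails on the extra router in the same way as step 2; with release_on_delete=True the extra router is created normally.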

Changed in mos:
status: Fix Committed → Fix Released
tags: removed: on-verification
Alexander Ignatov (aignatov) wrote :

This bug is fixed upstream in Newton, so I am closing it as Invalid.
