No need to autoreschedule routers if l3 agent is back online

Bug #1523479 reported by Oleg Bondarev
This bug affects 1 person
Affects                                         Status         Importance  Assigned to    Milestone
Mirantis OpenStack (status tracked in 10.0.x)
  10.0.x                                        Invalid        Medium      Oleg Bondarev
  9.x                                           Fix Released   Medium      Oleg Bondarev

Bug Description

 - If an L3 agent goes offline, the auto-rescheduling task is triggered and starts rescheduling the routers from the dead agent one by one.
 - If many routers are scheduled to the agent, rescheduling all of them can take some time.
 - During that time the agent might come back online.
 - Currently, auto-rescheduling continues until all routers have been rescheduled from the (already alive!) agent.

The proposal is to skip rescheduling if agent is back online.
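
The proposal can be illustrated with a small, self-contained simulation. This is only a sketch of the idea, not the Neutron code: the `Agent` class, `reschedule_routers_from_down_agent`, `simulate`, and the `back_online_after` parameter are all hypothetical names invented for this example, and the numbers (300 routers, agent back after 40 moves) are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """Hypothetical stand-in for an L3 agent record (not a Neutron class)."""
    host: str
    alive: bool = True
    routers: List[str] = field(default_factory=list)


def reschedule_routers_from_down_agent(
    dead: Agent,
    target: Agent,
    is_alive: Callable[[Agent], bool],
    recheck: bool = True,
) -> int:
    """Move routers off `dead` one by one; return how many were moved.

    With recheck=True the liveness of `dead` is re-read before every move,
    and the loop stops as soon as the agent reports alive again.
    """
    moved = 0
    for router in list(dead.routers):
        if recheck and is_alive(dead):
            break  # agent is back online: leave the remaining routers in place
        dead.routers.remove(router)
        target.routers.append(router)
        moved += 1
    return moved


def simulate(recheck: bool, total: int = 300, back_online_after: int = 40) -> int:
    """Return how many routers stay on node-1 if it revives mid-reschedule."""
    node1 = Agent(
        "node-1",
        alive=False,
        routers=[f"rbug-{i}" for i in range(1, total + 1)],
    )
    node2 = Agent("node-2")

    def is_alive(agent: Agent) -> bool:
        # Assumption for the demo: the agent's heartbeat returns once
        # `back_online_after` routers have already been moved away.
        if len(node2.routers) >= back_online_after:
            agent.alive = True
        return agent.alive

    reschedule_routers_from_down_agent(node1, node2, is_alive, recheck=recheck)
    return len(node1.routers)


if __name__ == "__main__":
    print("routers left on node-1 without re-check:", simulate(recheck=False))
    print("routers left on node-1 with re-check:", simulate(recheck=True))
```

Without the re-check every router is drained from the agent even though it has recovered; with the re-check the loop stops at the first iteration that sees the agent alive, so the remaining routers stay where they are.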

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/14462

Changed in mos:
status: New → In Progress
Changed in mos:
status: In Progress → Confirmed
milestone: none → 8.0
Changed in mos:
status: Confirmed → In Progress
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/14462
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: 3ad6c2f2c7f18407b9a5582fabad170677f566bf
Author: Oleg Bondarev <email address hidden>
Date: Mon Dec 7 12:58:34 2015

Do not autoreschedule routers if l3 agent is back online

If there are a lot of routers scheduled to l3 agent,
rescheduling all of them one by one might take quite a long
period of time - during that time some agents might get back
online. In this case we should skip rescheduling.

Upstream review: https://review.openstack.org/252980

Closes-Bug: #1523479
Closes-Bug: #1522436
Change-Id: If6df1f2878ea3379e8d2dba431de3e358e40189d

Changed in mos:
status: In Progress → Fix Committed
tags: added: scale
tags: added: area-neutron
removed: neutron
Revision history for this message
Sergey Arkhipov (sarkhipov) wrote : Re: [Backport 1522436] No need to autoreschedule routers if l3 agent is back online

Problem was reproduced on MOS 8.0, build #496

Steps to reproduce (having 3 controllers):

1. Create ~200-300 routers and connect them to some external gateway
   for i in {1..300}; do
       neutron router-create --distributed False --ha False rbug-$i && \
       neutron router-gateway-set rbug-$i my_ext_network
   done

2. Stop all L3 agents except one (say, the one on node-1)
   $ pcs resource ban p_neutron-l3-agent node-2.domain.tld
   $ pcs resource ban p_neutron-l3-agent node-3.domain.tld

3. Wait until all routers have been migrated to node-1. You can check that with `neutron router-list-on-l3-agent`

4. Enable all other L3 agents.
   $ pcs resource clear p_neutron-l3-agent node-2.domain.tld
   $ pcs resource clear p_neutron-l3-agent node-3.domain.tld

5. Check that nothing has happened to the routers: they are still hosted by the L3 agent on `node-1`.

6. Stop L3 agent on node-1:
   $ pcs resource ban p_neutron-l3-agent node-1.domain.tld

7. Wait until routers start to migrate to the "live" L3 agents. After that, IMMEDIATELY enable the L3 agent on node-1:
   $ pcs resource clear p_neutron-l3-agent node-1.domain.tld

8. Ensure that the rest of the routers stay on node-1 and that the "drain" eventually stops.

In reality, all routers leave node-1.

Please check the attached logs. You can observe this use case (from step 6, ~ 08 Feb 2016 14:15:28 UTC) here:
https://drive.google.com/a/mirantis.com/file/d/0B9tzODpFABxkN2R1bHhyZVloelk/view?usp=sharing

Changed in mos:
milestone: 8.0 → 9.0
status: Fix Committed → Confirmed
summary: - [Backport 1522436] No need to autoreschedule routers if l3 agent is back
- online
+ No need to autoreschedule routers if l3 agent is back online
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Revision history for this message
Alexander Ignatov (aignatov) wrote :

Oleg will try his best to fix this issue again!

Revision history for this message
Oleg Bondarev (obondarev) wrote :

Upstream fix https://review.openstack.org/#/c/300571 will be backported to stable/mitaka once merged in master

tags: added: wait-for-stable
Changed in mos:
status: Confirmed → Won't Fix
Revision history for this message
Dina Belova (dbelova) wrote :

Adding 9.0 milestone back due to the wait-for-stable tag -> fix will arrive as a part of sync with stable/mitaka code.

Changed in mos:
status: Won't Fix → Confirmed
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/19866
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: ed1ca7dcfcc89fc19384dbbf2174ae0f6795289b
Author: Jenkins <email address hidden>
Date: Wed Apr 20 08:40:40 2016

Merge the tip of origin/stable/mitaka into origin/9.0/mitaka

643b443 Imported Translations from Zanata
1ffea42 Updated from global requirements
b970ed5 Clear DVR MAC on last agent deletion from host
eee9e58 Add an option for WSGI pool size
93795a4 Fix deprecation warning for external_network_bridge
36305c0 Add ALLOCATING state to routers
07fa372 ADDRESS_SCOPE_MARK_IDS should not be global for L3 agent
9c58ae6 Wrap all update/delete l3_rpc handlers with retries
ece192b Use new DB context when checking if agent is online during rescheduling
2e2d75c ovsfw: Load vlan tag from other_config
5853af9 Iptables firewall prevent IP spoofed DHCP requests
9679285 Return oslo_config Opts to config generator
e2676ae DVR: rebind port if ofport changes

Closes-Bug: #1566689
Closes-Bug: #1496723
Closes-Bug: #1523479
Closes-Bug: #1561509

Change-Id: Id18fd3ba2fa15369748828c462e8e888ccecc0de

tags: added: on-verification
Revision history for this message
Alexander Ignatov (aignatov) wrote :

The fix has landed in Newton, so closing this bug as Invalid for 10.0.
