No need to autoreschedule routers if l3 agent is back online

Bug #1522436 reported by Oleg Bondarev
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Oleg Bondarev

Bug Description

 - When an L3 agent goes offline, the auto-rescheduling task is triggered and starts rescheduling each router from the dead agent, one by one
 - If many routers are scheduled to the agent, rescheduling all of them might take some time
 - During that time the agent might come back online
 - Currently, auto-rescheduling continues until all routers have been rescheduled away from the (already alive!) agent

The proposal is to skip rescheduling if the agent is back online.
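The proposed behavior can be sketched as a rescheduling loop that re-checks the agent's liveness on every iteration and stops as soon as the agent revives. This is a minimal illustration only; the `Agent` class and `reschedule_routers` helper are hypothetical stand-ins, not neutron's actual API.

```python
class Agent:
    """Toy L3 agent that can come back online after N liveness checks."""

    def __init__(self, routers, revive_after=None):
        self.routers = list(routers)
        self._revive_after = revive_after  # None means the agent stays dead
        self._checks = 0

    def is_alive(self):
        self._checks += 1
        return self._revive_after is not None and self._checks > self._revive_after


def reschedule_routers(agent):
    """Reschedule routers one by one; skip the rest if the agent revives."""
    rescheduled = []
    for router in list(agent.routers):
        if agent.is_alive():            # agent is back online: stop rescheduling
            break
        agent.routers.remove(router)    # stands in for moving it to a live agent
        rescheduled.append(router)
    return rescheduled


# Agent stays dead: every router gets rescheduled.
print(reschedule_routers(Agent(["r1", "r2", "r3"])))                  # ['r1', 'r2', 'r3']

# Agent revives after two liveness checks: remaining routers stay put.
print(reschedule_routers(Agent(["r1", "r2", "r3"], revive_after=2)))  # ['r1', 'r2']
```

The key design point is that liveness is checked per router, not once up front, so a long rescheduling run reacts to the agent's recovery mid-loop.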

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/252980

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/252980
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9ec466cd42de94c2fad091edfd7583a5f47eb87a
Submitter: Jenkins
Branch: master

commit 9ec466cd42de94c2fad091edfd7583a5f47eb87a
Author: Oleg Bondarev <email address hidden>
Date: Thu Dec 3 17:39:20 2015 +0300

    Do not autoreschedule routers if l3 agent is back online

    If there are a lot of routers scheduled to l3 agent,
    rescheduling all of them one by one might take quite a long
    period of time - during that time some agents might get back
    online. In this case we should skip rescheduling.

    Closes-Bug: #1522436
    Change-Id: If6df1f2878ea3379e8d2dba431de3e358e40189d

Changed in neutron:
status: In Progress → Fix Committed
Changed in neutron:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/259523

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/259523
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=745b546b8605d68e074bf53275f5cc8863ee885c
Submitter: Jenkins
Branch: stable/liberty

commit 745b546b8605d68e074bf53275f5cc8863ee885c
Author: Oleg Bondarev <email address hidden>
Date: Thu Dec 3 17:39:20 2015 +0300

    Do not autoreschedule routers if l3 agent is back online

    If there are a lot of routers scheduled to l3 agent,
    rescheduling all of them one by one might take quite a long
    period of time - during that time some agents might get back
    online. In this case we should skip rescheduling.

    Closes-Bug: #1522436
    Change-Id: If6df1f2878ea3379e8d2dba431de3e358e40189d
    (cherry picked from commit 9ec466cd42de94c2fad091edfd7583a5f47eb87a)

tags: added: in-stable-liberty
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b2

This issue was fixed in the openstack/neutron 8.0.0.0b2 development milestone.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.2

This issue was fixed in the openstack/neutron 7.0.2 release.

Revision history for this message
Oleg Bondarev (obondarev) wrote :

It appears the fix was not complete. I'm reopening the bug and will upload a fix shortly.

Changed in neutron:
status: Fix Released → Triaged
tags: removed: in-stable-liberty
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/300571

Changed in neutron:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/300571
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=70068992e37c80e9aa8e70f017aa35132d7e5aee
Submitter: Jenkins
Branch: master

commit 70068992e37c80e9aa8e70f017aa35132d7e5aee
Author: Oleg Bondarev <email address hidden>
Date: Fri Apr 1 19:40:20 2016 +0300

    Use new DB context when checking if agent is online during rescheduling

    Commit 9ec466cd42de94c2fad091edfd7583a5f47eb87a was not enough
    since it checked l3 agents in the same transaction that was used
    to fetch down bindings - so agents always were down even if actually
    they went back online.
    This commit adds context creation on each iteration to make sure
    we use new transaction and fetch up-to-date info for the agent.

    Closes-Bug: #1522436
    Change-Id: I12a4e4f4e0c2042f0c0bf7eead42baca7b87a22b
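
The problem the follow-up commit describes can be sketched by modeling a transaction as a snapshot taken at context creation: reusing the context that fetched the down bindings keeps returning the stale snapshot, while creating a new context per iteration sees the agent's current state. The `DB`, `Context`, and `reschedule` names are illustrative, not neutron's actual code.

```python
DB = {"agent-1": "down"}            # live database state

class Context:
    """Each context snapshots the DB at creation, like a new transaction."""

    def __init__(self):
        self.snapshot = dict(DB)

    def agent_is_alive(self, agent_id):
        return self.snapshot[agent_id] == "alive"


def reschedule(routers, fresh_context_per_iteration):
    ctx = Context()                 # transaction that fetched the down bindings
    moved = []
    for router in routers:
        if fresh_context_per_iteration:
            ctx = Context()         # the fix: new context => up-to-date read
        if ctx.agent_is_alive("agent-1"):
            break                   # agent came back online: stop rescheduling
        moved.append(router)
        if router == "r1":
            DB["agent-1"] = "alive"  # agent revives while we are mid-loop
    return moved


# Buggy variant: the stale snapshot never sees the revival, so every
# router gets rescheduled anyway.
DB["agent-1"] = "down"
print(reschedule(["r1", "r2", "r3"], fresh_context_per_iteration=False))  # ['r1', 'r2', 'r3']

# Fixed variant: a fresh context per iteration notices the revival after r1.
DB["agent-1"] = "down"
print(reschedule(["r1", "r2", "r3"], fresh_context_per_iteration=True))   # ['r1']
```

This mirrors why the first fix (9ec466cd) was insufficient on its own: under a repeatable-read style transaction the liveness query kept returning the snapshot in which the agent was down.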

Changed in neutron:
status: In Progress → Fix Released
tags: added: mitaka-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/304393

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/304393
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ece192b056c36568238abaf763b0ea9f99877c4c
Submitter: Jenkins
Branch: stable/mitaka

commit ece192b056c36568238abaf763b0ea9f99877c4c
Author: Oleg Bondarev <email address hidden>
Date: Fri Apr 1 19:40:20 2016 +0300

    Use new DB context when checking if agent is online during rescheduling

    Commit 9ec466cd42de94c2fad091edfd7583a5f47eb87a was not enough
    since it checked l3 agents in the same transaction that was used
    to fetch down bindings - so agents always were down even if actually
    they went back online.
    This commit adds context creation on each iteration to make sure
    we use new transaction and fetch up-to-date info for the agent.

    Closes-Bug: #1522436
    Change-Id: I12a4e4f4e0c2042f0c0bf7eead42baca7b87a22b
    (cherry picked from commit 70068992e37c80e9aa8e70f017aa35132d7e5aee)

tags: added: in-stable-mitaka
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 8.1.0

This issue was fixed in the openstack/neutron 8.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/314250

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/321755

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/314250
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3bf73801df169de40d365e6240e045266392ca63
Submitter: Jenkins
Branch: master

commit a323769143001d67fd1b3b4ba294e59accd09e0e
Author: Ryan Moats <email address hidden>
Date: Tue Oct 20 15:51:37 2015 +0000

    Revert "Improve performance of ensure_namespace"

    This reverts commit 81823e86328e62850a89aef9f0b609bfc0a6dacd.

    Unneeded optimization: this commit only improves execution
    time on the order of milliseconds, which is less than 1% of
    the total router update execution time at the network node.

    This also

    Closes-bug: #1574881

    Change-Id: Icbcdf4725ba7d2e743bb6761c9799ae436bd953b

commit 7fcf0253246832300f13b0aa4cea397215700572
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Apr 21 07:05:16 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9e930750dde85a9beb0b6f85eeea8a0962d3e020

commit 643b4431606421b09d05eb0ccde130adbf88df64
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Apr 19 06:52:48 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I52d7460b3265b5460b9089e1cc58624640dc7230

commit 1ffea42ccdc14b7a6162c1895bd8f2aae48d5dae
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Apr 18 15:03:30 2016 +0000

    Updated from global requirements

    Change-Id: Icb27945b3f222af1d9ab2b62bf2169d82b6ae26c

commit b970ed5bdac60c0fa227f2fddaa9b842ba4f51a7
Author: Kevin Benton <email address hidden>
Date: Fri Apr 8 17:52:14 2016 -0700

    Clear DVR MAC on last agent deletion from host

    Once all agents are deleted from a host, the DVR MAC generated
    for that host should be deleted as well to prevent a buildup of
    pointless flows generated in the OVS agent for hosts that don't
    exist.

    Closes-Bug: #1568206
    Change-Id: I51e736aa0431980a595ecf810f148ca62d990d20
    (cherry picked from commit 92527c2de2afaf4862fddc101143e4d02858924d)

commit eee9e58ed258a48c69effef121f55fdaa5b68bd6
Author: Mike Bayer <email address hidden>
Date: Tue Feb 9 13:10:57 2016 -0500

    Add an option for WSGI pool size

    Neutron currently hardcodes the number of
    greenlets used to process requests in a process to 1000.
    As detailed in
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/082717.html

    this can cause requests to wait within one process
    for available database connection while other processes
    remain available.

    By adding a wsgi_default_pool_size option functionally
    identical to that of Nova, we can lower the number of
    greenlets per process to be more in line with a typical
    max database connection pool size.

    DocImpact: a previously unused configuration value
               wsgi_default_pool_size is now used to a...

tags: added: neutron-proactive-backport-potential
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b1

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/321755
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=85b29358765c202be4998dc88f4ac5b1f2861bf3
Submitter: Jenkins
Branch: stable/liberty

commit 85b29358765c202be4998dc88f4ac5b1f2861bf3
Author: Oleg Bondarev <email address hidden>
Date: Fri Apr 1 19:40:20 2016 +0300

    Use new DB context when checking if agent is online during rescheduling

    Commit 9ec466cd42de94c2fad091edfd7583a5f47eb87a was not enough
    since it checked l3 agents in the same transaction that was used
    to fetch down bindings - so agents always were down even if actually
    they went back online.
    This commit adds context creation on each iteration to make sure
    we use new transaction and fetch up-to-date info for the agent.

    Closes-Bug: #1522436
    Change-Id: I12a4e4f4e0c2042f0c0bf7eead42baca7b87a22b
    (cherry picked from commit 70068992e37c80e9aa8e70f017aa35132d7e5aee)
    (cherry picked from commit ece192b056c36568238abaf763b0ea9f99877c4c)

tags: added: in-stable-liberty
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.1.2

This issue was fixed in the openstack/neutron 7.1.2 release.

tags: removed: neutron-proactive-backport-potential
tags: removed: mitaka-backport-potential