L3 HA Least Loaded scheduling policy doesn't work as it always schedules gateways on the same node

Bug #1762694 reported by Daniel Alvarez
Affects: networking-ovn
Status: Fix Released
Importance: Undecided
Assigned to: Daniel Alvarez

Bug Description

With the current implementation, the L3 HA Least Loaded scheduling policy always schedules gateways on the same node because it only takes into account the number of ports scheduled on each chassis, not their priorities.

For example, imagine that we have gateways GW1 and GW2 and chassis C1, C2 and C3.
When GW1 is added, this code [0] returns [C1, C2, C3], since all of them have zero ports scheduled on them. This means that C1 will host GW1 with prio3, C2 with prio2 and C3 with prio1.

When GW2 is then added, the same code returns [C1, C2, C3] again, since all of them now have one port scheduled on them (although with different priorities). This means that C1 will host both GW1 and GW2 with prio3, C2 with prio2 and C3 with prio1.

This is repeated for every new gateway added to the cloud, meaning that C1 will host all *active* routers for North-South traffic (the chassis with the highest priority is the active one, with failover handled via BFD [1]), creating a very unbalanced state. Furthermore, if C1 goes down, all gateways will fail over to C2, keeping the same unbalanced load.
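To illustrate this with a rough, self-contained sketch (this is not the networking-ovn code; least_loaded_by_port_count and chassis_load are made-up names), a scheduler that only sorts chassis by their total number of scheduled ports keeps returning the same ordering, because every gateway adds exactly one binding to every chassis:

    def least_loaded_by_port_count(chassis_load):
        # chassis_load maps chassis name -> total gateway ports hosted on it.
        return sorted(chassis_load, key=lambda c: chassis_load[c])

    chassis_load = {'C1': 0, 'C2': 0, 'C3': 0}

    for gw in ('GW1', 'GW2', 'GW3'):
        candidates = least_loaded_by_port_count(chassis_load)
        # Priorities are handed out from the front of the list, so C1 always
        # gets the highest one while the port counts stay identical.
        print(gw, '->', list(zip(candidates, range(len(candidates), 0, -1))))
        for chassis in candidates:
            chassis_load[chassis] += 1  # each gateway adds one binding per chassis

Running this prints [('C1', 3), ('C2', 2), ('C3', 1)] for every gateway, which is exactly the unbalanced result described above.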

We need to implement the Least Loaded algorithm in such a way that it takes the current bindings and priorities into account, so that the load stays as balanced as possible even when some network nodes go down.

[0] http://git.openstack.org/cgit/openstack/networking-ovn/tree/networking_ovn/l3/l3_ovn_scheduler.py?id=d40470a51314fc0c60353c9882e0d2d44c9d2aa5#n105
[1] https://docs.openstack.org/networking-ovn/latest/admin/routing.html
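As a rough sketch of the priority-aware approach suggested above (schedule_gateway and priority_load are illustrative names only, not the code that was eventually merged), priorities can be handed out from highest to lowest, each time picking the chassis that currently hosts the fewest gateways at that priority:

    from collections import defaultdict

    def schedule_gateway(priority_load, num_priorities=3):
        # priority_load[chassis][prio] counts the gateways that chassis already
        # hosts at that priority; the highest priority is the active one.
        assignment = {}
        used = set()
        for prio in range(num_priorities, 0, -1):
            chassis = min((c for c in priority_load if c not in used),
                          key=lambda c: priority_load[c][prio])
            assignment[chassis] = prio
            used.add(chassis)
            priority_load[chassis][prio] += 1
        return assignment

    load = {c: defaultdict(int) for c in ('C1', 'C2', 'C3')}
    for gw in ('GW1', 'GW2', 'GW3'):
        print(gw, '->', schedule_gateway(load))

With three gateways, each chassis ends up active (prio 3) for exactly one of them, so the active role is spread across all three chassis instead of piling up on C1.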

Changed in networking-ovn:
assignee: nobody → Daniel Alvarez (dalvarezs)
Changed in networking-ovn:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/559786
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=bbf0260496bf2e08db3c341b88e124fa6d0c7156
Submitter: Zuul
Branch: master

commit bbf0260496bf2e08db3c341b88e124fa6d0c7156
Author: Daniel Alvarez <email address hidden>
Date: Wed Apr 11 17:01:38 2018 +0200

    L3HA: Take priorities into account in least loaded scheduling

    This patch is implementing an algorithm to take the priorities of
    the gateway ports into account when choosing a chassis using the
    Least Loaded policy (default). Prior to this patch, the same chassis
    would always be selected leading to an unbalanced distribution
    of the gateway nodes.

    Now, when a new gateway is being scheduled, it will take into
    account the current priorities and will select the nodes and their
    priorities accordingly so that the system remains balanced even
    in the event that one of the nodes go down.

    Also, I'm adding a new functional test which will check that the
    ports have been scheduled as expected by looking directly into the
    Southbound database.

    Please, note that bug 1762691 could still make routers to not
    be in HA mode and we need to address this issue separately.

    Closes-Bug: 1762694
    Change-Id: I2df35cc31e856b9b574b77dc6b5ab5be262a12b7
    Signed-off-by: Daniel Alvarez <email address hidden>

Changed in networking-ovn:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/561598

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.0.0.0b1

This issue was fixed in the openstack/networking-ovn 5.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/queens)

Reviewed: https://review.openstack.org/561598
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=a5d6044187193db3db93b7cdca29512cab20f7fb
Submitter: Zuul
Branch: stable/queens

commit a5d6044187193db3db93b7cdca29512cab20f7fb
Author: Daniel Alvarez <email address hidden>
Date: Wed Apr 11 17:01:38 2018 +0200

    L3HA: Take priorities into account in least loaded scheduling

    This patch is implementing an algorithm to take the priorities of
    the gateway ports into account when choosing a chassis using the
    Least Loaded policy (default). Prior to this patch, the same chassis
    would always be selected leading to an unbalanced distribution
    of the gateway nodes.

    Now, when a new gateway is being scheduled, it will take into
    account the current priorities and will select the nodes and their
    priorities accordingly so that the system remains balanced even
    in the event that one of the nodes go down.

    Also, I'm adding a new functional test which will check that the
    ports have been scheduled as expected by looking directly into the
    Southbound database.

    Please, note that bug 1762691 could still make routers to not
    be in HA mode and we need to address this issue separately.

    Closes-Bug: 1762694
    Change-Id: I2df35cc31e856b9b574b77dc6b5ab5be262a12b7
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit bbf0260496bf2e08db3c341b88e124fa6d0c7156)

tags: added: in-stable-queens
tags: added: networking-ovn-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.2

This issue was fixed in the openstack/networking-ovn 4.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.3

This issue was fixed in the openstack/networking-ovn 4.0.3 release.
