[OVN] Lack of AZs awareness in L3 port scheduler

Bug #2030741 reported by morice
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
neutron   Fix Released   Undecided    Rodolfo Alonso

Bug Description

The OVN L3 port scheduler assigns router ports to gateway chassis. It retrieves the chassis list from nodes configured as gateways (external_ids:ovn-cms-options=enable-chassis-as-gw). This list can be filtered by availability zones; in that case, the scheduler filters out chassis from ineligible AZs (scheduler/l3_ovn_scheduler.py).
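A minimal sketch of this filtering step, for illustration only. It assumes each chassis exposes its external_ids:ovn-cms-options string and that AZs are declared in it as `availability-zones=az-1:az-2`; the helper names `parse_cms_options` and `eligible_gateway_chassis` are hypothetical and are not neutron functions.

```python
from typing import Dict, List, Set, Tuple

GW_OPTION = "enable-chassis-as-gw"
AZ_PREFIX = "availability-zones="  # assumed syntax: AZ names separated by ":"


def parse_cms_options(cms_options: str) -> Tuple[bool, Set[str]]:
    """Return (is_gateway, azs) parsed from an ovn-cms-options value."""
    is_gw = False
    azs: Set[str] = set()
    for opt in cms_options.split(","):
        opt = opt.strip()
        if opt == GW_OPTION:
            is_gw = True
        elif opt.startswith(AZ_PREFIX):
            azs = {az for az in opt[len(AZ_PREFIX):].split(":") if az}
    return is_gw, azs


def eligible_gateway_chassis(chassis_options: Dict[str, str],
                             requested_azs: Set[str]) -> List[str]:
    """Keep gateway chassis; if AZs are requested, keep only chassis in them."""
    eligible = []
    for name, options in chassis_options.items():
        is_gw, azs = parse_cms_options(options)
        if not is_gw:
            continue  # not configured as a gateway chassis
        if requested_azs and not (azs & requested_azs):
            continue  # gateway chassis outside every requested AZ
        eligible.append(name)
    return eligible


chassis = {
    "gw-1": "enable-chassis-as-gw,availability-zones=az-1",
    "gw-2": "enable-chassis-as-gw,availability-zones=az-2",
    "compute-1": "availability-zones=az-1",
}
print(eligible_gateway_chassis(chassis, {"az-1"}))  # ['gw-1']
```

Note that this step only removes chassis; it says nothing about how the survivors are spread across AZs, which is the gap the report describes next.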

As a result, we get a list of all chassis eligible for gateway ports, across all AZs where the port could be scheduled.

Then, both the chance and leastloaded schedulers select 5 nodes from this list (hardcoded in common/ovn/constants.py: MAX_GW_CHASSIS = 5) regardless of AZ membership. This looks fine, but when 5 or more nodes are available in one of the AZs, a router's gateway can end up scheduled in *only* one AZ.
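To see why this happens, here is a minimal, hypothetical sketch of selecting `MAX_GW_CHASSIS` candidates without looking at AZs. It assumes a least-loaded style ordering by the number of gateway ports each chassis already hosts; the chassis names and load figures are made up for illustration and this is not neutron's scheduler code.

```python
from typing import Dict, List

MAX_GW_CHASSIS = 5  # value quoted in the report from common/ovn/constants.py


def select_without_az_awareness(candidates: List[str],
                                load: Dict[str, int]) -> List[str]:
    """Least-loaded style pick: sort by current load, keep the first N."""
    ordered = sorted(candidates, key=lambda c: load.get(c, 0))
    return ordered[:MAX_GW_CHASSIS]


# Six eligible chassis in az-1 and two in az-2 (hypothetical deployment).
chassis_az = {f"az1-gw{i}": "az-1" for i in range(1, 7)}
chassis_az.update({"az2-gw1": "az-2", "az2-gw2": "az-2"})
# Assume the az-2 chassis already host more gateway ports than az-1 ones.
load = {name: (3 if az == "az-2" else 0) for name, az in chassis_az.items()}

selected = select_without_az_awareness(list(chassis_az), load)
print(selected)                          # five az-1 chassis
print({chassis_az[c] for c in selected})  # {'az-1'}
```

Every candidate passed the AZ filter, yet all five selected chassis share one AZ, so losing that AZ loses every gateway binding of the port.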

In some use cases, where AZs are mapped to “failure domains”, this can be a problem. In ML2/OVS l3_ha mode, router instances were placed by “neutron.scheduler.l3_agent_scheduler.AZ*Scheduler”, which took AZs into account, and so were their ports; this does not currently seem feasible out of the box with OVN.

Tags: ovn
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Morice:

The behaviour you describe for the OVN L3 scheduler with AZ filtering is correct, and it is the expected behaviour: if the OVN L3 scheduler returns several chassis (in an order that depends on the scheduler) and these 5 chassis [1] belong to the same AZ, the router will be scheduled in that single AZ. The OVN L3 scheduler does not distribute the ports among the available AZs. If that is what you need, you can create a new OVN L3 scheduler class; the OVN L3 scheduler is configurable.
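To make the "new scheduler class" suggestion concrete, here is a minimal, hypothetical sketch of a pluggable design in which each strategy only decides the order of candidates and a shared base class applies the `MAX_GW_CHASSIS` cut. The class names, the `SCHEDULERS` registry and the `order`/`select` signatures are illustrative assumptions, not neutron's API; neutron itself exposes the choice between `leastloaded` and `chance` through its `ovn_l3_scheduler` option.

```python
import random
from typing import Dict, List, Type

MAX_GW_CHASSIS = 5


class BaseScheduler:
    """Hypothetical base: subclasses only decide the candidate order."""

    def order(self, candidates: List[str], load: Dict[str, int]) -> List[str]:
        raise NotImplementedError

    def select(self, candidates: List[str], load: Dict[str, int]) -> List[str]:
        # Every strategy shares the same truncation to MAX_GW_CHASSIS.
        return self.order(candidates, load)[:MAX_GW_CHASSIS]


class ChanceScheduler(BaseScheduler):
    def order(self, candidates, load):
        # Random permutation of the candidates.
        return random.sample(candidates, len(candidates))


class LeastLoadedScheduler(BaseScheduler):
    def order(self, candidates, load):
        # Chassis hosting fewer gateway ports come first.
        return sorted(candidates, key=lambda c: load.get(c, 0))


# Hypothetical name-to-class registry; a custom AZ-aware class could be
# registered here under a new name.
SCHEDULERS: Dict[str, Type[BaseScheduler]] = {
    "chance": ChanceScheduler,
    "leastloaded": LeastLoadedScheduler,
}


def get_scheduler(name: str) -> BaseScheduler:
    return SCHEDULERS[name]()
```

With this split, an AZ-aware variant only has to override `order`, which is roughly the shape of change discussed later in this thread.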

What do you mean by "failure domains"? If you have GW chassis that should be disabled, then you should disable them manually or remove the AZ tag from them. I would like to know what your use case is and what you expect from the scheduler.

Regards.

[1] As you correctly commented, there is a hardcoded limit of 5 gateway chassis (MAX_GW_CHASSIS).

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: New → Incomplete
morice (yannmorice) wrote (last edit ):

Hello Rodolfo Alonso:

Thanks a lot for your feedback.

By "failure domains", I mean a group of equipment (servers, network hardware, power supply, etc.) that could be lost for any reason without affecting the other groups, and consequently without affecting the service as a whole.

For example, our OpenStack nodes are distributed over multiple rooms in a datacenter, and each room hosts some of our dedicated network nodes. From this point of view, each room can be considered a "failure domain". The goal is to have automatic failover in case of any failure affecting a single room.

Until now, we used ML2/OVS. Nodes in room 1 were placed in AZ az-1, nodes in room 2 in az-2, and so on. Using l3_ha mode and AZ-aware schedulers, we had router instances spawned in each of these AZs, so we could lose any single zone without affecting the service (after the VRRP timers expire, of course).

For various reasons, we’d like to move to OVN in the near future and keep the same behaviour. Out of the box with ML2/OVN, our use case should work as long as we have no more than 5 * (number of AZs) network nodes. That may not be sufficient for us in the future, as we already use slightly more.

Disabling or removing nodes would be feasible (even temporarily), but it would not be automatic.

As you suggested, to deal with that and still use OVN, we wrote a small patch that adds two new schedulers which optionally reorder the candidate list by AZ, and it works!

I'm sharing a refreshed version of this patch for trunk (we run an older version of OpenStack Neutron that needs some other backports for this to work). The only drawback is the need to call get_chassis_and_azs from sb_idl and therefore to propagate it (again) through the scheduler functions.
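One way to "reorder the list taking care of AZs" is a round-robin interleave, sketched below under stated assumptions: the `chassis_az` dictionary stands in for the chassis-to-AZ information the patch obtains via get_chassis_and_azs, each chassis is simplified to a single AZ, and `reorder_by_az` is a hypothetical helper. This is an illustration of the technique, not the code that was merged.

```python
from itertools import chain, zip_longest
from typing import Dict, List

MAX_GW_CHASSIS = 5
_SKIP = object()  # filler for AZs that run out of chassis


def reorder_by_az(candidates: List[str],
                  chassis_az: Dict[str, str]) -> List[str]:
    """Interleave candidates round-robin across AZs.

    Each AZ keeps the relative order produced by the base scheduler, and
    AZs are visited in the order they first appear in ``candidates``, so
    the base scheduler's top choice stays first.
    """
    by_az: Dict[str, List[str]] = {}
    for name in candidates:
        # Chassis without a known AZ are grouped under "".
        by_az.setdefault(chassis_az.get(name, ""), []).append(name)
    rounds = zip_longest(*by_az.values(), fillvalue=_SKIP)
    return [c for c in chain.from_iterable(rounds) if c is not _SKIP]


# Same hypothetical layout as the earlier example: 6 chassis in az-1, 2 in az-2,
# already ordered least-loaded first.
candidates = [f"az1-gw{i}" for i in range(1, 7)] + ["az2-gw1", "az2-gw2"]
chassis_az = {c: "az-1" if c.startswith("az1") else "az-2" for c in candidates}
selected = reorder_by_az(candidates, chassis_az)[:MAX_GW_CHASSIS]
print(selected)
# ['az1-gw1', 'az2-gw1', 'az1-gw2', 'az2-gw2', 'az1-gw3']
```

Because the reorder runs before the `MAX_GW_CHASSIS` cut, every AZ with at least one candidate is represented among the first entries, while the base scheduler still decides the order inside each AZ.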

Please let me know if you have any questions.

Regards.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Morice:

Please propose this patch to https://review.opendev.org; that is the best way to get it reviewed and approved. I have one concern about these new schedulers. They are "AZ aware", but the two existing ones ("leastloaded" and "chance") are AZ aware too: if these schedulers find AZ hints in the router, they will use only chassis from those AZs. Your implementation reorders the selected chassis to distribute the LRP among the AZs (which is the goal, of course).

In the L3 agent scheduler (for L3 agent routers, non-OVN), we have a specific "AZLeastRoutersScheduler", and this is the only scheduler aware of AZs. As mentioned, this is not the case for the OVN schedulers.

In a nutshell, what we need is:
Alternative 1) Find a way to preserve the current scheduler behaviour while removing the "AZ awareness", and introduce your schedulers.
Alternative 2) Modify the current schedulers to introduce your AZ reorder algorithm.
Alternative 3) Something else...

This is something I would like to discuss in the Neutron meeting [1] next Tuesday. I'll add an "On Demand" topic. But please, propose the patch to https://review.opendev.org.

Regards and thanks for your efforts.

[1] https://meetings.opendev.org/#Neutron_Team_Meeting

Changed in neutron:
status: Incomplete → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/892604

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/892604
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/906868

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/892604
Committed: https://opendev.org/openstack/neutron/commit/a29ea3724e1f6bb54b76d1b9915c13014272fdcd
Submitter: "Zuul (22348)"
Branch: master

commit a29ea3724e1f6bb54b76d1b9915c13014272fdcd
Author: Yann Morice <email address hidden>
Date: Thu Aug 24 15:56:48 2023 +0200

    [ovn] AZs distribution in L3 port scheduler

    Update l3 ovn schedulers (chance, leastloaded) to ensure that LRP gateways are distributed over chassis in the different eligible AZs.

    The previous version already ensured that LRP gateways were scheduled over chassis in eligible AZs. But, depending on the deployment characteristics, all these chassis could be in the same AZ. In some use cases, LRP gateways need to be in different AZs to be resilient to failures.

    This patch reorders the list of eligible chassis to prioritize selecting chassis in different AZs.

    This should provide a solution for users who need to have their router gateways scheduled on chassis from different AZs.

    Closes-Bug: #2030741
    Change-Id: I72973abbb8b0f9cc5848fd3b4f6463c38c6595f8

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/906868
Committed: https://opendev.org/openstack/neutron/commit/df24fbeb48d8affed19f76efd409d9f6920637f8
Submitter: "Zuul (22348)"
Branch: master

commit df24fbeb48d8affed19f76efd409d9f6920637f8
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun Jan 21 08:42:45 2024 +0000

    [OVN] Document the OVN L3 scheduler: AZs distribution

    This new section describes how the OVN L3 schedulers distribute
    the ``Chassis`` candidate list among the Availability Zones, in
    order to provide more resilience to the L3 HA: if the active
    LRP binding fails, the next in the list will belong to another
    AZ.

    Related-Bug: #2030741
    Change-Id: I20aaeefb33c424dc1a9c13f94f2912d0fa973166
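The resilience argument in the documentation commit above ("if the active LRP binding fails, the next in the list will belong to another AZ") can be sketched as follows. This assumes, in line with OVN's gateway chassis priority model, that the highest-priority surviving chassis hosts the active binding and that priorities follow list order (first entry highest); `assign_priorities` and `active_chassis` are hypothetical helpers, not neutron or OVN code.

```python
from typing import Dict, List, Optional, Set


def assign_priorities(ordered: List[str]) -> Dict[str, int]:
    """Give the first chassis the highest priority, the last one priority 1."""
    n = len(ordered)
    return {name: n - i for i, name in enumerate(ordered)}


def active_chassis(priorities: Dict[str, int],
                   failed: Set[str]) -> Optional[str]:
    """Return the highest-priority chassis that has not failed, if any."""
    alive = {c: p for c, p in priorities.items() if c not in failed}
    if not alive:
        return None
    return max(alive, key=alive.__getitem__)


# Hypothetical AZ-interleaved order (az-1 and az-2 alternate).
ordered = ["az1-gw1", "az2-gw1", "az1-gw2", "az2-gw2", "az1-gw3"]
prio = assign_priorities(ordered)
print(active_chassis(prio, set()))                              # az1-gw1
print(active_chassis(prio, {"az1-gw1"}))                        # az2-gw1
print(active_chassis(prio, {"az1-gw1", "az1-gw2", "az1-gw3"}))  # az2-gw1
```

With an interleaved list, both the loss of the single active chassis and the loss of a whole AZ (for example, one datacenter room) move the binding to a chassis in the other AZ.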

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.0.0rc1

This issue was fixed in the openstack/neutron 24.0.0.0rc1 release candidate.
