[RFE][L3] l3-agent should have its capacity

Bug #1828494 reported by LIU Yulong
This bug affects 1 person
Affects    Status         Importance    Assigned to    Milestone
neutron    In Progress    Wishlist      LIU Yulong

Bug Description

Recently we have run into some scale issues with the L3 agent. As far as I know, most cloud service providers do not charge for neutron virtual routers. This can become a headache for operators: every tenant may create free routers that do nothing, but neutron still creates many resources for each of them. In the HA scenario especially, there will be namespaces, keepalived processes, and monitor processes. This definitely increases the risk of failure, especially on agent restart.

So this RFE aims to add a scheduling mechanism in which the l3-agent collects and reports some resource usage, for instance available bandwidth. Then, during router scheduling, if an agent has no more resources available, no further routers are scheduled to it, which keeps the number of routers within a controlled range.

Tags: rfe-approved
Slawek Kaplonski (slaweq) wrote :

First of all, there is the quota mechanism to control the number of resources used by tenants, so that can be limited easily.
Second, is this RFE about adding a new scheduler driver to https://github.com/openstack/neutron/blob/master/neutron/scheduler/l3_agent_scheduler.py ? Isn't LeastRoutersScheduler (https://github.com/openstack/neutron/blob/master/neutron/scheduler/l3_agent_scheduler.py#L344) what You are describing here?

tags: added: rfe
LIU Yulong (dragon889) wrote :

@Slawek,
Thanks for the feedback. I know about quotas. But in a public cloud, users (tenants) may be free to register, and each then gets some free quota. Unrestricted growth in the number of tenants will let the number of routers, or of any other free resource, reach a level the cluster cannot bear.
As for the second point: yes, a scheduling mechanism based on some physical resource will be needed, for instance the physical NIC bandwidth ratio. LeastRoutersScheduler cannot work here, since we do not have a maximum quantity for an l3-agent, so a router is still free to be scheduled to any l3-agent.
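
A toy illustration of the point about a least-loaded policy without a maximum (this is not the neutron LeastRoutersScheduler code; the agent names and the helper `pick_least_loaded` are made up for the example): the policy balances routers across agents, but it never refuses a placement, so the per-agent count grows without bound.

```python
# Toy model only, not neutron code. Agent names and counts are hypothetical.
def pick_least_loaded(router_counts):
    """Return the agent hosting the fewest routers (ties broken by name)."""
    return min(sorted(router_counts), key=lambda agent: router_counts[agent])


router_counts = {"l3-agent-a": 0, "l3-agent-b": 0}

# Every request is placed somewhere: there is no upper bound per agent.
for _ in range(1000):
    chosen = pick_least_loaded(router_counts)
    router_counts[chosen] += 1

print(router_counts)
```

The load is spread evenly, but nothing stops the total from growing as tenants keep creating routers.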

Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
status: New → In Progress
LIU Yulong (dragon889) wrote :
Miguel Lavalle (minsel) wrote :

Please see the comments that I left in the spec.

tags: added: rfe-confirmed
removed: rfe
Changed in neutron:
importance: Undecided → Wishlist
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661492

tags: added: rfe
Slawek Kaplonski (slaweq) wrote :

I read the proposed spec again and I wonder if we can go with a simpler solution like:

1. On the L3 agent's side, add an optional config option "routers_max" (or something like that) and report it to the server in its state report: https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L915
2. On the server side, add a new scheduler which is aware of this "routers_max" option and does not allow more routers to be scheduled to an agent than its "routers_max" value.

That way You would address Your problem, and IMO it would be easier to implement and understand. What do You think about it?
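
A minimal sketch of the two steps above, assuming the agent reports an optional "routers_max" value alongside its current router count. This is not neutron code: `AgentState`, `has_capacity`, and `schedule_router` are hypothetical names used only for illustration, and treating a missing "routers_max" as "no limit" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentState:
    """Hypothetical view of what the server knows from an agent's report."""
    host: str
    hosted_routers: int
    routers_max: Optional[int] = None  # assumption: None means no limit


def has_capacity(agent: AgentState) -> bool:
    """An agent can take a router if it is unlimited or below its limit."""
    return agent.routers_max is None or agent.hosted_routers < agent.routers_max


def schedule_router(agents: List[AgentState]) -> Optional[AgentState]:
    """Pick the least-loaded agent that still has capacity, or None."""
    candidates = [a for a in agents if has_capacity(a)]
    if not candidates:
        return None  # every agent is full: the router stays unscheduled
    chosen = min(candidates, key=lambda a: a.hosted_routers)
    chosen.hosted_routers += 1
    return chosen


agents = [
    AgentState("net-1", hosted_routers=0, routers_max=2),
    AgentState("net-2", hosted_routers=0, routers_max=1),
]
placements = [schedule_router(agents) for _ in range(4)]
hosts = [p.host if p else None for p in placements]
print(hosts)
```

With a total capacity of three routers, the fourth request finds no candidate and is left unscheduled, which is the bounded behaviour the proposal is after.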

tags: added: rfe-triaged
removed: rfe-confirmed
Slawek Kaplonski (slaweq) wrote :

Ping Liu, are You still planning to work on this feature?

Slawek Kaplonski (slaweq) wrote :

@Liu we discussed this proposal once again in the drivers team.
We would like to know Your opinion about the proposal from comment #6, i.e. to base this scheduling simply on the number of routers hosted by the L3 agent instead of introducing a new "bandwidth" parameter.

LIU Yulong (dragon889) wrote :

'routers_max' can be an alternative scheduling mechanism for this case, but it does not address the main point of L3 resource usage, namely the bandwidth of L3 IPs. But I'm OK with making 'number of routers hosted by L3 agent' the first step. I will update the spec.

Slawek Kaplonski (slaweq) wrote :

We discussed this RFE once again in the last drivers meeting: http://eavesdrop.openstack.org/meetings/neutron_drivers/2020/neutron_drivers.2020-06-26-14.00.log.html#l-54 and we decided to approve the RFE with the "routers_max" parameter used to schedule routers.
Let's now discuss the other details of this solution in the RFE and patch(es).

tags: added: rfe-approved
removed: rfe rfe-triaged
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-specs (master)

Change abandoned by "liuyulong <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-specs/+/658451
Reason: Restore if someday we want this.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "liuyulong <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/661492
Reason: Restore if someday we want this.
