[RFE] Improve performance of bulk port create/delete with networking-generic-switch

Bug #1976270 reported by Baptiste Jonglez
Affects   Status   Importance   Assigned to   Milestone
neutron   New      Undecided    Unassigned

Bug Description

We have a bare-metal use-case where we plan to use Neutron and networking-generic-switch (NGS) to reconfigure hardware switches, so that bare-metal hosts can be configured in VLANs managed by Neutron.

However, we are facing a performance issue when creating or deleting many ports at once, even when using Neutron's bulk API for creating ports. The main issue is that Neutron binds ports sequentially, and networking-generic-switch takes between 5 s and 30 s to bind a port (depending on the hardware). Our use-case involves creating between 10 and 100 ports at once, which means waiting several minutes for the API call to complete: this is far too long.
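As a rough back-of-the-envelope check of these figures, here is a small sketch. It assumes the total time is simply the number of ports multiplied by the per-port bind time, with no other overhead; the function name is only illustrative.

```python
def sequential_bind_seconds(num_ports: int, seconds_per_bind: float) -> float:
    """Total wall-clock time when ports are bound strictly one after another."""
    return num_ports * seconds_per_bind


# Bounds taken from the description: 5 s to 30 s per bind, 10 to 100 ports.
for num_ports in (10, 100):
    fastest = sequential_bind_seconds(num_ports, 5)
    slowest = sequential_bind_seconds(num_ports, 30)
    print(f"{num_ports} ports: {fastest:.0f} s to {slowest:.0f} s")
```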

We have ideas to improve NGS itself (SSH connection reuse, parallelism across different switches), but the main blocker is the sequential interface between Neutron and NGS: Neutron calls NGS with only one port to bind at a time, and then waits for the bind to succeed before calling NGS again with the next port to bind.
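To illustrate what "parallelism across different switches" could mean, here is a minimal sketch in plain Python. It is not NGS code: configure_switch() and bind_ports_grouped() are hypothetical names, and the sleep merely stands in for a slow device command. Ports on the same switch are handled in order within one worker (as they would be over a single reused session), while different switches are configured concurrently.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def configure_switch(switch, ports, bind_seconds=0.05):
    """Hypothetical stand-in: configure all ports of one switch, in order."""
    configured = []
    for port in ports:
        time.sleep(bind_seconds)  # simulates one slow device command
        configured.append((switch, port))
    return configured


def bind_ports_grouped(port_to_switch):
    """Group ports by switch, then configure the switches concurrently."""
    by_switch = defaultdict(list)
    for port, switch in port_to_switch.items():
        by_switch[switch].append(port)
    if not by_switch:
        return []

    results = []
    # One worker per switch: sequential within a switch, parallel across switches.
    with ThreadPoolExecutor(max_workers=len(by_switch)) as pool:
        futures = [
            pool.submit(configure_switch, switch, ports)
            for switch, ports in by_switch.items()
        ]
        for future in futures:
            results.extend(future.result())
    return results
```

With this shape, the wall-clock time is bounded by the busiest switch rather than by the total number of ports, but it only helps if the caller hands over several ports at once.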

Here are some solutions we have thought of, as a basis for discussion:

- add a new "bind_port_bulk()" interface between Neutron and mechanism drivers, so that Neutron could pass all ports at once to NGS, and NGS could then parallelize hardware reconfiguration. We started working on this on the Neutron side, but it's quite invasive and difficult to do. Pain points include error handling, atomicity, and interaction with binding levels.

- add a way to bind ports asynchronously. Neutron could fire several asynchronous bind calls to the mechanism driver, which would be handled simultaneously by the driver (with appropriate asynchronous support). Neutron would then "join" all the asynchronous threads before completing the API call.
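The "fire several binds, then join" idea from the second option can be sketched as follows. This is only an illustration under assumptions: bind_all() and the bind_one callable are hypothetical, plain threads are used for simplicity, and none of this reflects how Neutron actually schedules work. It also shows one possible answer to the error-handling question: collect per-port failures instead of aborting on the first one.

```python
from concurrent.futures import ThreadPoolExecutor


def bind_all(ports, bind_one, max_workers=8):
    """Start one bind per port, then wait for all of them ("join").

    Returns (bound, failed): `bound` lists the ports whose bind succeeded,
    `failed` maps each failing port to the exception it raised.
    """
    bound, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the binds can overlap.
        futures = {port: pool.submit(bind_one, port) for port in ports}
        # Then join: result() blocks until that bind has finished.
        for port, future in futures.items():
            try:
                future.result()
                bound.append(port)
            except Exception as exc:
                failed[port] = exc
    return bound, failed


def fake_bind(port):
    """Hypothetical driver call that fails for one specific port."""
    if port == "port-b":
        raise RuntimeError("switch unreachable")


bound, failed = bind_all(["port-a", "port-b", "port-c"], fake_bind)
print(bound, list(failed))
```

What to do with the failed ports (retry, roll back the successful ones, or report partial success) is exactly the atomicity question raised above and is left open here.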

Tags: loadimpact rfe
Baptiste Jonglez (bjonglez) wrote :

Some news on this front, after some more work and informal discussion:

- currently in NGS, device reconfiguration to account for port creation is done in bind_port(), while port deletion is handled in a postcommit hook. According to the bind_port() specification, we are not supposed to make permanent changes there, so the current code violates the specification for port creation. I am preparing an initial patch to move all device reconfiguration to postcommit hooks to fix this. This has the added advantage that postcommit hooks would be easier to parallelize / bulkify / make asynchronous: there is none of the complexity of binding levels or binding retries done by Neutron.

- I discovered the "provisioning blocks" mechanism [1]. It is already used by NGS, but it could probably be used in a more asynchronous way to improve concurrency (e.g. return an answer to the API caller even if the port is still in BUILD state).

[1] https://docs.openstack.org/neutron/latest/contributor/internals/provisioning_blocks.html
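To make the "return while the port is still in BUILD" idea concrete, here is a toy in-memory model. It is not Neutron's provisioning_blocks API: the ProvisioningBlocks class, its add_component() and complete() methods, and create_port() are all hypothetical names, and a plain thread stands in for whatever asynchronous mechanism the driver would really use. The port stays in BUILD until every registered entity has reported completion, and the caller gets control back before the switch work is done.

```python
import threading


class ProvisioningBlocks:
    """Toy model: a port turns ACTIVE once every registered entity is done."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # port_id -> set of entities still working
        self.status = {}    # port_id -> "BUILD" or "ACTIVE"

    def add_component(self, port_id, entity):
        with self._lock:
            self._pending.setdefault(port_id, set()).add(entity)
            self.status[port_id] = "BUILD"

    def complete(self, port_id, entity):
        with self._lock:
            pending = self._pending.get(port_id, set())
            pending.discard(entity)
            if not pending:
                self.status[port_id] = "ACTIVE"


def create_port(blocks, port_id, configure_switch):
    """Return immediately with the port in BUILD; finish in the background."""
    blocks.add_component(port_id, "switch")

    def worker():
        configure_switch(port_id)  # slow hardware reconfiguration
        blocks.complete(port_id, "switch")

    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

A caller would then poll or be notified of the status change instead of blocking on the API call for the whole duration of the hardware reconfiguration.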

Lajos Katona (lajos-katona) wrote :

We discussed this RFE today at the drivers meeting:
https://meetings.opendev.org/meetings/neutron_drivers/2022/neutron_drivers.2022-06-24-14.00.log.html#l-21

Summary: the best option would be to change networking-generic-switch so that it waits for the backend and, once the hardware has finished, fills in the provisioning info for the port (as ovs-agent does, for example). Neutron can then send VIF_Plugged to other components based on that. If a change to the driver interface turns out to be needed along the way, we can discuss it based on the specific request.

Baptiste Jonglez (bjonglez) wrote :

Here is the initial patch to make bind_port side-effect-free: https://review.opendev.org/c/openstack/networking-generic-switch/+/847592
