[RFE] Improve performance of bulk port create/delete with networking-generic-switch
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | New | Undecided | Unassigned |
Bug Description
We have a bare-metal use-case where we plan to use Neutron and networking-
However, we are facing a performance issue when creating or deleting many ports at once, even when using Neutron's bulk API for creating ports. The main issue is that Neutron binds ports sequentially, and networking-
We have ideas to improve NGS itself (SSH connection reuse, parallelism across different switches), but the main blocker is the sequential interface between Neutron and NGS: Neutron calls NGS with only one port to bind at a time, and waits for that bind to succeed before calling NGS again with the next port.
Here are some possible solutions, as a basis for discussion:
- add a new "bind_port_bulk()" interface between Neutron and mechanism drivers, so that Neutron could pass all ports at once to NGS, and NGS could then parallelize hardware reconfiguration. We started working on this on the Neutron side, but it's quite invasive and difficult to do. Pain points include error handling, atomicity, and interaction with binding levels.
- add a way to bind ports asynchronously. Neutron could fire several asynchronous bind calls to the mechanism driver, which would be handled simultaneously by the driver (with appropriate asynchronous support). Neutron would then "join" all the asynchronous threads before completing the API call.
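To make the second option more concrete, here is a minimal sketch of the "fire, then join" pattern using a thread pool. It is not Neutron or NGS code: `bind_port` is a hypothetical stand-in for a per-port driver call, and the sleep only mimics the latency of talking to a switch.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def bind_port(port_id, delay=0.05):
    """Hypothetical per-port bind; the sleep mimics switch I/O latency."""
    time.sleep(delay)
    return port_id, "ACTIVE"


def bind_ports_sequential(port_ids):
    """Current behaviour as described above: one bind at a time."""
    return dict(bind_port(p) for p in port_ids)


def bind_ports_async(port_ids, max_workers=8):
    """Fire all binds, then join them before returning to the API caller."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(bind_port, p) for p in port_ids]
        # The "join": result() blocks until each bind is done and
        # re-raises any exception raised inside the worker thread.
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    ports = [f"port-{i}" for i in range(8)]
    for fn in (bind_ports_sequential, bind_ports_async):
        start = time.monotonic()
        fn(ports)
        print(fn.__name__, round(time.monotonic() - start, 2), "s")
```

Both functions return the same result; only the wall-clock time differs. Error handling is the open question here as well: in this sketch the first failed bind raises out of the join while the other binds may already have completed.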
Some news on this front, after some more work and informal discussion:
- currently in NGS, device reconfiguration to account for port creation is done in bind_port(), while port deletion is handled in a postcommit hook. According to the bind_port() specification, we are not supposed to make permanent changes here, which means that the current code does not respect this for port creation. I am preparing an initial patch to move all device reconfiguration to postcommit hooks to fix this. This has the added advantage that postcommit hooks would be easier to parallelize / bulkify / make asynchronous: there is none of the complexity of binding levels or binding retries done by Neutron.
- I discovered the "provisioning blocks" mechanism [1]. It is already used by NGS, but it could probably be used in a more asynchronous way to improve concurrency (e.g. return an answer to the API caller even if the port is still in BUILD state).
[1] https://docs.openstack.org/neutron/latest/contributor/internals/provisioning_blocks.html
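As a rough illustration of the direction described in the two points above, the sketch below has a postcommit-style hook that only records the work and leaves the port in BUILD, and a later flush that configures each switch in parallel with one session per switch. All class and method names are hypothetical and only simulate the idea; none of this is the actual NGS or Neutron API, and setting the status to ACTIVE merely stands in for completing a provisioning block.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class FakeSwitchDriver:
    """Hypothetical driver used only to illustrate the idea."""

    def __init__(self):
        self.lock = threading.Lock()
        self.status = {}                   # port_id -> "BUILD" or "ACTIVE"
        self.pending = defaultdict(list)   # switch -> queued port ids
        self.sessions = defaultdict(int)   # switch -> sessions opened

    def update_port_postcommit(self, port_id, switch):
        # Return immediately: the port stays in BUILD until the
        # switch has actually been reconfigured.
        with self.lock:
            self.status[port_id] = "BUILD"
            self.pending[switch].append(port_id)

    def _configure_switch(self, switch, port_ids):
        # One session per switch for the whole batch (connection reuse).
        with self.lock:
            self.sessions[switch] += 1
        for port_id in port_ids:
            # Real switch commands would be sent here.
            with self.lock:
                self.status[port_id] = "ACTIVE"

    def flush(self):
        # Take the queued work, then handle each switch in its own thread.
        with self.lock:
            batches = dict(self.pending)
            self.pending.clear()
        if not batches:
            return
        with ThreadPoolExecutor(max_workers=len(batches)) as pool:
            futures = [pool.submit(self._configure_switch, sw, ports)
                       for sw, ports in batches.items()]
            for future in futures:
                future.result()  # surface any per-switch failure
```

In this sketch, ports on different switches are configured concurrently while ports on the same switch share one session, and the caller of the hook never waits on the hardware.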