[RFE] Firewall configuration takes a long time with many ports
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ironic Inspector | Triaged | Medium | milan k |
Bug Description
When firewall management is enabled, each (inactive) ironic port requires an iptables rule to filter DHCP packets. These rules get recreated each time inspection is triggered on a node, and when inspection completes. When a lot of ports are in use, configuration of these iptables rules can take a long time.
In a system with 35 nodes, each with 6 ports, each firewall change was observed to take around 50 seconds, leading to inspection processing times of around a minute per node. Combined with the single-threaded Flask development server used in inspector and the global lock acquired during firewall reconfiguration, this caused a large percentage of inspection failures when multiple nodes were inspected simultaneously.
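A rough back-of-the-envelope sketch of these numbers, in Python. It assumes the worst case of one iptables rule per port (all ports inactive) and a simple model in which concurrent firewall updates queue one after another behind the global lock; the function name `serialized_wait` is illustrative only and is not part of inspector.

```python
# Back-of-the-envelope model of the figures reported above.
NODES = 35
PORTS_PER_NODE = 6
FIREWALL_UPDATE_SECONDS = 50.0  # observed duration of one firewall change

# Upper bound on rules: one per port, assuming every port is inactive.
total_ports = NODES * PORTS_PER_NODE
seconds_per_port = FIREWALL_UPDATE_SECONDS / total_ports


def serialized_wait(concurrent_updates: int,
                    update_seconds: float = FIREWALL_UPDATE_SECONDS) -> float:
    """Worst-case wait when updates queue behind a single global lock.

    Assumed model: updates run strictly one at a time, so the last
    caller waits for all of them.
    """
    return concurrent_updates * update_seconds


print(total_ports)                  # 210
print(round(seconds_per_port, 2))   # 0.24
print(serialized_wait(5))           # 250.0
```

Under these assumptions a single rule costs roughly a quarter of a second, which is in line with the spacing of about 0.25 s between successive DEBUG lines in the inspector log in this report, if each of those lines corresponds to one iptables invocation (the log lines are truncated, so this is a guess). With only a handful of updates queued, the wait already runs to several minutes, which would fit the gateway timeouts described in this report.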
Steps to reproduce
------------------
Perform inspection on multiple nodes simultaneously in a system with a lot of ports and with inspector firewall management enabled. This could likely be reproduced using a few nodes with many ports each.
Expected results
----------------
Inspection is successful for all nodes.
Actual results
--------------
A large percentage of nodes fail inspection. HTTP 504 gateway timeout errors are seen both in ironic when checking the inspection status, and in IPA when calling back to inspector.
Ironic exception:
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
2017-12-12 17:50:04.679 6 ERROR ironic.
Inspector logs (note the timestamps):
2017-12-12 17:55:01.814 7 INFO ironic_
-b46d-9c55c2cb2558
2017-12-12 17:55:01.915 7 DEBUG ironic_
/site-packages/
2017-12-12 17:55:02.187 7 DEBUG ironic_
/site-packages/
---8<------8<----
2017-12-12 17:56:35.228 7 DEBUG ironic_
/site-packages/
2017-12-12 17:56:35.475 7 DEBUG ironic_
2017-12-12 17:56:35.726 7 DEBUG ironic_
ite-packages/
2017-12-12 17:56:35.980 7 DEBUG ironic_
ackages/
2017-12-12 17:56:36.227 7 DEBUG ironic_
2017-12-12 17:56:36.471 7 DEBUG ironic_
2017-12-12 17:56:36.727 7 DEBUG ironic_
Environment
-----------
CentOS 7.4 host, CentOS 7.4 inspector container running ironic-
Linux sv-b16-u21 3.10.0-
x86_64 GNU/Linux
summary:
- Firewall configuration takes a long time
+ Firewall configuration takes a long time with many ports
Changed in ironic-inspector:
importance: Undecided → Medium
status: In Progress → Triaged
Hey Mark!
Thanks a lot for the report, performance feedback is really invaluable!
This is a legitimate issue and a limitation of the iptables filter driver (the firewall module in Pike).
The situation should be much better with the new dnsmasq filter driver in Queens.
For the single-threaded nature of the API (werkzeug's simple server), we can introduce a config option to propagate a threaded=True flag to the Flask(__name__).run() call, probably improving the I/O-bound performance a bit.
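A minimal sketch of what such an option might look like, not the actual inspector change. The `ApiConfig` class, its field names, the `run_options` helper and the `/ping` route are all hypothetical. The only real APIs relied on are `Flask(__name__)` and `Flask.run()`, which forwards extra keyword arguments such as `threaded` to werkzeug's `run_simple()`.

```python
from dataclasses import dataclass


@dataclass
class ApiConfig:
    # Hypothetical option names for illustration; not real inspector settings.
    listen_address: str = "127.0.0.1"
    listen_port: int = 5050
    threaded: bool = False  # the proposed new option


def run_options(conf: ApiConfig) -> dict:
    """Build keyword arguments for Flask.run().

    Flask.run() passes unknown keyword arguments through to
    werkzeug.serving.run_simple(), which accepts `threaded`.
    """
    return {
        "host": conf.listen_address,
        "port": conf.listen_port,
        "threaded": conf.threaded,
    }


def serve(conf: ApiConfig) -> None:
    """Start the development server with the configured options."""
    # Imported lazily so run_options() can be used without Flask installed.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/ping")
    def ping():
        # Placeholder endpoint standing in for the real API.
        return "pong"

    app.run(**run_options(conf))
```

Calling `serve(ApiConfig(threaded=True))` would handle each request in its own thread, so a slow request no longer blocks the others at the HTTP layer. It would not remove contention on the global firewall lock itself. Note that later Flask releases (1.0 and newer) already enable threading by default in `run()`.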
When it comes to the shared lock over the filter driver, we could probably avoid it too for the dnsmasq filter driver, as the patch https://review.openstack.org/#/c/504438/ just landed.
Cheers,
milan