[RFE] Firewall configuration takes a long time with many ports

Bug #1737947 reported by Mark Goddard
Affects: Ironic Inspector
Status: Triaged
Importance: Medium
Assigned to: milan k

Bug Description

When firewall management is enabled, each (inactive) ironic port requires an iptables rule to filter DHCP packets. These rules get recreated each time inspection is triggered on a node, and when inspection completes. When a lot of ports are in use, configuration of these iptables rules can take a long time.

In a system with 35 nodes, each with 6 ports, each firewall change was observed to take around 50 seconds, leading to inspection processing times of around a minute per node. Coupled with the single-threaded nature of the Flask development server used in inspector and the global lock acquired during firewall reconfiguration, we saw a large percentage of inspection failures when multiple nodes were inspected simultaneously.

Steps to reproduce
------------------

Perform inspection on multiple nodes simultaneously in a system with a lot of ports, and inspector firewall management enabled. This could likely be reproduced using a few nodes with many ports each.

Expected results
----------------

Inspection is successful for all nodes.

Actual results
--------------

A large percentage of nodes fail inspection. HTTP 504 gateway timeout errors are seen both in ironic, when checking the inspection status, and in IPA, when calling back to inspector.

Ironic exception:

2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector Traceback (most recent call last):
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/inspector.py", line 140, in _start_inspection
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector _get_client().introspect(node_uuid)
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/inspector.py", line 57, in _get_client
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector session=_get_inspector_session())
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector File "/usr/lib/python2.7/site-packages/ironic_inspector_client/v1.py", line 88, in __init__
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector super(ClientV1, self).__init__(**kwargs)
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector File "/usr/lib/python2.7/site-packages/ironic_inspector_client/common/http.py", line 137, in __init__
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector self._api_version = self._check_api_version(api_version)
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector File "/usr/lib/python2.7/site-packages/ironic_inspector_client/common/http.py", line 161, in _check_api_version
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector minv, maxv = self.server_api_versions()
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector File "/usr/lib/python2.7/site-packages/ironic_inspector_client/common/http.py", line 200, in server_api_versions
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector ClientError.raise_if_needed(res)
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector File "/usr/lib/python2.7/site-packages/ironic_inspector_client/common/http.py", line 69, in raise_if_needed
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector raise cls(response)
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector ClientError: <html><body><h1>504 Gateway Time-out</h1>
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector The server didn't respond in time.
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector </body></html>
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector
2017-12-12 17:50:04.679 6 ERROR ironic.drivers.modules.inspector

Inspector logs (note the timestamps)

2017-12-12 17:55:01.814 7 INFO ironic_inspector.process [-] [node: af85e966-1c66-45ce-b46d-9c55c2cb2558 state processing] Introspection data was stored in Swift in object inspector_data-af85e966-1c66-45ce-b46d-9c55c2cb2558
2017-12-12 17:55:01.915 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-A', 'ironic-inspector_temp', '-m', 'mac', '--mac-source', u'24:6e:96:48:85:d5', '-j', 'DROP') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
2017-12-12 17:55:02.187 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-A', 'ironic-inspector_temp', '-m', 'mac', '--mac-source', u'24:6e:96:48:85:d2', '-j', 'DROP') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43

---8<------8<----

2017-12-12 17:56:35.228 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-A', 'ironic-inspector_temp', '-m', 'mac', '--mac-source', u'24:6e:96:48:87:aa', '-j', 'DROP') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
2017-12-12 17:56:35.475 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-A', 'ironic-inspector_temp', '-j', 'ACCEPT') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
2017-12-12 17:56:35.726 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-I', 'INPUT', '-i', 'breno1.7', '-p', 'udp', '--dport', '67', '-j', 'ironic-inspector_temp') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
2017-12-12 17:56:35.980 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-D', 'INPUT', '-i', 'breno1.7', '-p', 'udp', '--dport', '67', '-j', 'ironic-inspector') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
2017-12-12 17:56:36.227 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-F', 'ironic-inspector') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
2017-12-12 17:56:36.471 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-X', 'ironic-inspector') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
2017-12-12 17:56:36.727 7 DEBUG ironic_inspector.firewall [-] Running iptables ('-E', 'ironic-inspector_temp', 'ironic-inspector') _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:43
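
As a rough cross-check of the figures in this report, the sketch below takes the timestamps of the last seven iptables calls in the excerpt above, averages the gap between them, and multiplies by the 35 x 6 ports from the description. The function names are made up for illustration. The estimate assumes one iptables call per port, every port inactive, and a constant per-call cost, none of which is confirmed beyond what the log shows.

```python
from datetime import datetime

# Timestamps of the consecutive iptables calls at the end of the log excerpt.
TIMESTAMPS = [
    "2017-12-12 17:56:35.228",
    "2017-12-12 17:56:35.475",
    "2017-12-12 17:56:35.726",
    "2017-12-12 17:56:35.980",
    "2017-12-12 17:56:36.227",
    "2017-12-12 17:56:36.471",
    "2017-12-12 17:56:36.727",
]


def mean_gap_seconds(stamps):
    """Average time between consecutive log entries, in seconds."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def estimated_rebuild_seconds(nodes, ports_per_node, seconds_per_call):
    """Rough cost of one chain rebuild, assuming one iptables call per port.

    Ignores the handful of fixed calls (-I, -D, -F, -X, -E) made per rebuild.
    """
    return nodes * ports_per_node * seconds_per_call


per_call = mean_gap_seconds(TIMESTAMPS)
total = estimated_rebuild_seconds(35, 6, per_call)
print(f"{per_call:.3f} s per call, about {total:.0f} s per rebuild")
```

With these inputs the per-call gap comes out at roughly a quarter of a second and the rebuild estimate at a little over 50 seconds, in line with the "around 50 seconds" quoted in the description and close to the 1 minute HAProxy timeout.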

Environment
-----------

CentOS 7.4 host, CentOS 7.4 inspector container running ironic-inspector==6.0.0, deployed using kayobe & kolla-ansible. HAProxy in front of inspector, with client & server timeouts of 1 minute.

Linux sv-b16-u21 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Tags: rfe
Mark Goddard (mgoddard)
summary: - Firewall configuration takes a long time
+ Firewall configuration takes a long time with many ports
milan k (vetrisko) wrote : Re: Firewall configuration takes a long time with many ports

Hey Mark!

Thanks a lot for the report, performance feedback is really invaluable!

This is a legitimate issue and a limitation of the iptables filter driver (the firewall module in Pike).
The situation should be much better with the new dnsmasq filter driver in Queens.
As for the single-threaded nature of the API (werkzeug's simple server), we can introduce a config option to propagate a threaded=True flag to the Flask(__name__).run() call, which would probably improve IO-bound performance a bit.

When it comes to the shared lock over the filter driver, we could probably avoid it too for the dnsmasq filter driver as the patch https://review.openstack.org/#/c/504438/ just landed.

Cheers,
milan

Changed in ironic-inspector:
status: New → Confirmed
summary: - Firewall configuration takes a long time with many ports
+ [RFE] Firewall configuration takes a long time with many ports
tags: added: rfe
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-inspector (master)

Fix proposed to branch: master
Review: https://review.openstack.org/527706

Changed in ironic-inspector:
assignee: nobody → milan k (vetrisko)
status: Confirmed → In Progress
Mark Goddard (mgoddard) wrote :

Thanks for the reply Milan. The multithreading option might help to some extent - it would at least allow ironic's polling to succeed, even if the /continue API still blocks on the firewall config.

I'd be interested to know how the threading used by werkzeug interacts with eventlet. Hopefully it would just work due to monkey patching.

As for the dnsmasq filter, I think we'll want to use that from Queens onwards.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic-inspector (master)

Change abandoned by Milan Kováčik (<email address hidden>) on branch: master
Review: https://review.openstack.org/527706

Dmitry Tantsur (divius)
Changed in ironic-inspector:
importance: Undecided → Medium
status: In Progress → Triaged