neutron-fwaas iptables driver performs awful
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | In Progress | Undecided | Simon Murray |
Bug Description
Details
=======
From Liberty onwards the neutron iptables_manager started using difflib.ndiff to check for changes to iptables rules. This function does far more than it needs to and scales very badly with the number of rules (roughly quadratically in the measurements below), so once you have enough rules the l3-agent eats 100% CPU and suffers from RPC timeouts, effectively taking the agent down.
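To illustrate what "does far more than it needs to" means, here is a minimal sketch of difflib.ndiff on two tiny rule lists. The rule strings and the helper name `ndiff_changed` are made up for illustration and this is not the actual iptables_manager code; it only shows that ndiff produces intraline "? " hint lines on top of the plain added/removed markers, which a simple "did anything change" check has no use for.

```python
import difflib

# Hypothetical rule lists; only one address differs between them.
old = ["-A fw -s 10.0.0.1/32 -j ACCEPT", "-A fw -j DROP"]
new = ["-A fw -s 10.0.0.2/32 -j ACCEPT", "-A fw -j DROP"]


def ndiff_changed(old_rules, new_rules):
    """Return True if ndiff reports any added or removed line."""
    return any(line[:2] in ("- ", "+ ")
               for line in difflib.ndiff(old_rules, new_rules))


diff = list(difflib.ndiff(old, new))
# Lines starting with "? " are character-level hints for similar lines.
hint_lines = [line for line in diff if line.startswith("? ")]
```

The `hint_lines` list is non-empty here because ndiff went on to compare the two similar lines character by character.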
There are a couple of elements in play here. First, the generation of iptables rules in neutron-fwaas is utterly wrong, guaranteeing that every line diffed will be different. My first patch addresses this by ensuring that fields are correctly ordered, that sub-fields within lines are correctly ordered, and that source and destination prefixes are normalized.
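As a rough idea of the prefix normalization part, here is a minimal sketch. It is not the code from the patch: the function names `normalize_prefix` and `normalize_rule` are hypothetical, and it assumes the canonical form is network/prefixlen with host bits masked off. Field ordering is not covered.

```python
import ipaddress


def normalize_prefix(value):
    """Return an address or CIDR in network/prefixlen form."""
    # strict=False masks off host bits, e.g. 10.0.0.5/24 -> 10.0.0.0/24,
    # and a bare address becomes a /32 (or /128 for IPv6).
    return str(ipaddress.ip_network(value, strict=False))


def normalize_rule(rule):
    """Normalize the -s/-d arguments of one iptables rule string."""
    tokens = rule.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        out.append(tok)
        if tok in ("-s", "-d") and i + 1 < len(tokens):
            # The token after -s/-d is the prefix; rewrite it and skip it.
            out.append(normalize_prefix(tokens[i + 1]))
            i += 1
        i += 1
    return " ".join(out)
```

With rules generated in the same form they are later read back in, unchanged rules compare as equal instead of every line showing up as different.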
As an experiment I increased the number of firewall rules from 1 to 1024 (2^N progression) to measure the damage in devstack. The time taken grew roughly as 0.0045 x N^2. At 512 rules (~120s) we started getting timeouts, and the l3 agent was left spinning constantly, broken and unable to apply the configuration.
Applying the formatting fixes reduced this to 0.004521s for 512 entries. Looking at the code for ndiff, I suspect this will equate to roughly 65536 firewall rules before the system keels over again.
The second issue is with ndiff itself. If we end up in this situation again, where all lines are different, it correctly discovers this in O(N^2) time; however, it also tries to diff pairs of lines that look alike. This is utterly superfluous to the algorithm in neutron and causes a huge performance penalty. Given we've been throwing away the whole ruleset and reinstalling it each time for 2 years, you may as well replace it with an O(N) list compare ;D The other option would be to parse the iptables output into an internal representation and compare those, so the comparison is not at the whimsical mercy of a 3rd party.
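The O(N) list compare could look something like the sketch below. This is an assumption about what such a replacement might be, not a proposed patch; `rules_changed` is a hypothetical name, and it relies on both lists already being in the same normalized form.

```python
def rules_changed(current_rules, desired_rules):
    """Return True if the two rule lists differ in content or order."""
    # A length mismatch means rules were added or removed.
    if len(current_rules) != len(desired_rules):
        return True
    # Single pass over both lists; stops at the first differing rule.
    return any(a != b for a, b in zip(current_rules, desired_rules))
```

Since the whole ruleset gets reinstalled on any change anyway, a yes/no answer is all the caller needs, and no per-line diff is required.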
Version
=======
Liberty -> Present
Severity
========
High - we have customers with over 1500 rules, and having them able to DoS our L3 network service is not great
Fix proposed to branch: master
Review: https://review.openstack.org/462953