neutron-fwaas iptables driver performs awful

Bug #1688573 reported by Simon Murray
This bug affects 2 people
Affects   Status        Importance   Assigned to    Milestone
neutron   In Progress   Undecided    Simon Murray

Bug Description

Details
=======
From Liberty onwards, the neutron iptables_manager uses difflib.ndiff to check for changes to iptables rules. This function does far more than it needs to and takes O(N^2) time in the number of rules, so once you have enough rules the l3-agent sits at 100% CPU and suffers RPC timeouts, effectively taking the agent down.

There are a couple of elements in play here. First, the generation of iptables rules in neutron-fwaas is wrong in a way that guarantees every line diffed will be different. My first patch addresses this by ensuring that fields are correctly ordered, sub-fields within lines are correctly ordered, and source and destination prefixes are normalized.
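To illustrate the kind of normalization described above, here is a minimal sketch, not the code from the patch. The names FIELD_ORDER, normalize_prefix and render_rule are hypothetical, and the field order shown is an assumption; the real target is whatever order iptables-save emits.

```python
import ipaddress

# Hypothetical canonical field order. The order that matters in practice
# is the one iptables-save prints; this list is an assumption.
FIELD_ORDER = ["-s", "-d", "-p", "--sport", "--dport", "-j"]


def normalize_prefix(prefix):
    """Return the prefix as a network address with an explicit mask length."""
    # strict=False masks off host bits, e.g. 10.0.0.5/24 -> 10.0.0.0/24,
    # and a bare address gains an explicit /32.
    return str(ipaddress.ip_network(prefix, strict=False))


def render_rule(fields):
    """Render a dict of flag -> value in a fixed order with normalized prefixes."""
    parts = []
    for flag in FIELD_ORDER:
        if flag not in fields:
            continue
        value = fields[flag]
        if flag in ("-s", "-d"):
            value = normalize_prefix(value)
        parts.append(f"{flag} {value}")
    return " ".join(parts)
```

With this, two rules that mean the same thing render to the same string regardless of how the caller ordered the fields or wrote the prefixes, so a line-based comparison has a chance of seeing them as equal.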

As an experiment I increased the number of firewall rules from 1 to 1024 (in a 2^N progression) to measure the damage in devstack. The time complexity was roughly 0.0045 x N^2. At 512 rules (~120s) we started getting timeouts, leaving the l3 agent spinning constantly, broken and unable to apply the configuration.
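The shape of that experiment can be reproduced outside devstack with a rough sketch like the one below. It is not the measurement harness used for the numbers above: the rule text and the helper names (make_rulesets, time_ndiff, time_ndiff_progression) are made up for illustration, and absolute timings will differ by machine and Python version.

```python
import difflib
import time


def make_rulesets(n):
    """Build two rulesets with the same meaning where every line differs textually."""
    # Hypothetical rule text: only one side spells out the /32 mask.
    current = [f"-A fw-chain -s 10.0.{i // 256}.{i % 256}/32 -j ACCEPT" for i in range(n)]
    desired = [f"-A fw-chain -s 10.0.{i // 256}.{i % 256} -j ACCEPT" for i in range(n)]
    return current, desired


def time_ndiff(n):
    """Return (elapsed seconds, delta lines) for ndiff over n mismatched rules."""
    current, desired = make_rulesets(n)
    start = time.perf_counter()
    # ndiff is lazy, so the generator must be consumed to do the work.
    delta = list(difflib.ndiff(current, desired))
    elapsed = time.perf_counter() - start
    return elapsed, delta


def time_ndiff_progression(sizes):
    """Map each ruleset size to the time ndiff took on it."""
    return {n: time_ndiff(n)[0] for n in sizes}
```

Calling time_ndiff_progression([2 ** k for k in range(8)]) and printing the result gives a quick feel for how the cost grows as the rule count doubles.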

Applying the formatting fixes reduced this to 0.004521s for 512 entries. Looking at the code for ndiff, I suspect this equates to roughly 65536 firewall rules before the system keels over again.

The second issue is with ndiff itself. If we end up in this situation again, where all lines are different, it correctly discovers this in O(N^2) time; however, it also tries to diff pairs of lines that look alike. This is superfluous to the algorithm in neutron and causes a huge performance penalty. Given that we've been throwing away the whole ruleset and reinstalling it each time for 2 years, you may as well replace it with an O(N) list compare ;D The other option would be to parse the iptables output into an internal representation and compare those, which would not be at the whimsical mercy of a 3rd party.
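For what it's worth, the O(N) list compare suggested above could look something like this sketch. The function name rulesets_differ is hypothetical and this is not what iptables_manager does today; it only answers "did anything change?", which is all that is needed if the whole ruleset is reinstalled anyway.

```python
def rulesets_differ(current_lines, desired_lines):
    """Return True if the two rulesets differ, using a single linear pass."""
    # Drop blank lines and surrounding whitespace so that trivial
    # formatting noise does not count as a change.
    current = [line.strip() for line in current_lines if line.strip()]
    desired = [line.strip() for line in desired_lines if line.strip()]
    # List comparison checks elements pairwise and stops at the first
    # mismatch, so the cost is linear in the number of rules.
    return current != desired
```

Unlike ndiff, this never attempts to explain how two lines differ, so the worst case stays linear even when every line has changed.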

Version
=======
Liberty -> Present

Severity
========
High - we have customers with over 1500 rules, and having them able to DoS our L3 network service is not great

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-fwaas (master)

Fix proposed to branch: master
Review: https://review.openstack.org/462953

Changed in neutron:
assignee: nobody → Simon Murray (simon-murray-q)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-fwaas (master)

Reviewed: https://review.openstack.org/462953
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=69d39f79fdd83acaf2d7846b4ed0de73e551aaf4
Submitter: Jenkins
Branch: master

commit 69d39f79fdd83acaf2d7846b4ed0de73e551aaf4
Author: Simon Murray <email address hidden>
Date: Fri May 5 15:04:18 2017 +0100

    Improve iptables handling

    The library routine difflib.ndiff used to compare rulesets is O(N^2) complexity
    which means as we increase the number of rules performance degrades quadratically
    causing RPC timeouts, retries and essentially the L3 agent to not work. This
    alters the code for transforming neutron-fwaas rules into iptables so that the
    comparison between rules works most of the time rather than never. This prevents
    the ndiff function from trying to unnecessarily discover differences between
    individual lines which compounds the performance issue.

    Change-Id: I55a24491e91d8a360696d7e2bf3d088a3336f85e
    Partial-Bug: 1688573
