[RFE] Improve SG performance as VMs/containers scale on compute node

Bug #1502297 reported by Ramu Ramamurthy
Affects | Status       | Importance | Assigned to  | Milestone
neutron | Fix Released | Wishlist   | Kevin Benton | mitaka-1

Bug Description

Please refer to the comments in the following bug:

https://bugs.launchpad.net/neutron/+bug/1492456

in which it was suggested that improving SG programming performance be handled as an RFE bug.

To summarize the problem: with about 160 VMs on a compute node, the neutron-ovs-agent takes more than 2 seconds per VM to program iptables rules, mainly because of inefficiencies in the iptables programming code.

#VMs = 0, provision 10 new VMs on compute node
cumulative time in _modify_rules: before 0.34s, after 0.20s

#VMs = 10, provision 10 new VMs on compute node
cumulative time in _modify_rules: before 1.68s, after 0.94s

#VMs = 20, provision 10 new VMs on compute node
cumulative time in _modify_rules: before 4.27s, after 2.12s

#VMs = 40, provision 10 new VMs on compute node
cumulative time in _modify_rules: before 11.8s, after 6.44s

#VMs = 80, provision 10 new VMs on compute node
cumulative time in _modify_rules: before 20.2s, after 13.6s

#VMs = 160, provision 10 new VMs on compute node
cumulative time in _modify_rules: before 50s, after 23.2s  <<<<< more than 2 seconds per VM!
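For reference, a minimal sketch of how such cumulative timings could be collected, using a hypothetical timing decorator rather than neutron's actual instrumentation:

    import functools
    import time

    def accumulate_time(totals, key):
        """Accumulate wall-clock time spent in a function under totals[key]."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.monotonic()
                try:
                    return func(*args, **kwargs)
                finally:
                    totals[key] = totals.get(key, 0.0) + (time.monotonic() - start)
            return wrapper
        return decorator

    totals = {}

    @accumulate_time(totals, '_modify_rules')
    def _modify_rules(rules):   # stand-in for the real agent method
        time.sleep(0.01)

    for _ in range(10):         # e.g. one call per provisioned VM
        _modify_rules([])
    print(totals['_modify_rules'])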

description: updated
Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
status: New → In Progress
tags: added: loadimpact sg-fw
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
status: In Progress → Triaged
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am all in when it comes to improving performance, but I am not clear on what you're proposing. Please elaborate further... but yeah, if we can make SG performance kick ass, let's!

tags: added: rfe-approved
removed: rfe
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Miguel mentioned something about profiling etc. Please file a spec so we can raise the visibility of this effort.

Ryan Moats (rmoats)
tags: added: kilo-backport-potential liberty-backport-potential
Changed in neutron:
status: Triaged → In Progress
Changed in neutron:
importance: Medium → Wishlist
Changed in neutron:
milestone: none → mitaka-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/230750
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f066e46bb77362ccca838b541eb42c7ae8ddc125
Submitter: Jenkins
Branch: master

commit f066e46bb77362ccca838b541eb42c7ae8ddc125
Author: Kevin Benton <email address hidden>
Date: Sat Oct 3 07:25:19 2015 -0700

    Use diffs for iptables restore instead of all rules

    This patch changes our iptables logic to generate a delta of
    iptables commands (inserts + deletes) to get from the current
    iptables state to the new state. This will significantly reduce
    the amount of data that we have to shell out to iptables-restore
    on every call (and reduce the amount of data iptables-restore has
    to parse).

    We no longer have to worry about preserving counters since
    we are adding and deleting specific rules, so the rule modification
    code got a nice cleanup to get rid of the old rule matching.

    This also gives us a new method of functionally testing that we are
    generating rules in the correct manner. After applying new rules
    once, a subsequent call should always have no work to do. The new
    functional tests added leverage that property heavily and should
    protect us from regressions in how rules are formed.

    Performance metrics relative to HEAD~1:
    +======================================+============+========+
    | Scenario                             | This patch | HEAD~1 |
    |--------------------------------------|------------|--------|
    | 200 VMs*22 rules existing - startup  |            |        |
    |   _modify_rules                      |      0.67s |  1.05s |
    |   _apply_synchronized                |      1.87s |  2.89s |
    |--------------------------------------|------------|--------|
    | 200 VMs*22 rules existing - add VM   |            |        |
    |   _modify_rules                      |      0.68s |  1.05s |
    |   _apply_synchronized                |      2.07s |  2.92s |
    |--------------------------------------|------------|--------|
    | 200 VMs*422 rules existing - startup |            |        |
    |   _modify_rules                      |      5.43s |  8.17s |
    |   _apply_synchronized                |     12.77s | 28.00s |
    |--------------------------------------|------------|--------|
    | 200 VMs*422 rules existing - add VM  |            |        |
    |   _modify_rules                      |      6.41s |  8.33s |
    |   _apply_synchronized                |     33.09s | 33.80s |
    +--------------------------------------+------------+--------+

    The _apply_synchronized times seem to converge when dealing
    with ~85k rules. In the profile I can see that both approaches
    seem to wait on iptables-restore for approximately the same
    amount of time so it could be hitting the performance limits
    of iptables-restore.

    DocImpact
    Partial-Bug: #1502297
    Change-Id: Ia6470c85b6b71979006ffe5da9095fdcce3122c1
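As a rough illustration of the diff idea described in the commit message (not neutron's actual implementation, which also has to respect rule ordering and duplicates), a delta between the current and desired rule sets could be computed like this:

    def iptables_delta(current_rules, desired_rules):
        """Return iptables-restore lines that turn current_rules into
        desired_rules, instead of re-submitting the full rule set.
        Simplified sketch: real rules are ordered and may repeat."""
        current = set(current_rules)
        desired = set(desired_rules)
        # '-D' deletes rules that are no longer wanted.
        deletes = ['-D %s' % r for r in current - desired]
        # '-A' appends rules that are missing.
        adds = ['-A %s' % r for r in desired - current]
        return deletes + adds

    # After one full apply, a second call should be a no-op -- the
    # property the new functional tests lean on.
    assert iptables_delta(['INPUT -j ACCEPT'], ['INPUT -j ACCEPT']) == []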

Revision history for this message
shihanzhang (shihanzhang) wrote : Re: Improve SG performance as VMs/containers scale on compute node

Very good patch; I hope it can be backported to kilo and liberty.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

I doubt the patch is backportable. It's very invasive from what I see.

tags: removed: kilo-backport-potential liberty-backport-potential
Changed in neutron:
milestone: mitaka-1 → mitaka-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/255334

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/255334
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fb55693713991782a56006af73e0ee67cffe9a56
Submitter: Jenkins
Branch: stable/liberty

commit fb55693713991782a56006af73e0ee67cffe9a56
Author: Kevin Benton <email address hidden>
Date: Sat Oct 3 07:25:19 2015 -0700

    Use diffs for iptables restore instead of all rules

    [Commit message identical to the master commit f066e46 quoted above.]

    DocImpact
    Partial-Bug: #1502297
    Change-Id: Ia6470c85b6b71979006ffe5da9095fdcce3122c1
    (cherry picked from commit f066e46bb77362ccca838b541eb42c7ae8ddc125)

tags: added: in-stable-liberty
Changed in neutron:
milestone: mitaka-2 → mitaka-3
Henry Gessau (gessau)
summary: - Improve SG performance as VMs/containers scale on compute node
+ [RFE] Improve SG performance as VMs/containers scale on compute node
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This can probably be closed, but Kevin should chime in on any minor performance improvement that may be left.

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
Kevin Benton (kevinbenton) wrote :

There is one other optimization we could do: keep track of which chains we modify, and use that to filter down to the rules we are interested in checking for changes.

However, it's not clear how much performance that would gain us, so I would only like to dig into it if operators still feel the performance is not satisfactory.
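A minimal sketch of that chain-tracking idea (purely illustrative; it was never written as an actual neutron patch here):

    class ChainTracker(object):
        """Remember which chains were touched so a later diff pass can
        skip rules in untouched chains. Illustrative names only."""

        def __init__(self):
            self.dirty_chains = set()

        def mark_modified(self, chain):
            self.dirty_chains.add(chain)

        def rules_to_check(self, all_rules):
            # Rules in unmodified chains cannot have changed, so only
            # the dirty chains need diffing.
            return [(chain, rule) for chain, rule in all_rules
                    if chain in self.dirty_chains]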

Changed in neutron:
milestone: mitaka-3 → mitaka-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/400182

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/400182
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e83d07d00a93ac3902025a0588d4f7c5b74dcd5c
Submitter: Jenkins
Branch: master

commit e83d07d00a93ac3902025a0588d4f7c5b74dcd5c
Author: Quan Tian <email address hidden>
Date: Mon Nov 21 18:37:38 2016 +0800

    Improve performance of _modify_rules

    The _modify_rules method currently uses a nested loop to remove
    rules that belong to us but don't have the wrap name. Speed up
    this operation by storing our rules in a set, reducing its
    complexity from O(n*m) to O(n).

    Change-Id: I82e6184a30ddb25f2258e21fe749573af44a52ca
    Related-Bug: #1502297
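Schematically, the O(n*m) -> O(n) change looks like the following (simplified names; not the exact neutron code):

    WRAP_NAME = 'neutron-'  # hypothetical marker identifying our rules

    # Before: 'r not in our_rules' scans a list, so the pass is O(n*m).
    def strip_unwrapped_slow(rules, our_rules):
        return [r for r in rules
                if r not in our_rules or WRAP_NAME in r]

    # After: build a set once (O(m)), then each membership test is O(1).
    def strip_unwrapped_fast(rules, our_rules):
        our_set = set(our_rules)
        return [r for r in rules
                if r not in our_set or WRAP_NAME in r]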
