adding rules to security groups is slow

Bug #1810563 reported by Doug Wiegley
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
Doug Wiegley

Bug Description

Sometime between liberty and pike, adding rules to SG's got slow, and slower with every rule added.

Gerrit review with fixes is incoming.

You can repro with a vanilla devstack install on master, and this script:

#!/bin/bash

OPENSTACK_TOKEN=$(openstack token issue | grep '| id' | awk '{print $4}')
export OPENSTACK_TOKEN

CCN1=10.210.162.2
CCN3=10.210.162.10
export ENDPOINT=localhost

make_rules() {
    iter=$1
    prefix=$2
    file="$3"

    echo "generating rules"

    cat >$file <<EOF
{"security_group_rules":[
EOF

    comma=","
    i=0
    while [ $i -lt $iter ]; do
 j=0
 while [ $j -lt 10 ]; do
     if [ $i -eq $(($iter-1)) -a $j -eq 9 ]; then
  comma=""
     fi
     cat >>$file <<EOF
{"direction":"ingress","ethertype":"IPv4","port_range_max":10000,"port_range_min":8000,"protocol":"tcp"
,"remote_ip_prefix":"$prefix.$i.$j.0/24","security_group_id":"$SG_UUID"}${comma}
EOF
     j=$((j+1))
 done
 i=$((i+1))
    done

    cat >>$file <<EOF
]}
EOF
}

hit_api() {
    json="$1"

    echo "hitting api"

    start=$(perl -e "print time();")
    time curl --silent -g -i -X POST http://$ENDPOINT:9696/v2.0/security-group-rules.json -H "User-Agen
t: python-neutronclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-T
oken: $OPENSTACK_TOKEN" -d @${json} >/dev/null
    end=$(perl -e "print time();")
    echo $((end-start))
}

tmp=/tmp/sg-test.$$.tmp

echo "Doing test with 1000 rules in bulk"
openstack security group delete dw-test-1
uuid=$(openstack security group create dw-test-1 | grep '| id' | awk '{print $4}')
export SG_UUID="$uuid"
make_rules 100 4 $tmp
hit_api $tmp

echo "Doing loop test"
openstack security group delete dw-test-2
uuid=$(openstack security group create dw-test-2 | grep '| id' | awk '{print $4}')
export SG_UUID="$uuid"
elapsed=0
mm=0
while [ $mm -lt 20 ]; do
    make_rules 5 $(($mm+1)) $tmp
    n=$(hit_api $tmp | tail -1)
    elapsed=$((elapsed+n))
    mm=$((mm+1))
done
echo "Loop test took $elapsed seconds"

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/628691

Changed in neutron:
assignee: nobody → Doug Wiegley (dougwig)
status: New → In Progress
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Do You have some data with time comparison for master branch and some stable branches?
It would be also good if You can maybe propose patch with new rally scenario which could prove later that Your fix for this issue actually works

Changed in neutron:
importance: Undecided → Medium
tags: added: api sg-fw
Revision history for this message
Doug Wiegley (dougwig) wrote :

Let's be clear here: this is a major performance *regression*. To the point that it's effectively unusable with a large number of rules.

I've got comparisons from two production clouds, one liberty, one pike.

The use case is a security group with 2000 rules, added 50 at a time. The final chunk of 50 takes about 12 seconds liberty, and over 5 minutes on pike (the job doing this times out).

With this change, using a vanilla devstack, and two use cases:

1. Adding 1000 rules in one shot, to an empty SG, and
2. Adding 2000 rules to an SG, 50 at a time,

I get these numbers. On stock master:

1. over 30 minutes
2. Hours and hours... didn't let it finish.

After this change:

1. 1m20s
2. 127 seconds

Still not great, but back on par with liberty.

Is there already a rally job testing API responses? If so, I can add to that. If not, we should really consider making a comprehensive api performance regression job, not just one-offing, IMO.

The fix above doesn't address single rule creates, which are likely just as hosed, too. But I'm not out to boil the ocean here, just make things a little better.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Thx for more detailed info.

About rally job, we have job called neutron-rally-task which is running API calls many times and check execution times for them, see e.g. http://logs.openstack.org/91/628691/9/check/neutron-rally-task/728d6c1/results/report.html.gz

Revision history for this message
Doug Wiegley (dougwig) wrote :

Ahh, thank you. And it's missing a security group test. :)

Revision history for this message
Doug Wiegley (dougwig) wrote :

There's also a get/index regression, when you limit the fields returned. Remove the security group delete from the script above, run it a few times, then run this script to see it. Patch updated to fix both, since they're related:

#!/bin/bash

OPENSTACK_TOKEN=$(openstack token issue | grep '| id' | awk '{print $4}')
export OPENSTACK_TOKEN

CCN1=10.210.162.2
CCN3=10.210.162.10
export ENDPOINT=localhost

hit_api() {
    json="$1"

    echo "hitting api"

    echo "all"
    start=$(perl -e "print time();")
    time curl -s -H "X-Auth-Token: $OPENSTACK_TOKEN" "http://$ENDPOINT:9696/v2.0/security-groups" #> /dev/null
    end=$(perl -e "print time();")
    echo $((end-start))

    echo "ids"
    start=$(perl -e "print time();")
    time curl -s -H "X-Auth-Token: $OPENSTACK_TOKEN" "http://$ENDPOINT:9696/v2.0/security-groups?fields=id" > /dev/null
    end=$(perl -e "print time();")
    echo $((end-start))

    echo "one"
    start=$(perl -e "print time();")
    time curl -s -H "X-Auth-Token: $OPENSTACK_TOKEN" "http://$ENDPOINT:9696/v2.0/security-groups?limit=1" > /dev/null
    end=$(perl -e "print time();")
    echo $((end-start))

    echo "one id"
    start=$(perl -e "print time();")
    time curl -s -H "X-Auth-Token: $OPENSTACK_TOKEN" "http://$ENDPOINT:9696/v2.0/security-groups?fields=id&limit=1" #> /dev/null
    end=$(perl -e "print time();")
    echo $((end-start))
}

tmp=/tmp/sg-test.$$.tmp
hit_api $tmp

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/628691
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2eb31f84c9a6c9fc6340819f756a7a82cbf395f3
Submitter: Zuul
Branch: master

commit 2eb31f84c9a6c9fc6340819f756a7a82cbf395f3
Author: Doug Wiegley <email address hidden>
Date: Fri Jan 4 14:55:29 2019 -0700

    Fix performance regression adding rules to security groups

    Sometime between liberty and pike, adding rules to SG's got
    slow, and slower with every rule. Streamline the rule create path,
    and get close to the old performance back.

    Two performance fixes:
    1. Get rid of an n^2 duplicate check, using a hash table instead,
    on bulk creates. This is more memory intensive than the previous loop,
    but usable far past where the other becomes too slow to be useful.
    2. Use an object existence check in a few places where we do not
    want to load all of the child rules.

    Co-Authored-By: William Hager <email address hidden>
    Change-Id: I34e41a128f28211f2e7ab814a2611ce22620fcf3
    Closes-bug: 1810563

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/633143

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/633144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/633145

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/633143
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f8a192e22e39bfb64e506388f5d5c07fc30d7753
Submitter: Zuul
Branch: stable/rocky

commit f8a192e22e39bfb64e506388f5d5c07fc30d7753
Author: Doug Wiegley <email address hidden>
Date: Fri Jan 4 14:55:29 2019 -0700

    Fix performance regression adding rules to security groups

    Sometime between liberty and pike, adding rules to SG's got
    slow, and slower with every rule. Streamline the rule create path,
    and get close to the old performance back.

    Two performance fixes:
    1. Get rid of an n^2 duplicate check, using a hash table instead,
    on bulk creates. This is more memory intensive than the previous loop,
    but usable far past where the other becomes too slow to be useful.
    2. Use an object existence check in a few places where we do not
    want to load all of the child rules.

    Also squashed in:
    Restore tenant_id check on security group rule adds to previous semantic

    We switched from swapping the tenant_id in the context to explicitly
    checking the db column. Switch back, and a test that checks for
    not breaking this rather odd behavior. At least, until we decide
    to fix it as a bug.

    Co-Authored-By: William Hager <email address hidden>
    Change-Id: I34e41a128f28211f2e7ab814a2611ce22620fcf3
    Closes-bug: 1810563
    (cherry picked from commit 2eb31f84c9a6c9fc6340819f756a7a82cbf395f3)
    (squashed patch from commit bd4c291cdfb5a75976b1f846f6d7ca88d2d154e9)

tags: added: in-stable-rocky
tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/633144
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=026f24a94df6c2131896cc69f096faa0e1ebb2f9
Submitter: Zuul
Branch: stable/queens

commit 026f24a94df6c2131896cc69f096faa0e1ebb2f9
Author: Doug Wiegley <email address hidden>
Date: Fri Jan 4 14:55:29 2019 -0700

    Fix performance regression adding rules to security groups

    Sometime between liberty and pike, adding rules to SG's got
    slow, and slower with every rule. Streamline the rule create path,
    and get close to the old performance back.

    Two performance fixes:
    1. Get rid of an n^2 duplicate check, using a hash table instead,
    on bulk creates. This is more memory intensive than the previous loop,
    but usable far past where the other becomes too slow to be useful.
    2. Use an object existence check in a few places where we do not
    want to load all of the child rules.

    Also squashed in:
    Restore tenant_id check on security group rule adds to previous semantic

    We switched from swapping the tenant_id in the context to explicitly
    checking the db column. Switch back, and a test that checks for
    not breaking this rather odd behavior. At least, until we decide
    to fix it as a bug.

    Co-Authored-By: William Hager <email address hidden>
    Change-Id: I34e41a128f28211f2e7ab814a2611ce22620fcf3
    Closes-bug: 1810563
    (cherry picked from commit 2eb31f84c9a6c9fc6340819f756a7a82cbf395f3)
    (squashed patch from commit bd4c291cdfb5a75976b1f846f6d7ca88d2d154e9)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/633145
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f9cbd939b998f6b8d29c8d347d1b9322c699825f
Submitter: Zuul
Branch: stable/pike

commit f9cbd939b998f6b8d29c8d347d1b9322c699825f
Author: Doug Wiegley <email address hidden>
Date: Fri Jan 4 14:55:29 2019 -0700

    Fix performance regression adding rules to security groups

    Sometime between liberty and pike, adding rules to SG's got
    slow, and slower with every rule. Streamline the rule create path,
    and get close to the old performance back.

    Two performance fixes:
    1. Get rid of an n^2 duplicate check, using a hash table instead,
    on bulk creates. This is more memory intensive than the previous loop,
    but usable far past where the other becomes too slow to be useful.
    2. Use an object existence check in a few places where we do not
    want to load all of the child rules.

    Also squashed in:
    Restore tenant_id check on security group rule adds to previous semantic

    We switched from swapping the tenant_id in the context to explicitly
    checking the db column. Switch back, and a test that checks for
    not breaking this rather odd behavior. At least, until we decide
    to fix it as a bug.

    Co-Authored-By: William Hager <email address hidden>
    Change-Id: I34e41a128f28211f2e7ab814a2611ce22620fcf3
    Closes-bug: 1810563
    (cherry picked from commit 2eb31f84c9a6c9fc6340819f756a7a82cbf395f3)
    (squashed patch from commit bd4c291cdfb5a75976b1f846f6d7ca88d2d154e9)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/630401
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1d41f39ddf23b131a8960ac718ed9f8f015f3702
Submitter: Zuul
Branch: master

commit 1d41f39ddf23b131a8960ac718ed9f8f015f3702
Author: Doug Wiegley <email address hidden>
Date: Fri Jan 11 15:14:52 2019 -0700

    Fix slow SG api calls when limiting fields

    Relating to the issue with creating rules, when reading security
    groups, it is very slow. Even when limiting to id/name, it pulls
    in all rules before returning just id/name.

    This change looks at the fields requested, and if no "synthetic"
    fields are in the list, skips initializing those.

    Co-Authored-By: Hongbin Lu<email address hidden>
    Closes-Bug: #1810563
    Change-Id: Id6870633e3943666e9b7fb900ad2d0894ee2715d

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b2

This issue was fixed in the openstack/neutron 14.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b3

This issue was fixed in the openstack/neutron 14.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/637407
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1e9086f6e269f32d1cb19a0f55eb1f5dd73ca651
Submitter: Zuul
Branch: master

commit 1e9086f6e269f32d1cb19a0f55eb1f5dd73ca651
Author: Doug Wiegley <email address hidden>
Date: Sun Mar 3 10:42:16 2019 -0700

    Use dynamic lazy mode for fetching security group rules

    In conjunction with the prior fix to only get a subset of fields
    when needed, this makes the querying of non-rules SG objects
    very very fast.

    Before the two fixes, if you have about ten security groups with 2000 rules each:

    list all: 14s
    list all, just 'id' field: 14s
    list one: 0.6s
    list one, just 'id' field: 0.6s

    With just the previous partial fix:

    list all: 14s
    list all, just 'id' field: 6s
    list one: 0.6s
    list one, just 'id' field: 0.2s

    Now with this change:

    list all: 14s
    list all, just 'id' field: 0.04s
    list one: 0.6s
    list one, just 'id' field: 0.03s

    Closes-Bug: #1810563
    Change-Id: I15df276ba7dbcb3763ab20b63b26cddf2d594954

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0rc1

This issue was fixed in the openstack/neutron 14.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/650401

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/650517

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/650525

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/650404
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9aafd5f131aad6c885ef2eafdfedb8aea14967d4
Submitter: Zuul
Branch: stable/queens

commit 9aafd5f131aad6c885ef2eafdfedb8aea14967d4
Author: Doug Wiegley <email address hidden>
Date: Fri Jan 11 15:14:52 2019 -0700

    Fix slow SG api calls when limiting fields

    Relating to the issue with creating rules, when reading security
    groups, it is very slow. Even when limiting to id/name, it pulls
    in all rules before returning just id/name.

    This change looks at the fields requested, and if no "synthetic"
    fields are in the list, skips initializing those.

    Co-Authored-By: Hongbin Lu<email address hidden>
    Closes-Bug: #1810563
    Change-Id: Id6870633e3943666e9b7fb900ad2d0894ee2715d

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/650401
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=97bf23244de7a72cb21d7a173fc9279c06fc2bf7
Submitter: Zuul
Branch: stable/rocky

commit 97bf23244de7a72cb21d7a173fc9279c06fc2bf7
Author: Doug Wiegley <email address hidden>
Date: Fri Jan 11 15:14:52 2019 -0700

    Fix slow SG api calls when limiting fields

    Relating to the issue with creating rules, when reading security
    groups, it is very slow. Even when limiting to id/name, it pulls
    in all rules before returning just id/name.

    This change looks at the fields requested, and if no "synthetic"
    fields are in the list, skips initializing those.

    Co-Authored-By: Hongbin Lu<email address hidden>
    Closes-Bug: #1810563
    Change-Id: Id6870633e3943666e9b7fb900ad2d0894ee2715d

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/rocky)

Change abandoned by Doug Wiegley (<email address hidden>) on branch: stable/rocky
Review: https://review.openstack.org/650517

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/queens)

Change abandoned by Doug Wiegley (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/650525
Reason: Requirements bump doesn't fly.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.7

This issue was fixed in the openstack/neutron 11.0.7 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.6

This issue was fixed in the openstack/neutron 12.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.3

This issue was fixed in the openstack/neutron 13.0.3 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.6

This issue was fixed in the openstack/neutron 12.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/650517
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cd53b517326c15d4186f0503e608741315f5f48f
Submitter: Zuul
Branch: stable/rocky

commit cd53b517326c15d4186f0503e608741315f5f48f
Author: Doug Wiegley <email address hidden>
Date: Sun Mar 3 10:42:16 2019 -0700

    Use dynamic lazy mode for fetching security group rules

    In conjunction with the prior fix to only get a subset of fields
    when needed, this makes the querying of non-rules SG objects
    very very fast.

    Before the two fixes, if you have about ten security groups with 2000 rules each:

    list all: 14s
    list all, just 'id' field: 14s
    list one: 0.6s
    list one, just 'id' field: 0.6s

    With just the previous partial fix:

    list all: 14s
    list all, just 'id' field: 6s
    list one: 0.6s
    list one, just 'id' field: 0.2s

    Now with this change:

    list all: 14s
    list all, just 'id' field: 0.04s
    list one: 0.6s
    list one, just 'id' field: 0.03s

    Closes-Bug: #1810563
    Change-Id: I15df276ba7dbcb3763ab20b63b26cddf2d594954
    (cherry picked from commit 1e9086f6e269f32d1cb19a0f55eb1f5dd73ca651)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/650525
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d075028146a6fcec57e05fc4952682bb2ad49b58
Submitter: Zuul
Branch: stable/queens

commit d075028146a6fcec57e05fc4952682bb2ad49b58
Author: Doug Wiegley <email address hidden>
Date: Sun Mar 3 10:42:16 2019 -0700

    Use dynamic lazy mode for fetching security group rules

    In conjunction with the prior fix to only get a subset of fields
    when needed, this makes the querying of non-rules SG objects
    very very fast.

    Before the two fixes, if you have about ten security groups with 2000 rules each:

    list all: 14s
    list all, just 'id' field: 14s
    list one: 0.6s
    list one, just 'id' field: 0.6s

    With just the previous partial fix:

    list all: 14s
    list all, just 'id' field: 6s
    list one: 0.6s
    list one, just 'id' field: 0.2s

    Now with this change:

    list all: 14s
    list all, just 'id' field: 0.04s
    list one: 0.6s
    list one, just 'id' field: 0.03s

    Closes-Bug: #1810563
    Change-Id: I15df276ba7dbcb3763ab20b63b26cddf2d594954
    (cherry picked from commit 1e9086f6e269f32d1cb19a0f55eb1f5dd73ca651)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.4

This issue was fixed in the openstack/neutron 13.0.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.1.0

This issue was fixed in the openstack/neutron 12.1.0 release.

Revision history for this message
Erik Olof Gunnar Andersson (eandersson) wrote :

This is still very much an issue. When we run the following command in one of our larger Rocky environments on a group with only 6 rules. It takes almost a full minute for it to return.

> time openstack security group rule list <id>
> ....
> real 0m54.573s
> user 0m2.893s
> sys 0m0.523s

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.