Conntrack entry removal can take a long time on large deployments

Bug #1745468 reported by Brian Haley
This bug affects 4 people
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Brian Haley
Milestone: none

Bug Description

On a large deployment of about 1000 instances, instance deletion (neutron port deletion) or security group rule changes can take a really long time. We've actually seen it take hours in some cases.

Switching the IP Conntrack manager to netlink-lib (https://review.openstack.org/#/c/470912/) will help, but it could still lead to long delays at higher instance counts. Also, that change might not be easy to backport to older releases. Doing the conntrack entry deletion in a thread, which has been proposed before, could alleviate this somewhat by letting the caller (the OVS agent) get back to other work sooner.

Also, while the netlink-lib change above only issues calls for entries it finds, the current code does not: it can call 'conntrack -D' with arguments that match no entries. If we first checked the table for the given IPs, it might reduce the time cleanup takes.
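
The "check the table first" idea can be sketched roughly as follows. This is only an illustration, not neutron's actual code: the function names are hypothetical, and it assumes the table has already been dumped as text in the usual `conntrack -L` style with `src=` and `dst=` fields. It builds the delete commands but does not run them.

```python
import re

# Matches the src=/dst= address fields found in `conntrack -L` style lines.
_ADDR_RE = re.compile(r"\b(?:src|dst)=(\S+)")


def ips_with_entries(conntrack_listing, candidate_ips):
    """Return the candidate IPs that appear in the listing (hypothetical helper)."""
    seen = set()
    for line in conntrack_listing.splitlines():
        seen.update(_ADDR_RE.findall(line))
    # Preserve the caller's ordering; drop IPs with no tracked connection.
    return [ip for ip in candidate_ips if ip in seen]


def build_delete_commands(conntrack_listing, candidate_ips):
    """Build one 'conntrack -D' command per IP that actually has entries."""
    return [
        ["conntrack", "-D", "-d", ip]
        for ip in ips_with_entries(conntrack_listing, candidate_ips)
    ]
```

With one scan of the table up front, IPs that have no tracked connections never trigger a 'conntrack -D' call at all.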

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/538042

Miguel Lavalle (minsel)
Changed in neutron:
milestone: none → queens-rc1
Miguel Lavalle (minsel)
Changed in neutron:
milestone: queens-rc1 → none
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Brian Haley (<email address hidden>) on branch: master
Review: https://review.openstack.org/538042
Reason: I don't think this is worth the trouble given the conntrack-lib change.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/537654
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=65a81623fc0377b26d2d5800607f7c3acc08c45a
Submitter: Zuul
Branch: master

commit 65a81623fc0377b26d2d5800607f7c3acc08c45a
Author: Brian Haley <email address hidden>
Date: Wed Jan 24 15:55:56 2018 -0500

    Process conntrack updates in worker threads

    With a large number of instances and/or security group rules,
    conntrack updates when ports are removed or rules are changed
    can take a long time to process. By enqueuing these to a set
    of worker threads, the agent can continue with other work while
    they are processed in the background.

    This is a change in behavior in the agent since it could
    program a new set of security group rules before all existing
    conntrack entries are deleted, but since the iptables or OVSfw
    NAT rules will have been removed, it should not pose a
    security issue.

    Change-Id: Ibf858c7fdf7a822a30e4a0c4722d70fd272741b6
    Closes-bug: #1745468
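
The approach in the commit message above can be sketched as a small queue-backed worker pool. This is a minimal illustration using the standard library's `threading` and `queue` modules, which is an assumption: the merged patch may use a different concurrency mechanism. The class name, the `delete_fn` callback, and the worker count are all hypothetical.

```python
import queue
import threading


class ConntrackWorkerPool:
    """Hypothetical pool that runs conntrack cleanup jobs in the background."""

    def __init__(self, delete_fn, num_workers=3):
        self._delete_fn = delete_fn  # callable doing the real deletion
        self._queue = queue.Queue()
        for _ in range(num_workers):
            # Daemon threads so a stuck cleanup never blocks process exit.
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, ip):
        """Queue a cleanup job and return immediately to the caller."""
        self._queue.put(ip)

    def _run(self):
        while True:
            ip = self._queue.get()
            try:
                self._delete_fn(ip)
            except Exception:
                # A failed deletion must not kill the worker thread.
                pass
            finally:
                self._queue.task_done()

    def wait(self):
        """Block until every queued job has been processed (handy in tests)."""
        self._queue.join()
```

The caller only pays for `enqueue()`, so the agent can go on to program new rules while deletions drain in the background. That is the same ordering change the commit message describes.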

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/545612

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/545612
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0dbd35df1bdaea7dec97fd976b6990f4b79a6b77
Submitter: Zuul
Branch: stable/queens

commit 0dbd35df1bdaea7dec97fd976b6990f4b79a6b77
Author: Brian Haley <email address hidden>
Date: Wed Jan 24 15:55:56 2018 -0500

    Process conntrack updates in worker threads

    With a large number of instances and/or security group rules,
    conntrack updates when ports are removed or rules are changed
    can take a long time to process. By enqueuing these to a set
    of worker threads, the agent can continue with other work while
    they are processed in the background.

    This is a change in behavior in the agent since it could
    program a new set of security group rules before all existing
    conntrack entries are deleted, but since the iptables or OVSfw
    NAT rules will have been removed, it should not pose a
    security issue.

    Change-Id: Ibf858c7fdf7a822a30e4a0c4722d70fd272741b6
    Closes-bug: #1745468
    (cherry picked from commit 65a81623fc0377b26d2d5800607f7c3acc08c45a)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.1

This issue was fixed in the openstack/neutron 12.0.1 release.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Does this patch have backport potential to Pike?

Piotr Misiak (piotr-misiak) wrote :

Looks like the patch should be directly applicable to Pike.

Please keep in mind that this patch introduces a new bug, https://bugs.launchpad.net/neutron/+bug/1750777, so it should be backported to Pike together with the fix for bug 1750777.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b1

This issue was fixed in the openstack/neutron 13.0.0.0b1 development milestone.
