NSX v3 excessive request back pressure on forced revalidate

Bug #1541591 reported by Boden R
This bug affects 1 person
Affects     Status        Importance  Assigned to  Milestone
vmware-nsx  Fix Released  Undecided   Boden R

Bug Description

With the current NSX v3 clustered client logic, endpoint selection forces a revalidation of endpoint states whenever all endpoints are down. While this can be ideal in lower-throughput scenarios where endpoint state is fluctuating, it is less than optimal in high request throughput scenarios, where cascading forced revalidation causes back pressure.
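
To make the cascade concrete, here is a minimal sketch of the pattern described above. It is not the vmware-nsx implementation: the class `RevalidatingCluster`, the `probe` callback, and `simulate_outage` are hypothetical names invented for illustration. It only shows that when every request that finds all endpoints DOWN also triggers a full revalidate, the number of probes grows with requests times endpoints.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

UP, DOWN = "UP", "DOWN"


@dataclass
class Endpoint:
    name: str
    state: str = DOWN


@dataclass
class RevalidatingCluster:
    """Hypothetical model of selection that revalidates on the request path."""
    endpoints: List[Endpoint]
    probe: Callable[[Endpoint], bool]  # assumed: True if the endpoint answers
    probes_sent: int = 0

    def _first_up(self) -> Optional[Endpoint]:
        return next((ep for ep in self.endpoints if ep.state == UP), None)

    def _revalidate_all(self) -> None:
        # One probe per endpoint, paid for by the calling request.
        for ep in self.endpoints:
            self.probes_sent += 1
            ep.state = UP if self.probe(ep) else DOWN

    def select(self) -> Optional[Endpoint]:
        chosen = self._first_up()
        if chosen is None:
            # All endpoints DOWN: force a revalidate, then try again.
            self._revalidate_all()
            chosen = self._first_up()
        return chosen


def simulate_outage(num_requests: int, num_endpoints: int) -> int:
    """Count probes sent while the backend stays unreachable."""
    cluster = RevalidatingCluster(
        endpoints=[Endpoint(f"nsx-{i}") for i in range(num_endpoints)],
        probe=lambda ep: False,
    )
    for _ in range(num_requests):
        cluster.select()
    return cluster.probes_sent
```

In this model, `simulate_outage(1000, 3)` issues 3000 probes during the outage, one full revalidation pass per request. In a real client each probe is a network call, which is where the back pressure would come from.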

Boden R (boden)
Changed in vmware-nsx:
assignee: nobody → Boden R (boden)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to vmware-nsx (master)

Fix proposed to branch: master
Review: https://review.openstack.org/275938

OpenStack Infra (hudson-openstack) wrote : Fix merged to vmware-nsx (master)

Reviewed: https://review.openstack.org/275938
Committed: https://git.openstack.org/cgit/openstack/vmware-nsx/commit/?id=e7acdfe91ae1e539fa89de4e161d06dde5ede427
Submitter: Jenkins
Branch: master

commit e7acdfe91ae1e539fa89de4e161d06dde5ede427
Author: Boden R <email address hidden>
Date: Wed Feb 3 14:39:27 2016 -0700

    NSX-v3 update endpoint state only on timeout

    This patch removes the NSX v3 client cluster logic that
    forces a revalidate of all endpoints when endpoint
    selection only finds DOWN endpoints. The revalidate
    call can cause cascading backpressure under certain
    circumstances.

    Now DOWN endpoints are only returned to UP as part
    of the endpoint keepalive ping that is controlled via
    conn_idle_timeout config property. Thus, the default
    conn_idle_timeout is also decreased to 10s ensuring
    endpoint revalidation occurs (by default) on a frequent
    basis.

    backport: liberty

    Change-Id: I5423bce793892dd864353a23ca7c288b846a1ab6
    Closes-Bug: #1541591
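
The behaviour the commit message describes can be sketched roughly as follows. This is an illustrative model only, not the code in vmware_nsx/nsxlib/v3/cluster.py: `KeepaliveCluster`, `keepalive_tick`, the `probe` callback, and the injectable `clock` are hypothetical, and the per-endpoint timer is a simplification of the real idle-connection keepalive. The only names taken from the source are `conn_idle_timeout` and its new 10s default.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class KeepaliveCluster:
    """Hypothetical model: DOWN endpoints return to UP only via keepalive."""

    def __init__(
        self,
        names: Iterable[str],
        probe: Callable[[str], bool],        # assumed: True if endpoint answers
        conn_idle_timeout: float = 10.0,     # default taken from the commit message
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe
        self._timeout = conn_idle_timeout
        self._clock = clock
        self._state: Dict[str, str] = {n: "DOWN" for n in names}
        self._last_check: Dict[str, float] = {n: clock() for n in self._state}
        self.probes_sent = 0

    def select(self) -> Optional[str]:
        # Selection never probes; with everything DOWN it simply finds nothing.
        for name, state in self._state.items():
            if state == "UP":
                return name
        return None

    def keepalive_tick(self) -> None:
        # Revalidate only endpoints whose last check is older than the timeout.
        now = self._clock()
        for name in self._state:
            if now - self._last_check[name] >= self._timeout:
                self.probes_sent += 1
                self._state[name] = "UP" if self._probe(name) else "DOWN"
                self._last_check[name] = now
```

Under these assumptions the probe rate is bounded by the number of endpoints per timeout interval, regardless of request volume. The trade-off is that a recovered endpoint may stay DOWN for up to one interval, which is presumably why the default `conn_idle_timeout` was lowered alongside the change.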

Changed in vmware-nsx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to vmware-nsx (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/277913

OpenStack Infra (hudson-openstack) wrote : Fix merged to vmware-nsx (stable/liberty)

Reviewed: https://review.openstack.org/277913
Committed: https://git.openstack.org/cgit/openstack/vmware-nsx/commit/?id=d4303335b2b1bd586ca227459fb8fa64b54482cb
Submitter: Jenkins
Branch: stable/liberty

commit d4303335b2b1bd586ca227459fb8fa64b54482cb
Author: Boden R <email address hidden>
Date: Wed Feb 3 14:39:27 2016 -0700

    NSX-v3 update endpoint state only on timeout

    This patch removes the NSX v3 client cluster logic that
    forces a revalidate of all endpoints when endpoint
    selection only finds DOWN endpoints. The revalidate
    call can cause cascading backpressure under certain
    circumstances.

    Now DOWN endpoints are only returned to UP as part
    of the endpoint keepalive ping that is controlled via
    conn_idle_timeout config property. Thus, the default
    conn_idle_timeout is also decreased to 10s ensuring
    endpoint revalidation occurs (by default) on a frequent
    basis.

    backport: liberty

    Closes-Bug: #1541591
    (cherry picked from commit e7acdfe91ae1e539fa89de4e161d06dde5ede427)

    Conflicts:
     vmware_nsx/nsxlib/v3/cluster.py
    Change-Id: I5423bce793892dd864353a23ca7c288b846a1ab6

tags: added: in-stable-liberty