novaclient can get hung when used for extended periods

Bug #1323862 reported by Robert Collins
This bug affects 1 person
Affects                 Status         Importance   Assigned to    Milestone
python-keystoneclient   Fix Released   Medium       Ian Cordasco
python-novaclient       Fix Released   High         Ian Cordasco

Bug Description

We see a regular situation with nodepool and the HP1 region of tripleo-test-cloud where nodepool has a novaclient session open but no requests get submitted. Debugging has uncovered that when this happens there is a hung tcp connection:
 - novaclient has sent a request
 - the packet with the request got ACKd
 - no response has been received
 - tcp keepalive is disabled

If the far end has reset its connection for some reason (e.g. it got a failed packet delivery on the response, or driver issues, or $whatever), this will never recover. We have fixed it by manually triggering a RST on the stuck connection, but it would be a lot better if persistent connections used TCP keepalive.
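For illustration only (this is not code from novaclient): a minimal sketch of what turning on TCP keepalive for a single socket looks like in Python. The helper name `enable_tcp_keepalive` is hypothetical. With the option set, the kernel probes an idle connection and eventually reports an error if the peer has gone away, instead of waiting forever for a response.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket) -> None:
    """Ask the kernel to probe an idle connection so a dead peer is noticed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


# Demonstrate on an unconnected TCP socket and read the option back.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

How quickly a dead peer is detected still depends on the operating system's keepalive timers unless those are set explicitly as well.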

Joe Gordon (jogo) wrote :

This sounds like it may be a bug in one of the libraries we are using, or perhaps how we are using the libraries. Perhaps related to the requests library.

Ian Cordasco (icordasc) wrote :

If this is tracked to requests, feel free to ping me in #python-requests on Freenode. (You can also ping Lukasa/Lukasa_col if I'm not there.)

Ian Cordasco (icordasc) wrote :

Also, I should have mentioned that keep-alive is provided by requests (assuming this has something to do with requests) by simply setting the header appropriately. (Sorry for a second comment so soon)

Robert Collins (lifeless) wrote : Re: [Bug 1323862] Re: novaclient can get hung when used for extended periods

For clarity - this is *TCP* keepalive, not *HTTP* keepalive. That
said, I believe requests has been implicated by folk digging into this
in the past.

-Rob

Ian Cordasco (icordasc) wrote :

In looking into this, we would be relying on urllib3 to be able to set this connection (socket-level) option. Discussion about being able to set socket level options like this is happening on this issue: https://github.com/shazow/urllib3/issues/378

Ian Cordasco (icordasc) wrote :

So urllib3 now has an API for setting these values. requests simply needs to update its copy and cut a release. We'll need to subclass HTTPAdapter and replace HTTPAdapter here: https://github.com/openstack/python-novaclient/blob/master/novaclient/client.py#L56 with the subclass so that we can pass the extra options to urllib3's PoolManager.
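A rough sketch of the subclassing approach described above, not the patch that was eventually merged. It assumes a requests release that uses the standalone `urllib3` package and forwards extra keyword arguments from `init_poolmanager` to `PoolManager`; the class name `KeepAliveAdapter` is hypothetical.

```python
import socket

from requests import Session
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection


class KeepAliveAdapter(HTTPAdapter):
    """Hypothetical adapter that turns on SO_KEEPALIVE for pooled sockets."""

    def init_poolmanager(self, *args, **kwargs):
        # Respect caller-supplied options; otherwise extend urllib3's defaults.
        kwargs.setdefault(
            "socket_options",
            HTTPConnection.default_socket_options
            + [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)],
        )
        super().init_poolmanager(*args, **kwargs)


# Mount the adapter so every HTTP and HTTPS connection made through this
# session is created with the extra socket option.
session = Session()
session.mount("http://", KeepAliveAdapter())
session.mount("https://", KeepAliveAdapter())
```

Extending `default_socket_options` rather than replacing it keeps whatever urllib3 already sets on new sockets.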

Changed in python-novaclient:
assignee: nobody → Ian Cordasco (icordasc)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-novaclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/120571

Changed in python-novaclient:
status: New → In Progress
Ian Cordasco (icordasc)
Changed in python-keystoneclient:
status: New → In Progress
assignee: nobody → Ian Cordasco (icordasc)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-keystoneclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/147707

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-novaclient (master)

Reviewed: https://review.openstack.org/120571
Committed: https://git.openstack.org/cgit/openstack/python-novaclient/commit/?id=7713fa0b18932a00231818ebdb239493d3b9c714
Submitter: Jenkins
Branch: master

commit 7713fa0b18932a00231818ebdb239493d3b9c714
Author: Ian Cordasco <email address hidden>
Date: Wed Sep 10 15:13:20 2014 -0500

    Use TCP Keep-Alive on the socket level

    There is not a way to pass the socket options to the HTTPAdapter upon
    creation so we have to sub-class it and override the init_poolmanager
    method. This also requires at least python-requests 2.4.0 but that
    has 2 severe bugs that were fixed in 2.4.1. If we try to fix this
    without a hard lower limit, we will not be able to properly set these
    options on the socket at creation time.

    Change-Id: I06e0d2c67d3197607e5f23f623c8fca69e1b23d7
    Closes-bug: 1323862

Changed in python-novaclient:
status: In Progress → Fix Committed
melanie witt (melwitt)
Changed in python-novaclient:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-keystoneclient (master)

Reviewed: https://review.openstack.org/147707
Committed: https://git.openstack.org/cgit/openstack/python-keystoneclient/commit/?id=fe9692ea6be848b4f2d99daffd598a9fbfe79f42
Submitter: Jenkins
Branch: master

commit fe9692ea6be848b4f2d99daffd598a9fbfe79f42
Author: Ian Cordasco <email address hidden>
Date: Thu Jan 15 18:21:17 2015 -0600

    Configure TCP Keep-Alive for certain Sessions

    If the user creates a keystoneclient.session.Session without passing a
    custom session, we will enable TCP Keep-Alive for the requests session
    used by keystoneclient's Session.

    novaclient and other clients can experience hung TCP connections. Most
    clients use keystoneclient's session and will need this merged here
    before they can make use of it in their projects.

    Change-Id: Ib70a8b3270d2492596b9fb8981b8584b85567a9c
    Closes-bug: 1323862

Changed in python-keystoneclient:
status: In Progress → Fix Committed
Changed in python-keystoneclient:
importance: Undecided → Medium
Changed in python-keystoneclient:
milestone: none → 1.1.0
Changed in python-keystoneclient:
status: Fix Committed → Fix Released
Michael Still (mikal)
Changed in python-novaclient:
milestone: none → 2.21.0
Michael Still (mikal)
Changed in python-novaclient:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-keystoneclient (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/204741

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-novaclient (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/204745

OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-keystoneclient (master)

Reviewed: https://review.openstack.org/204741
Committed: https://git.openstack.org/cgit/openstack/python-keystoneclient/commit/?id=c6b14f94c5021452796d7bd151c2c98ae983afdd
Submitter: Jenkins
Branch: master

commit c6b14f94c5021452796d7bd151c2c98ae983afdd
Author: Ian Cordasco <email address hidden>
Date: Wed Jul 22 14:53:32 2015 -0500

    Set reasonable defaults for TCP Keep-Alive

    Previously we simply turned on TCP Keep-Alive which relied on
    per-distribution, per-operating system defaults for keep-alive options.
    Here we set reasonable defaults since long running processes can get
    stuck for hours on end by using system defaults. This also adds comments
    around the options to explain why they're being set.

    Closes-bug: 1477275
    Related-bug: 1323862
    Change-Id: Ibd53ae2d4d2455db0ebc9951e5c764befc57850f
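To make the idea of explicit keep-alive defaults concrete, here is a small sketch that builds a socket option list with its own timers instead of relying on the operating system's. The function name and the values 60, 4 and 15 are illustrative assumptions, not a statement of what the commit sets. The `TCP_KEEP*` constants are platform-specific, so each one is added only when Python exposes it.

```python
import socket


def keepalive_socket_options(idle=60, count=4, interval=15):
    """Build (level, option, value) tuples; the numbers are illustrative only."""
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    # These constants are platform-specific, so only add the ones that exist.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds idle before the first probe
        ("TCP_KEEPCNT", count),       # failed probes before the connection drops
        ("TCP_KEEPINTVL", interval),  # seconds between probes
    ):
        if hasattr(socket, name):
            options.append((socket.IPPROTO_TCP, getattr(socket, name), value))
    return options


options = keepalive_socket_options()
```

A list in this shape can be passed as the `socket_options` argument that urllib3's connection pools accept.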

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-keystoneclient (feature/keystoneauth_integration)

Related fix proposed to branch: feature/keystoneauth_integration
Review: https://review.openstack.org/207267

OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-novaclient (master)

Reviewed: https://review.openstack.org/204745
Committed: https://git.openstack.org/cgit/openstack/python-novaclient/commit/?id=c69c38c58f6789cdaa89d51b696efc0c7b5f825d
Submitter: Jenkins
Branch: master

commit c69c38c58f6789cdaa89d51b696efc0c7b5f825d
Author: Ian Cordasco <email address hidden>
Date: Wed Jul 22 15:13:01 2015 -0500

    Use keystoneclient's TCPKeepAliveAdapter

    When novaclient isn't using a session from keystoneclient, it needs to
    set reasonable TCP Keep-Alive values otherwise the operating system
    defaults may cause the client to hang for hours before a connection will
    time out. Using keystoneclient's adapter (which sets good defaults) will
    allow us to not have to maintain this adapter here and to benefit from
    their defaults.

    Closes-bug: 1477275
    Related-bug: 1323862
    Depends-On: Ibd53ae2d4d2455db0ebc9951e5c764befc57850f
    Change-Id: I1924bd96eb1a4bac5d57a5cc5d5461acb3f7f5ac

OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-keystoneclient (feature/keystoneauth_integration)

Reviewed: https://review.openstack.org/207267
Committed: https://git.openstack.org/cgit/openstack/python-keystoneclient/commit/?id=4e498a54d0034b2ce5c87130f080ff580d241600
Submitter: Jenkins
Branch: feature/keystoneauth_integration

commit 51d9d123a8df79f6b84d196d41f1008f8d6033d4
Author: Brant Knudson <email address hidden>
Date: Wed Aug 5 12:28:30 2015 -0500

    Deprecate openstack.common.apiclient

    Deprecate the apiclient from oslo-incubator so we can get rid of
    it.

    bp deprecations

    Change-Id: I1c761933816da03b6c625f14d0aac43f206e88d7

commit 16e834dd4597314d79cf4fb0bb98449e6552f804
Author: Brant Knudson <email address hidden>
Date: Wed Aug 5 11:17:34 2015 -0500

    Move apiclient.base.Resource into keystoneclient

    keystoneclient is using apiclient.base and in order to properly
    deprecate and eventually get rid of apiclient we need to move the
    symbols that keystoneclient uses out of apiclient.

    This change moves apiclient.base.Resource into keystoneclient.base
    by merging apiclient.base.Resource into the existing
    keystoneclient.base.Resource. apiclient.base.Resource is now
    renaming keystoneclient.base.Resource for backwards-compatibility.

    Change-Id: Id479711b7c9437aaf171def6976aab8b303ec56d

commit 26534dadb1d0be00b87b632a038839ab1c18cfe4
Author: Brant Knudson <email address hidden>
Date: Tue Aug 4 19:57:26 2015 -0500

    oslo-incubator apiclient.exceptions to keystoneclient.exceptions

    Applications are using exceptions out of
    keystoneclient.openstack.common.apiclient.exceptions so it's part
    of the public interface. But since it's from oslo-incubator it's
    not considered stable. Since keystoneclient is all stable this
    creates a bad situation.

    With this change, all the symbols out of apiclient.exceptions are
    moved into keystoneclient.exceptions rather than the other way
    around (keystoneclient.exceptions used to re-export all of
    apiclient.exceptions). Now we're in control of the
    apiclient.exceptions and keystoneclient.exceptions that
    applications are using.

    Closes-Bug: 1481806
    Change-Id: Ib3afa86b9d276f6a45d1ecd6f9157ee02ec8570d

commit eaa7ddd7443ca166f6646e808dcad959811d158b
Author: Brant Knudson <email address hidden>
Date: Sun Jul 26 06:53:58 2015 -0500

    Proper deprecation for HTTPClient session and adapter properties

    HTTPClient's forwarded session and adapter properties weren't
    properly deprecated since the deprecation was only mentioned in
    the docstring. Proper deprecation requires use of warnings/
    debtcollector and documentation.

    bp deprecations

    Change-Id: Iea76d7bbc3bdeb13f7fdb097f13e007b4dd85c8d

commit 0c2fef51d2b90df088d30e9b6c5079caae7c6578
Author: Brant Knudson <email address hidden>
Date: Fri Jul 24 15:52:57 2015 -0500

    Proper deprecation for HTTPClient.request methods

    HTTPClient.request and related methods weren't properly
    deprecated since they were only mentioned in the docstrings.
    Proper deprecation requires use of warnings/debtcollector and
    documentation.

    Also, fixed places where the deprecated request m...
