Suboptimal KeepAliveTimeout terminates keystoneauth1.session.Session unnecessarily and leads to "Remote end closed connection without response"

Bug #1947010 reported by Nobuto Murata
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Charm Helpers
Undecided
Unassigned
OpenStack Base Layer
Undecided
Unassigned
OpenStack Dashboard Charm
Undecided
Unassigned
OpenStack Keystone Charm
High
Unassigned

Bug Description

Charmed OpenStack doesn't set an explicit KeepAliveTimeout for OpenStack API and dashboard services, and the default value of KeepAliveTimeout in Apache2 is used which is 5 seconds.

When clients (including OpenStack services themselves talking to other services) open a session using keystoneauth1.session.Session. It tries to use a connection pool and respect HTTP Keep alive timeout. However, due to the suboptimal KeepAliveTimeout on the server side, clients have to re-establish TCP connections whenever an interval of two requests is more than 5 seconds.

I've seen a lot of not-normally terminated sessions ("CD" status) at haproxy in front of apache2. And it's because of the connection reset from the client side due to an expiration of Keep alive timeout.

> CD The client unexpectedly aborted during data transfer. This can be
> caused by a browser crash, by an intermediate equipment between the
> client and haproxy which decided to actively break the connection,
> by network routing issues between the client and haproxy, or by a
> keep-alive session between the server and the client terminated first
> by the client.

By increasing the KeepAliveTimeout value, we can achieve:
- far less frequent not-normally terminated sessions (haproxy logs will be a lot cleaner)
- TCP connections are properly reused in the session, which keystoneauth1.session.Session was designed for
- avoid intermittent but fatal errors of "Remote end closed connection without response" from occurring

Actual issues in the field:
https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1944467

Revision history for this message
Nobuto Murata (nobuto) wrote :

1st example: if a second request is 6 seconds after the first request (> KeepAliveTimeout=5), tcp connection is not reused under keystoneauth1.session.Session.

The line to highlight is "urllib3.connectionpool:Resetting dropped connection" which is resetting the TCP connection and to re-establish another TCP connection.

https://docs.openstack.org/python-novaclient/latest/user/python-api.html
====
from keystoneauth1.identity import v3
from keystoneauth1 import session
from novaclient import client

import logging

logging.basicConfig(level=logging.DEBUG)

from time import sleep

auth = v3.Password(auth_url="https://192.168.151.112:5000/v3",
                   username="admin", password="MY_PASSWORD",
                   project_name="admin",
                   user_domain_name="admin_domain",
                   project_domain_name="admin_domain")

sess = session.Session(auth=auth,
    verify="/home/ubuntu/snap/openstackclients/common/root-ca.crt")
nova = client.Client(2, session=sess)

nova.servers.list()
sleep(6)
nova.flavors.list()
====

>>> nova.flavors.list()
DEBUG:keystoneauth.session:REQ: curl -g -i --cacert "/home/ubuntu/snap/openstackclients/common/root-ca.crt" -X GET https://192.168.151.115:8774/v2.1/flavors/detail -H "Accept: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA256}5849e99d70d26ae11c4c62e49401059c89a66c6a1172b9bd867e1b36477a07f8"
DEBUG:urllib3.connectionpool:Resetting dropped connection: 192.168.151.115
DEBUG:urllib3.connectionpool:https://192.168.151.115:8774 "GET /v2.1/flavors/detail HTTP/1.1" 200 478
DEBUG:keystoneauth.session:RESP: [200] Connection: Keep-Alive Content-Length: 478 Content-Type: application/json Date: Wed, 13 Oct 2021 15:41:02 GMT Keep-Alive: timeout=5, max=100 OpenStack-API-Version: compute 2.1 Server: Apache/2.4.41 (Ubuntu) Vary: OpenStack-API-Version,X-OpenStack-Nova-API-Version X-OpenStack-Nova-API-Version: 2.1 x-compute-request-id: req-f299ee15-cad8-429a-954c-a1b79047e35f x-openstack-request-id: req-f299ee15-cad8-429a-954c-a1b79047e35f

...

description: updated
Revision history for this message
Nobuto Murata (nobuto) wrote :
Download full text (3.6 KiB)

2nd example: when polling a status with 5 seconds interval (= KeepAliveTimeout=5), the API request eventually fails with a fatal error. It's known to happen when the interval is exactly the same as KeepAliveTimeout discussed in multiple places, e.g.
https://github.com/psf/requests/issues/4664
https://bugs.python.org/issue41345

https://docs.openstack.org/python-keystoneclient/latest/using-sessions.html
====
from keystoneauth1.identity import v3
from keystoneauth1 import session
from keystoneclient.v3 import client

import logging

logging.basicConfig(level=logging.DEBUG)

from time import sleep

auth = v3.Password(auth_url="https://192.168.151.112:5000/v3",
                   username="admin", password="MY_PASSWORD",
                   project_name="admin",
                   user_domain_name="admin_domain",
                   project_domain_name="admin_domain")

sess = session.Session(auth=auth,
    verify="/home/ubuntu/snap/openstackclients/common/root-ca.crt")
keystone = client.Client(session=sess)

while True:
    keystone.projects.list()
    sleep(5)
====

...

DEBUG:keystoneauth.session:GET call to identity for https://192.168.151.112:35357/v3/projects used request id req-37b4e617-3272-440e-90fe-bbfcebe0c865
[<Project description=Created by Juju, domain_id=default, enabled=True, id=32035470a7104dc28e51d52ccd4968c3, is_domain=False, links={'self': 'https://192.168.151.112:5000/v3/projects/32035470a7104dc28e51d52ccd4968c3'}, name=services, options={}, parent_id=default, tags=[]>, <Project description=Created by Juju, domain_id=fcc7fc21bd35421d9ceb4a9c0890f527, enabled=True, id=33d9a30eec2a4902b220ea0798746be9, is_domain=False, links={'self': 'https://192.168.151.112:5000/v3/projects/33d9a30eec2a4902b220ea0798746be9'}, name=services, options={}, parent_id=fcc7fc21bd35421d9ceb4a9c0890f527, tags=[]>, <Project description=Created by Juju, domain_id=29e509050afa4cc2b235eabb9e86a8ff, enabled=True, id=75f15e43071a49f488291b95b6245711, is_domain=False, links={'self': 'https://192.168.151.112:5000/v3/projects/75f15e43071a49f488291b95b6245711'}, name=admin, options={}, parent_id=29e509050afa4cc2b235eabb9e86a8ff, tags=[]>, <Project description=Bootstrap project for initializing the cloud., domain_id=default, enabled=True, id=acd0dacb48364de386d8a5faea47b6cf, is_domain=False, links={'self': 'https://192.168.151.112:5000/v3/projects/acd0dacb48364de386d8a5faea47b6cf'}, name=admin, options={}, parent_id=default, tags=[]>]
DEBUG:keystoneauth.session:REQ: curl -g -i --cacert "/home/ubuntu/snap/openstackclients/common/root-ca.crt" -X GET https://192.168.151.112:35357/v3/projects? -H "Accept: application/json" -H "User-Agent: python-keystoneclient" -H "X-Auth-Token: {SHA256}dff1fb0579fd35bfe46ef6574aadfda8993582d4ea2514e13fed16c181dec24d"
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
 ...

Read more...

Revision history for this message
Nobuto Murata (nobuto) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-keystone (master)
Changed in charm-keystone:
status: New → In Progress
Changed in charm-openstack-dashboard:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-openstack-dashboard (master)
Revision history for this message
Nobuto Murata (nobuto) wrote :
Changed in layer-openstack:
status: New → In Progress
Changed in charm-helpers:
status: New → In Progress
Revision history for this message
Nobuto Murata (nobuto) wrote :

Subscribed ~field-medium.

Changed in charm-keystone:
status: In Progress → Fix Committed
importance: Undecided → High
status: Fix Committed → In Progress
Changed in layer-openstack:
status: In Progress → Fix Committed
Revision history for this message
Nobuto Murata (nobuto) wrote :

Just to be clear, this affects all of the OpenStack client libraries not just keystone although the underlying module name is keystoneauth1.session.Session.

As you can see in the comment #1, HTTP Keep-alive Timeout matters to novaclient too for example with Nova specific operations such as nova.servers.list() or nova.flavors.list(). Because those calls are using keystoneauth1.session.Session underneath when it's invoked as nova = client.Client(2, session=sess).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (master)

Reviewed: https://review.opendev.org/c/openstack/charm-keystone/+/813921
Committed: https://opendev.org/openstack/charm-keystone/commit/44bf92f6bf0c1396222433a7fd2717960987f1ce
Submitter: "Zuul (22348)"
Branch: master

commit 44bf92f6bf0c1396222433a7fd2717960987f1ce
Author: Nobuto Murata <email address hidden>
Date: Thu Oct 14 09:51:52 2021 +0900

    Ease KeepAliveTimeout in line with keystoneauth1.session.Session

    Apache2's default value for KeepAliveTimeout is 5 seconds, which is okay
    for general web-page serving use cases. However, sessions and connection
    pools created by keystoneauth1.session.Session can be terminated
    unnecessarily during multiple API calls in a session due to the short
    KeepAliveTimeout.

    Let's ease KeepAliveTimeout to 75 seconds, which is fairly standard for
    API services behind a reverse proxy since it's the default value of
    nginx.

    Closes-Bug: #1947010
    Change-Id: Iff24f0f4b35fcc239abc14f37a76dcad8380d785

Changed in charm-keystone:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-openstack-dashboard (master)

Reviewed: https://review.opendev.org/c/openstack/charm-openstack-dashboard/+/813923
Committed: https://opendev.org/openstack/charm-openstack-dashboard/commit/dbc4077ba5abeaced91d99f55ef876584f55ed69
Submitter: "Zuul (22348)"
Branch: master

commit dbc4077ba5abeaced91d99f55ef876584f55ed69
Author: Nobuto Murata <email address hidden>
Date: Thu Oct 14 09:54:46 2021 +0900

    Ease KeepAliveTimeout

    Apache2's default value for KeepAliveTimeout is 5 seconds, which is okay
    for general web-page serving use cases. However, connections from a web
    browser to Horizon application can be terminated unnecessarily during
    multiple requests in a session due to the short KeepAliveTimeout.

    Let's ease KeepAliveTimeout to 75 seconds, which is fairly standard for
    services behind a reverse proxy since it's the default value of nginx.

    Closes-Bug: #1947010
    Change-Id: Iff9357e5efb7937927a8d0a6de072d4afaa98906

Changed in charm-openstack-dashboard:
status: In Progress → Fix Committed
Felipe Reyes (freyes)
Changed in layer-openstack:
milestone: none → 22.04
Changed in charm-openstack-dashboard:
milestone: none → 22.04
Changed in charm-keystone:
milestone: none → 22.04
Nobuto Murata (nobuto)
Changed in charm-helpers:
status: In Progress → Fix Committed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.