driver-agent's driver_get method may fail with a timeout when the DB is really busy

Bug #2032890 reported by Gregory Thiemonge
This bug affects 1 person
Affects: octavia
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Third-party providers sometimes need to fetch Octavia objects through the dedicated socket.

However, when the DB server is heavily loaded, fetching the objects and their children from the DB may take too long (there are some probably unnecessary recursive conversions from DB objects to dicts).

(This happened while testing the ovn-octavia-provider with simulated latency between the Octavia nodes and the DB servers.)

In those cases, a socket.timeout exception is raised in the Octavia API service:

Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils [None req-0f70cc56-5a26-4bf3-be46-6bab57be934c demo admin] Provider 'ovn' raised a driver error: An unknown driver error occurred.: octavia_lib.api.drivers.exceptions.DriverError: An unknown driver error occurred.
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils Traceback (most recent call last):
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 157, in _get_resource
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils return self._send(self.get_socket, {constants.OBJECT: resource,
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 97, in _send
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils response = self._recv(sock)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 61, in _recv
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils char = sock.recv(1)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils socket.timeout: timed out
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils The above exception was the direct cause of the following exception:
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils Traceback (most recent call last):
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/octavia/octavia/api/drivers/utils.py", line 52, in call_provider
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils return driver_method(*args, **kwargs)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/ovn-octavia-provider/ovn_octavia_provider/driver.py", line 275, in member_create
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils subnet_id, subnet_cidr = self._ovn_helper._get_subnet_from_pool(
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/ovn-octavia-provider/ovn_octavia_provider/helper.py", line 570, in _get_subnet_from_pool
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils pool = self._octavia_driver_lib.get_pool(pool_id)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 204, in get_pool
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils data = self._get_resource(constants.POOLS, pool_id)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 162, in _get_resource
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils raise driver_exceptions.DriverError() from e
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils octavia_lib.api.drivers.exceptions.DriverError: An unknown driver error occurred.
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR octavia.api.drivers.utils
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: ERROR wsme.api [None req-0f70cc56-5a26-4bf3-be46-6bab57be934c demo admin] Server-side error: "Provider 'ovn' reports error: An unknown driver error occurred.". Detail:
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: Traceback (most recent call last):
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 157, in _get_resource
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: return self._send(self.get_socket, {constants.OBJECT: resource,
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 97, in _send
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: response = self._recv(sock)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 61, in _recv
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: char = sock.recv(1)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: socket.timeout: timed out
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: The above exception was the direct cause of the following exception:
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: Traceback (most recent call last):
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia/octavia/api/drivers/utils.py", line 52, in call_provider
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: return driver_method(*args, **kwargs)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/ovn-octavia-provider/ovn_octavia_provider/driver.py", line 275, in member_create
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: subnet_id, subnet_cidr = self._ovn_helper._get_subnet_from_pool(
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/ovn-octavia-provider/ovn_octavia_provider/helper.py", line 570, in _get_subnet_from_pool
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: pool = self._octavia_driver_lib.get_pool(pool_id)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 204, in get_pool
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: data = self._get_resource(constants.POOLS, pool_id)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia-lib/octavia_lib/api/drivers/driver_lib.py", line 162, in _get_resource
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: raise driver_exceptions.DriverError() from e
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: octavia_lib.api.drivers.exceptions.DriverError: An unknown driver error occurred.
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: During handling of the above exception, another exception occurred:
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: Traceback (most recent call last):
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/usr/local/lib/python3.9/site-packages/wsmeext/pecan.py", line 82, in callfunction
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: result = f(self, *args, **kwargs)
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia/octavia/api/v2/controllers/member.py", line 191, in post
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: lock_session.rollback()
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/usr/local/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: self.force_reraise()
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/usr/local/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: raise self.value
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia/octavia/api/v2/controllers/member.py", line 185, in post
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: driver_utils.call_provider(
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: File "/opt/stack/octavia/octavia/api/drivers/utils.py", line 56, in call_provider
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: raise exceptions.ProviderDriverError(prov=provider,
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: octavia.common.exceptions.ProviderDriverError: Provider 'ovn' reports error: An unknown driver error occurred.
Aug 24 02:09:10 gthiemon-devstack <email address hidden>[518382]: : octavia.common.exceptions.ProviderDriverError: Provider 'ovn' reports error: An unknown driver error occurred.

And also in the driver-agent:

Aug 24 02:09:11 gthiemon-devstack octavia-driver-agent[518773]: ERROR octavia.api.drivers.driver_agent.driver_listener [-] Error while sending data.: BrokenPipeError: [Errno 32] Broken pipe
Aug 24 02:09:11 gthiemon-devstack octavia-driver-agent[518773]: ERROR octavia.api.drivers.driver_agent.driver_listener Traceback (most recent call last):
Aug 24 02:09:11 gthiemon-devstack octavia-driver-agent[518773]: ERROR octavia.api.drivers.driver_agent.driver_listener File "/opt/stack/octavia/octavia/api/drivers/driver_agent/driver_listener.py", line 114, in handle
Aug 24 02:09:11 gthiemon-devstack octavia-driver-agent[518773]: ERROR octavia.api.drivers.driver_agent.driver_listener self.request.send(len_str)
Aug 24 02:09:11 gthiemon-devstack octavia-driver-agent[518773]: ERROR octavia.api.drivers.driver_agent.driver_listener BrokenPipeError: [Errno 32] Broken pipe
Aug 24 02:09:11 gthiemon-devstack octavia-driver-agent[518773]: ERROR octavia.api.drivers.driver_agent.driver_listener
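
The two tracebacks above look like two sides of the same event: the client gives up waiting in recv(), closes its end, and the agent then fails when it finally tries to send. Below is a minimal sketch of that pattern. It is not the octavia-lib or driver-agent code. The names `slow_responder` and `fetch` are hypothetical, `socket.socketpair()` stands in for the driver-agent's socket, and `time.sleep()` stands in for the slow DB query.

```python
import socket
import threading
import time


def slow_responder(conn, delay, payload):
    """Hypothetical agent side: reply only after `delay` seconds.

    The sleep stands in for a slow database lookup.
    """
    time.sleep(delay)
    try:
        conn.sendall(payload)
        return "sent"
    except (BrokenPipeError, ConnectionResetError):
        # The client already timed out and closed its end of the socket.
        return "peer gone"
    finally:
        conn.close()


def fetch(delay, timeout):
    """Hypothetical client side: wait at most `timeout` seconds for one byte."""
    client, server = socket.socketpair()
    result = {}

    def run():
        result["server"] = slow_responder(server, delay, b"ok")

    worker = threading.Thread(target=run)
    worker.start()

    client.settimeout(timeout)
    try:
        # Same call shape as the `sock.recv(1)` frame in the traceback.
        result["client"] = client.recv(1)
    except socket.timeout:
        result["client"] = "timed out"
    finally:
        client.close()

    worker.join()
    return result


if __name__ == "__main__":
    # Reply arrives after the client's timeout: the client times out first.
    print(fetch(delay=0.3, timeout=0.05))
    # Reply arrives in time: the client reads the first byte.
    print(fetch(delay=0.0, timeout=2.0))
```

In the first call the responder is slower than the client's timeout, so the client raises socket.timeout. On Linux the late sendall() then typically fails with BrokenPipeError, which matches the "Error while sending data" message in the driver-agent log.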

Changed in octavia:
importance: Undecided → Medium
status: New → Confirmed
Changed in octavia:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia-lib (master)

Reviewed: https://review.opendev.org/c/openstack/octavia-lib/+/892417
Committed: https://opendev.org/openstack/octavia-lib/commit/2a84a218ef1f683d5584784b7d9607e7453b0729
Submitter: "Zuul (22348)"
Branch: master

commit 2a84a218ef1f683d5584784b7d9607e7453b0729
Author: Michael Johnson <email address hidden>
Date: Wed Aug 23 00:13:39 2023 +0000

    Fix a possible receive timeout

    This patch fixes a possible receive timeout caused by a slow response from the
    driver agent. For example if the database is very slow.

    Closes-Bug: #2032890

    Change-Id: I9079030a5fef9dc44da242adab3c568666777451

Changed in octavia:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia-lib (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/octavia-lib/+/896055

OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia-lib (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/octavia-lib/+/896056

OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia-lib (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/octavia-lib/+/896057

OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia-lib (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/octavia-lib/+/896058

OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia-lib (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/octavia-lib/+/896059

OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia-lib (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/octavia-lib/+/896060

OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia-lib (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/octavia-lib/+/896055
Committed: https://opendev.org/openstack/octavia-lib/commit/f1142a99a982cc6919143734680b3b0736b1c0b1
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit f1142a99a982cc6919143734680b3b0736b1c0b1
Author: Michael Johnson <email address hidden>
Date: Wed Aug 23 00:13:39 2023 +0000

    Fix a possible receive timeout

    This patch fixes a possible receive timeout caused by a slow response from the
    driver agent. For example if the database is very slow.

    Closes-Bug: #2032890

    Change-Id: I9079030a5fef9dc44da242adab3c568666777451
    (cherry picked from commit 2a84a218ef1f683d5584784b7d9607e7453b0729)

OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia-lib (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/octavia-lib/+/896056
Committed: https://opendev.org/openstack/octavia-lib/commit/5ec74a4dbeda9b65fbe3cdae078fca713a1dad74
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 5ec74a4dbeda9b65fbe3cdae078fca713a1dad74
Author: Michael Johnson <email address hidden>
Date: Wed Aug 23 00:13:39 2023 +0000

    Fix a possible receive timeout

    This patch fixes a possible receive timeout caused by a slow response from the
    driver agent. For example if the database is very slow.

    Closes-Bug: #2032890

    Change-Id: I9079030a5fef9dc44da242adab3c568666777451
    (cherry picked from commit 2a84a218ef1f683d5584784b7d9607e7453b0729)
    (cherry picked from commit f1142a99a982cc6919143734680b3b0736b1c0b1)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia-lib (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/octavia-lib/+/896057
Committed: https://opendev.org/openstack/octavia-lib/commit/3f764c5d2b17568aebc5a551de08f9d489d62519
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 3f764c5d2b17568aebc5a551de08f9d489d62519
Author: Michael Johnson <email address hidden>
Date: Wed Aug 23 00:13:39 2023 +0000

    Fix a possible receive timeout

    This patch fixes a possible receive timeout caused by a slow response from the
    driver agent. For example if the database is very slow.

    Closes-Bug: #2032890

    Change-Id: I9079030a5fef9dc44da242adab3c568666777451
    (cherry picked from commit 2a84a218ef1f683d5584784b7d9607e7453b0729)
    (cherry picked from commit f1142a99a982cc6919143734680b3b0736b1c0b1)
    (cherry picked from commit 5ec74a4dbeda9b65fbe3cdae078fca713a1dad74)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia-lib (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/octavia-lib/+/896058
Committed: https://opendev.org/openstack/octavia-lib/commit/450c3b9f4739d80f859fa7b7ecc64fda35cb983d
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 450c3b9f4739d80f859fa7b7ecc64fda35cb983d
Author: Michael Johnson <email address hidden>
Date: Wed Aug 23 00:13:39 2023 +0000

    Fix a possible receive timeout

    This patch fixes a possible receive timeout caused by a slow response from the
    driver agent. For example if the database is very slow.

    Closes-Bug: #2032890

    Change-Id: I9079030a5fef9dc44da242adab3c568666777451
    (cherry picked from commit 2a84a218ef1f683d5584784b7d9607e7453b0729)
    (cherry picked from commit f1142a99a982cc6919143734680b3b0736b1c0b1)
    (cherry picked from commit 5ec74a4dbeda9b65fbe3cdae078fca713a1dad74)
    (cherry picked from commit 3f764c5d2b17568aebc5a551de08f9d489d62519)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia-lib (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/octavia-lib/+/896059
Committed: https://opendev.org/openstack/octavia-lib/commit/7c2d5cdba50763c2f37bf173b9113f36dedef4ad
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 7c2d5cdba50763c2f37bf173b9113f36dedef4ad
Author: Michael Johnson <email address hidden>
Date: Wed Aug 23 00:13:39 2023 +0000

    Fix a possible receive timeout

    This patch fixes a possible receive timeout caused by a slow response from the
    driver agent. For example if the database is very slow.

    Closes-Bug: #2032890

    Change-Id: I9079030a5fef9dc44da242adab3c568666777451
    (cherry picked from commit 2a84a218ef1f683d5584784b7d9607e7453b0729)
    (cherry picked from commit f1142a99a982cc6919143734680b3b0736b1c0b1)
    (cherry picked from commit 5ec74a4dbeda9b65fbe3cdae078fca713a1dad74)
    (cherry picked from commit 3f764c5d2b17568aebc5a551de08f9d489d62519)
    (cherry picked from commit 450c3b9f4739d80f859fa7b7ecc64fda35cb983d)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia-lib (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/octavia-lib/+/896060
Committed: https://opendev.org/openstack/octavia-lib/commit/88363159796cca0e565c91cb2b421066207e2b1c
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 88363159796cca0e565c91cb2b421066207e2b1c
Author: Michael Johnson <email address hidden>
Date: Wed Aug 23 00:13:39 2023 +0000

    Fix a possible receive timeout

    This patch fixes a possible receive timeout caused by a slow response from the
    driver agent. For example if the database is very slow.

    Closes-Bug: #2032890

    Change-Id: I9079030a5fef9dc44da242adab3c568666777451
    (cherry picked from commit 2a84a218ef1f683d5584784b7d9607e7453b0729)
    (cherry picked from commit f1142a99a982cc6919143734680b3b0736b1c0b1)
    (cherry picked from commit 5ec74a4dbeda9b65fbe3cdae078fca713a1dad74)
    (cherry picked from commit 3f764c5d2b17568aebc5a551de08f9d489d62519)
    (cherry picked from commit 450c3b9f4739d80f859fa7b7ecc64fda35cb983d)
    (cherry picked from commit 7c2d5cdba50763c2f37bf173b9113f36dedef4ad)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia-lib 3.4.0

This issue was fixed in the openstack/octavia-lib 3.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia-lib yoga-eom

This issue was fixed in the openstack/octavia-lib yoga-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia-lib 3.1.1

This issue was fixed in the openstack/octavia-lib 3.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia-lib wallaby-eom

This issue was fixed in the openstack/octavia-lib wallaby-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia-lib xena-eom

This issue was fixed in the openstack/octavia-lib xena-eom release.
