Race condition in the member batch update API call

Bug #2036156 reported by Gregory Thiemonge
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
octavia
Fix Released
High
Gregory Thiemonge

Bug Description

Using continuously the member batch update API on the same pool can lead to a race condition and leave the load balancer in PENDING_UPDATE.

For instance when creating a script that adds/removes/updates continuously the members of a pool (without waiting for the loadbalancer to be ACTIVE), the octavia-worker may attempt to update/add a deleted member.

Sep 15 02:50:46 gthiemon-devstack octavia-worker[83307]: INFO octavia.controller.queue.v2.endpoints [-] Batch updating members: old='['91e8dd20-c9da-495a-b4ab-d2ee705c235b']', new='[None]', updated='['d6705c52-97a2-484b-afbd-7025d72e8b6e', 'bf120350-0d6d-4f93-b91a-f3b8f25114a2', 'abff3d7b-4fd5-4b9f-bca4-5a08dfd22ece', '7ec01f29-de48-413b-84cc-9e1a382ed7aa']'...
Sep 15 02:50:46 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:47 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:49 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:52 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:56 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:01 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:06 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:11 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:16 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:21 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:26 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:31 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:36 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:41 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server [-] Exception during message handling: tenacity.RetryError: RetryError[<Future at 0x7fb634b0cd60 state=finished raised NoResultFound>]
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 426, in __call__
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server result = fn(*args, **kwargs)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/opt/stack/octavia/octavia/controller/worker/v2/controller_worker.py", line 544, in batch_update_members
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server raise db_exceptions.NoResultFound
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server sqlalchemy.exc.NoResultFound: ()
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server The above exception was the direct cause of the following exception:
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/opt/stack/octavia/octavia/controller/queue/v2/endpoints.py", line 132, in batch_update_members
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server self.worker.batch_update_members(
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 333, in wrapped_f
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server return self(f, *args, **kw)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 423, in __call__
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server do = self.iter(retry_state=retry_state)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 373, in iter
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server six.raise_from(retry_exc, fut.exception())
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "<string>", line 3, in raise_from
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server tenacity.RetryError: RetryError[<Future at 0x7fb634b0cd60 state=finished raised NoResultFound>]
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server

In the batch update code, we compute the list of the members (to update/delete/add), we use the body of the API request and the current members of the DB.
But the current members are fetched from the DB before we lock the loadbalancer:

Fetching old members:
https://opendev.org/openstack/octavia/src/commit/d7bba6c5af3c0f6d99ea8063b54ec3e895a5eea4/octavia/api/v2/controllers/member.py#L339

Locking the LB:
https://opendev.org/openstack/octavia/src/commit/d7bba6c5af3c0f6d99ea8063b54ec3e895a5eea4/octavia/api/v2/controllers/member.py#L355

So between line 339 and 355, a worker may have updated the DB (deleting a member), creating a incorrect behavior (a deleted member ID is passed to the worker)

Note: to reproduce it in devstack, I had to add a time.sleep between those 2 lines of code, it simulates a very busy server.

Changed in octavia:
assignee: nobody → Gregory Thiemonge (gthiemonge)
importance: Undecided → High
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/octavia/+/895251

Changed in octavia:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (master)

Reviewed: https://review.opendev.org/c/openstack/octavia/+/895251
Committed: https://opendev.org/openstack/octavia/commit/8a47577b7668e6dc82e048d937d1344c667e5124
Submitter: "Zuul (22348)"
Branch: master

commit 8a47577b7668e6dc82e048d937d1344c667e5124
Author: Gregory Thiemonge <email address hidden>
Date: Fri Sep 15 03:42:26 2023 -0400

    Fix race condition in members batch update API call

    Fix a potential race condition, the member batch update API call
    computes a delta between the current state of the members in the DB and
    the members in the requests. But it fetches the members from the DB
    before locking the load balancer, it means that another Octavia service
    may still update the list of members, then the computation of the
    changes may be incorrect.

    The list of members is now queried in the locked section.

    Closes-Bug: #2036156
    Change-Id: I453ff385620f488a3e43d4fda01634cdce3be5d4

Changed in octavia:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/octavia/+/896384

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/octavia/+/896385

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/octavia/+/896386

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/octavia/+/896387

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/octavia/+/896388

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/octavia/+/896389

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/octavia/+/896385
Committed: https://opendev.org/openstack/octavia/commit/64d9bdb97556652aa05a34276778260b541ad9c6
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 64d9bdb97556652aa05a34276778260b541ad9c6
Author: Gregory Thiemonge <email address hidden>
Date: Fri Sep 15 03:42:26 2023 -0400

    Fix race condition in members batch update API call

    Fix a potential race condition, the member batch update API call
    computes a delta between the current state of the members in the DB and
    the members in the requests. But it fetches the members from the DB
    before locking the load balancer, it means that another Octavia service
    may still update the list of members, then the computation of the
    changes may be incorrect.

    The list of members is now queried in the locked section.

    Closes-Bug: #2036156
    Change-Id: I453ff385620f488a3e43d4fda01634cdce3be5d4
    (cherry picked from commit 8a47577b7668e6dc82e048d937d1344c667e5124)
    (cherry picked from commit 8a5bed4cebd70c1c721a3916dc79f2b19343e445)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/octavia/+/896389
Committed: https://opendev.org/openstack/octavia/commit/77e14a8d790bfb0facaf2ee56c1f798e58269b13
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 77e14a8d790bfb0facaf2ee56c1f798e58269b13
Author: Gregory Thiemonge <email address hidden>
Date: Fri Sep 15 03:42:26 2023 -0400

    Fix race condition in members batch update API call

    Fix a potential race condition, the member batch update API call
    computes a delta between the current state of the members in the DB and
    the members in the requests. But it fetches the members from the DB
    before locking the load balancer, it means that another Octavia service
    may still update the list of members, then the computation of the
    changes may be incorrect.

    The list of members is now queried in the locked section.

    Closes-Bug: #2036156
    Change-Id: I453ff385620f488a3e43d4fda01634cdce3be5d4
    (cherry picked from commit 8a47577b7668e6dc82e048d937d1344c667e5124)
    (cherry picked from commit 8a5bed4cebd70c1c721a3916dc79f2b19343e445)
    (cherry picked from commit 64d9bdb97556652aa05a34276778260b541ad9c6)
    (cherry picked from commit 4c97b585ce6e519a5ff090c2cd97fab0e801be0c)
    (cherry picked from commit 231398ca5fee7b825646eeed84523dbfcb414438)
    (cherry picked from commit c6dd946b996fb6e5720892db25c2ad0a217b8bb2)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/octavia/+/896387
Committed: https://opendev.org/openstack/octavia/commit/231398ca5fee7b825646eeed84523dbfcb414438
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 231398ca5fee7b825646eeed84523dbfcb414438
Author: Gregory Thiemonge <email address hidden>
Date: Fri Sep 15 03:42:26 2023 -0400

    Fix race condition in members batch update API call

    Fix a potential race condition, the member batch update API call
    computes a delta between the current state of the members in the DB and
    the members in the requests. But it fetches the members from the DB
    before locking the load balancer, it means that another Octavia service
    may still update the list of members, then the computation of the
    changes may be incorrect.

    The list of members is now queried in the locked section.

    Closes-Bug: #2036156
    Change-Id: I453ff385620f488a3e43d4fda01634cdce3be5d4
    (cherry picked from commit 8a47577b7668e6dc82e048d937d1344c667e5124)
    (cherry picked from commit 8a5bed4cebd70c1c721a3916dc79f2b19343e445)
    (cherry picked from commit 64d9bdb97556652aa05a34276778260b541ad9c6)
    (cherry picked from commit 4c97b585ce6e519a5ff090c2cd97fab0e801be0c)

tags: added: in-stable-yoga
tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/octavia/+/896388
Committed: https://opendev.org/openstack/octavia/commit/c6dd946b996fb6e5720892db25c2ad0a217b8bb2
Submitter: "Zuul (22348)"
Branch: stable/xena

commit c6dd946b996fb6e5720892db25c2ad0a217b8bb2
Author: Gregory Thiemonge <email address hidden>
Date: Fri Sep 15 03:42:26 2023 -0400

    Fix race condition in members batch update API call

    Fix a potential race condition, the member batch update API call
    computes a delta between the current state of the members in the DB and
    the members in the requests. But it fetches the members from the DB
    before locking the load balancer, it means that another Octavia service
    may still update the list of members, then the computation of the
    changes may be incorrect.

    The list of members is now queried in the locked section.

    Closes-Bug: #2036156
    Change-Id: I453ff385620f488a3e43d4fda01634cdce3be5d4
    (cherry picked from commit 8a47577b7668e6dc82e048d937d1344c667e5124)
    (cherry picked from commit 8a5bed4cebd70c1c721a3916dc79f2b19343e445)
    (cherry picked from commit 64d9bdb97556652aa05a34276778260b541ad9c6)
    (cherry picked from commit 4c97b585ce6e519a5ff090c2cd97fab0e801be0c)
    (cherry picked from commit 231398ca5fee7b825646eeed84523dbfcb414438)

tags: added: in-stable-zed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/octavia/+/896386
Committed: https://opendev.org/openstack/octavia/commit/4c97b585ce6e519a5ff090c2cd97fab0e801be0c
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 4c97b585ce6e519a5ff090c2cd97fab0e801be0c
Author: Gregory Thiemonge <email address hidden>
Date: Fri Sep 15 03:42:26 2023 -0400

    Fix race condition in members batch update API call

    Fix a potential race condition, the member batch update API call
    computes a delta between the current state of the members in the DB and
    the members in the requests. But it fetches the members from the DB
    before locking the load balancer, it means that another Octavia service
    may still update the list of members, then the computation of the
    changes may be incorrect.

    The list of members is now queried in the locked section.

    Closes-Bug: #2036156
    Change-Id: I453ff385620f488a3e43d4fda01634cdce3be5d4
    (cherry picked from commit 8a47577b7668e6dc82e048d937d1344c667e5124)
    (cherry picked from commit 8a5bed4cebd70c1c721a3916dc79f2b19343e445)
    (cherry picked from commit 64d9bdb97556652aa05a34276778260b541ad9c6)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia 10.1.1

This issue was fixed in the openstack/octavia 10.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia 11.0.2

This issue was fixed in the openstack/octavia 11.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia wallaby-eom

This issue was fixed in the openstack/octavia wallaby-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia xena-eom

This issue was fixed in the openstack/octavia xena-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia 14.0.0.0rc1

This issue was fixed in the openstack/octavia 14.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/octavia/+/896384
Committed: https://opendev.org/openstack/octavia/commit/8a5bed4cebd70c1c721a3916dc79f2b19343e445
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 8a5bed4cebd70c1c721a3916dc79f2b19343e445
Author: Gregory Thiemonge <email address hidden>
Date: Fri Sep 15 03:42:26 2023 -0400

    Fix race condition in members batch update API call

    Fix a potential race condition, the member batch update API call
    computes a delta between the current state of the members in the DB and
    the members in the requests. But it fetches the members from the DB
    before locking the load balancer, it means that another Octavia service
    may still update the list of members, then the computation of the
    changes may be incorrect.

    The list of members is now queried in the locked section.

    Closes-Bug: #2036156
    Change-Id: I453ff385620f488a3e43d4fda01634cdce3be5d4
    (cherry picked from commit 8a47577b7668e6dc82e048d937d1344c667e5124)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.