Continuously calling the member batch update API on the same pool can trigger a race condition and leave the load balancer stuck in PENDING_UPDATE.
For instance, a script that continuously adds/removes/updates the members of a pool (without waiting for the load balancer to return to ACTIVE) can cause the octavia-worker to attempt to update or add a member that has already been deleted:
Sep 15 02:50:46 gthiemon-devstack octavia-worker[83307]: INFO octavia.controller.queue.v2.endpoints [-] Batch updating members: old='['91e8dd20-c9da-495a-b4ab-d2ee705c235b']', new='[None]', updated='['d6705c52-97a2-484b-afbd-7025d72e8b6e', 'bf120350-0d6d-4f93-b91a-f3b8f25114a2', 'abff3d7b-4fd5-4b9f-bca4-5a08dfd22ece', '7ec01f29-de48-413b-84cc-9e1a382ed7aa']'...
Sep 15 02:50:46 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:47 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:49 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:52 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:50:56 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:01 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:06 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:11 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:16 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:21 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:26 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:31 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:36 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:41 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: WARNING octavia.controller.worker.v2.controller_worker [-] Failed to fetch one of the new members from DB. Retrying for up to 60 seconds.
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server [-] Exception during message handling: tenacity.RetryError: RetryError[<Future at 0x7fb634b0cd60 state=finished raised NoResultFound>]
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 426, in __call__
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server result = fn(*args, **kwargs)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/opt/stack/octavia/octavia/controller/worker/v2/controller_worker.py", line 544, in batch_update_members
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server raise db_exceptions.NoResultFound
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server sqlalchemy.exc.NoResultFound: ()
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server The above exception was the direct cause of the following exception:
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/opt/stack/octavia/octavia/controller/queue/v2/endpoints.py", line 132, in batch_update_members
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server self.worker.batch_update_members(
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 333, in wrapped_f
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server return self(f, *args, **kw)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 423, in __call__
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server do = self.iter(retry_state=retry_state)
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 373, in iter
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server six.raise_from(retry_exc, fut.exception())
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server File "<string>", line 3, in raise_from
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server tenacity.RetryError: RetryError[<Future at 0x7fb634b0cd60 state=finished raised NoResultFound>]
Sep 15 02:51:46 gthiemon-devstack octavia-worker[83307]: ERROR oslo_messaging.rpc.server
In the batch update code, the list of members to add/update/delete is computed from the body of the API request and the current members in the DB.
However, the current members are fetched from the DB before the load balancer is locked:
Fetching old members:
https://opendev.org/openstack/octavia/src/commit/d7bba6c5af3c0f6d99ea8063b54ec3e895a5eea4/octavia/api/v2/controllers/member.py#L339
Locking the LB:
https://opendev.org/openstack/octavia/src/commit/d7bba6c5af3c0f6d99ea8063b54ec3e895a5eea4/octavia/api/v2/controllers/member.py#L355
So between lines 339 and 355, a worker may have updated the DB (e.g. deleted a member), leading to incorrect behavior: a deleted member ID is passed to the worker.
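The interleaving can be sketched as a tiny toy model (all names hypothetical, not the actual Octavia code) of the time-of-check/time-of-use window:

```python
# Toy model of the race: the member diff is computed from a DB snapshot
# taken before the load balancer is locked, so a concurrent delete makes
# the diff stale.

db_members = {"m-old"}              # pool members currently in the DB

def batch_update(requested):
    snapshot = set(db_members)      # ~ member.py L339: read without the lock
    # --- race window: a concurrent worker commits a delete here ---
    db_members.discard("m-old")
    # ~ member.py L355: only now is the LB set to PENDING_UPDATE (locked)
    return snapshot - requested     # diff built from the stale snapshot

stale_deletes = batch_update({"m-new"})
print(stale_deletes)                # {'m-old'}: a deleted ID reaches the worker
```

The worker then cannot find that member row, exhausts its tenacity retries, and the flow fails with NoResultFound as shown in the log above.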
Note: to reproduce this in devstack, I had to add a time.sleep between those two lines of code, simulating a very busy server.
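For contrast, here is a sketch of one way to close the window (an assumption about the general fix direction, not necessarily what the proposed patch does): take the member snapshot only while the load balancer lock is held, so concurrent writers are serialized around the whole diff computation.

```python
import threading

# Toy model (hypothetical names): the lock stands in for the load
# balancer's PENDING_UPDATE provisioning lock. Both the worker's delete
# and the batch-update snapshot must run under it.
db_lock = threading.Lock()
db_members = {"m-old"}

def worker_delete(member_id):
    with db_lock:
        db_members.discard(member_id)

def batch_update_fixed(requested):
    with db_lock:
        # Snapshot taken *after* the lock is held: a concurrent delete can
        # only land entirely before or entirely after this diff.
        snapshot = set(db_members)
        return snapshot - requested

# If the delete is serialized first, the snapshot no longer contains
# m-old and no stale member ID is handed to the worker.
worker_delete("m-old")
print(batch_update_fixed({"m-new"}))  # set()
```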
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/octavia/+/895251