[SRU] Loadbalancer is stuck with PENDING_UPDATE state on member update API

Bug #2067441 reported by Hoyoun Lee
This bug affects 2 people
Affects                  Status         Importance   Assigned to         Milestone
Ubuntu Cloud Archive     Fix Released   Undecided    Unassigned
  Antelope               New            Undecided    Unassigned
  Caracal                New            Undecided    Unassigned
  Yoga                   New            Undecided    Unassigned
octavia                  In Progress    Medium       Gregory Thiemonge
octavia (Ubuntu)         Status tracked in Oracular
  Jammy                  Confirmed      Undecided    Unassigned
  Mantic                 Won't Fix      Undecided    Unassigned
  Noble                  Confirmed      Undecided    Unassigned
  Oracular               Fix Released   Undecided    Unassigned

Bug Description

[Impact]

Loadbalancer is stuck with PENDING_UPDATE state on batch member update API.

[Test Case]

Please refer to [Test steps] section below.

[Regression Potential]

The fix is already in the upstream main, stable/2024.1, stable/2023.2, and stable/2023.1 branches, so it is a clean backport and should be helpful for deployments using octavia.

I also tested this fix and it works well - https://paste.ubuntu.com/p/wPy7pB3SR6/ and https://paste.ubuntu.com/p/zpPDScQCtK/

I also tested the debdiff for this fix and it works well - https://paste.ubuntu.com/p/nS6c3QYRGn/

[Others]

Original Bug Description Below
===========

By mistake, I sent a wrong request with a duplicated IP/port combination through the Batch Update Members API (ver. 2023.1).
https://docs.openstack.org/api-ref/load-balancer/v2/#batch-update-members

For example:
The member 192.0.2.16:80 already exists, and the request data is as follows:
{
    "members": [
        {
            "subnet_id": "xxxxxxx",
            "address": "192.0.2.16",
            "protocol_port": 80
        }, {
            "subnet_id": "xxxxxxx",
            "address": "192.0.2.16",
            "protocol_port": 80
        }
    ]
}

After the request, the status of Loadbalancer does not change from PENDING_UPDATE.

When checking the source code, there is no logic to check for duplicates.

In the controller logic (member.py), members are classified into new_members/updated_members/deleted_member, but the updated_members data is passed on as is, duplicates included, so this is suspected to be the cause of the problem.

## log : 33fe25ab-5477-4787-a8e1-f657376b0ead is duplicated
May 29 04:14:32 ubuntu octavia-worker[123317]: INFO octavia.controller.queue.v2.endpoints [-] Batch updating members: old='[]', new='[]', updated='['825dbebc-da79-4f88-bf48-0e3e63a09d90', '33fe25ab-5477-4787-a8e1-f657376b0ead', '33fe25ab-5477-4787-a8e1-f657376b0ead']'...
May 29 04:14:32 ubuntu octavia-worker[123317]: ERROR oslo_messaging.rpc.server [-] Exception during message handling: taskflow.exceptions.Duplicate: Atoms with duplicate names found: ['octavia-mark-member-active-indb-33fe25ab-5477-4787-a8e1-f657376b0ead']
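
The error message suggests why a repeated member ID is fatal: the task name appears to be built from the member ID, so two entries for the same member yield two atoms with one name. A minimal Python sketch of that collision, assuming the name is the prefix seen in the log plus the member ID (the helper `duplicate_atom_names` is hypothetical, not octavia or taskflow code):

```python
from collections import Counter

# Prefix copied from the error message in the log above.
TASK_PREFIX = "octavia-mark-member-active-indb-"


def duplicate_atom_names(member_ids):
    """Return task names that would occur more than once (hypothetical helper)."""
    counts = Counter(TASK_PREFIX + member_id for member_id in member_ids)
    return sorted(name for name, n in counts.items() if n > 1)


# Member IDs from the 'updated' list in the worker log.
updated = [
    "825dbebc-da79-4f88-bf48-0e3e63a09d90",
    "33fe25ab-5477-4787-a8e1-f657376b0ead",
    "33fe25ab-5477-4787-a8e1-f657376b0ead",
]
print(duplicate_atom_names(updated))
```

With the IDs from the log, the only name reported is the one for 33fe25ab-5477-4787-a8e1-f657376b0ead, which matches the taskflow exception.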

FYI, there is validation logic for new_members.
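
A rough sketch of the kind of check that seems to be missing for updated members: remember each (address, protocol_port) pair in a set and treat any repeat as a duplicate. This is an illustration only, not the octavia controller code; `split_unique_members` and the dict-based member records are assumptions made for the example.

```python
def split_unique_members(members):
    """Split request members into first occurrences and duplicates.

    Hypothetical helper: members are plain dicts shaped like the request body,
    and uniqueness is assumed to be the (address, protocol_port) pair.
    """
    seen = set()
    unique, duplicates = [], []
    for member in members:
        key = (member["address"], member["protocol_port"])
        if key in seen:
            duplicates.append(member)
        else:
            seen.add(key)
            unique.append(member)
    return unique, duplicates


# Same body as the example request above.
request_members = [
    {"subnet_id": "xxxxxxx", "address": "192.0.2.16", "protocol_port": 80},
    {"subnet_id": "xxxxxxx", "address": "192.0.2.16", "protocol_port": 80},
]
unique, duplicates = split_unique_members(request_members)
print(len(unique), len(duplicates))
```

Whether duplicates should be dropped silently or rejected with a validation error is a design choice this sketch leaves open.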

[Test steps]

1, set up an OpenStack env with an Octavia deployment

2, create a test lb

3, add a member into lb pool

$ openstack loadbalancer member create --subnet-id private_subnet --address 192.168.21.226 --protocol-port 80 lb1-pool
$ openstack loadbalancer member list lb1-pool |grep ACTIVE
| b36bb21e-8eed-40bc-a1cb-e69da070c0b9 | | 4f1016d73ae245fe8c5c6a637930f3d2 | ACTIVE | 192.168.21.226 | 80 | ONLINE | 1 |

4, run test.py (https://paste.ubuntu.com/p/38vPW5R5S8/) to call the batch member update API and add the same member again (eg: 192.168.21.226 above)

5, the problem is then reproduced: the lb is stuck in the PENDING_UPDATE state.

$ openstack loadbalancer member list lb1-pool |grep 192
| b36bb21e-8eed-40bc-a1cb-e69da070c0b9 | | 4f1016d73ae245fe8c5c6a637930f3d2 | PENDING_UPDATE | 192.168.21.226 | 80 | ONLINE | 40 |

6, This is the error log I saw - https://paste.ubuntu.com/p/K5s7knNmWw/
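
The contents of test.py are only available through the paste link, so the following is a separate sketch of what such a reproducer might look like, not the script itself. It builds (but does not send) a Batch Update Members request whose body lists the same address and port twice. The endpoint host, pool ID, subnet ID, and token are placeholders; the `PUT /v2/lbaas/pools/{pool_id}/members` path follows the API reference linked in the description.

```python
import json
import urllib.request

# Placeholders, not values from this report.
OCTAVIA_ENDPOINT = "http://octavia.example.test:9876"
POOL_ID = "POOL_ID"
SUBNET_ID = "SUBNET_ID"
TOKEN = "TOKEN"


def build_batch_update(address, port, copies=2):
    """Build a batch member update request that repeats one member."""
    member = {"subnet_id": SUBNET_ID, "address": address, "protocol_port": port}
    body = {"members": [dict(member) for _ in range(copies)]}
    return urllib.request.Request(
        url=f"{OCTAVIA_ENDPOINT}/v2/lbaas/pools/{POOL_ID}/members",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
        method="PUT",
    )


request = build_batch_update("192.168.21.226", 80)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending is left out on purpose; against a real deployment it would be
# urllib.request.urlopen(request) with a valid token and IDs.
```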

[Some Analyses]

You can find some analysis in the bug I created earlier - https://bugs.launchpad.net/octavia/+bug/2070348

Tags: patch sts

Hoyoun Lee (hoyoun-lee)
description: updated
summary: - Loadbalancers is stuck with PENDING_UPDATE state on member update API
+ Loadbalancer is stuck with PENDING_UPDATE state on member update API
Hoyoun Lee (hoyoun-lee)
description: updated
Revision history for this message
Gregory Thiemonge (gthiemonge) wrote : Re: Loadbalancer is stuck with PENDING_UPDATE state on member update API

Hi, I have an open patch that fixes this issue:

https://review.opendev.org/c/openstack/octavia/+/864192

I will ask folks to review it so we can backport it down to 2023.1

Changed in octavia:
assignee: nobody → Gregory Thiemonge (gthiemonge)
importance: Undecided → Medium
status: New → In Progress
Revision history for this message
Hoyoun Lee (hoyoun-lee) wrote :

https://review.opendev.org/c/openstack/octavia/+/921430 (2023.2)
https://review.opendev.org/c/openstack/octavia/+/921429 (2024.1)
https://review.opendev.org/c/openstack/octavia/+/921433 (2023.1)

Thank you for your bug-fix.

However, when I recently checked the source code, I couldn't find the fix on the stable branches (2023.1, 2023.2, 2024.1).
There is no "updated_member_uniques = set()" there.

I found the updated code only on the master branch.

Was the change dropped, or is it still going to be applied?

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

Hi, the backports to the stable branches are under review. I believe they will merge this week (I'll ping reviewers about them).

Revision history for this message
Hoyoun Lee (hoyoun-lee) wrote :

I checked the merged commits on 2023.1, 2023.2, and 2024.1.
I appreciate your concern.

Revision history for this message
Hua Zhang (zhhuabj) wrote :
description: updated
summary: - Loadbalancer is stuck with PENDING_UPDATE state on member update API
+ [SRU] Loadbalancer is stuck with PENDING_UPDATE state on member update
+ API
tags: added: sts
Revision history for this message
Hua Zhang (zhhuabj) wrote :

I uploaded 4 debdiffs, noble.debdiff, mantic.debdiff, jammy.debdiff and antelope.debdiff

1, I didn't upload debdiff for oracular, because it has the same pkg as noble

2, I didn't upload debdiffs for bobcat and caracal, because they have the same pkg as noble and mantic

3, I didn't upload debdiff for focal-yoga, because it has the same pkg as jammy

I created these debdiffs using pbuilder. I didn't test them with a PPA because of an 'unmatch md5' issue - https://paste.ubuntu.com/p/6TSmYBdrhD/

However, I created jammy.debdiff using debuild locally and it works well, see - https://paste.ubuntu.com/p/nS6c3QYRGn/

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "noble.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
Brian Murray (brian-murray) wrote :

Ubuntu 23.10 (Mantic Minotaur) has reached end of life, so this bug will not be fixed for that specific release.

Changed in octavia (Ubuntu Mantic):
status: New → Won't Fix
Revision history for this message
James Page (james-page) wrote :

I reviewed the debdiffs attached to this report.

*) Bug References

None of the changelog updates actually reference this bug report - please ensure that the changelog entry details what is being fixed and reference this bug using (LP: #2067441).

*) Patch naming

I'd prefer that we stick with one naming approach for the patches; other SRU uploaders use lpBUGNUMBER.patch. Either way, please don't leave the 0001 prefix from git format-patch in the filename.

1) oracular

This must be fixed in oracular first; the SRU team will quite likely reject an SRU that is not already fixed in development.

2) noble

debdiff uses the version number already consumed in oracular development; you need to follow the version numbering for Stable Release Updates to avoid version conflicts and ensure sequential progression of package versions for upgrades - see [0].

3) mantic is EOL - this update needs to target jammy-bobcat directly.

4) antelope (and bobcat)

These debdiffs build directly on the automatically generated backport changelog entry from when these cloud archive series still had an ubuntu parent.

TL;DR the git repositories on Launchpad should be the canonical source for any packaging updates - I can see UNRELEASED changes in both stable/2023.{1,2} which instantly conflict with applying the debdiffs - please can you target merge proposals at the octavia repository [1] on Launchpad instead for all proposed updates:

[0] https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging
[1] https://code.launchpad.net/~ubuntu-openstack-dev/ubuntu/+source/octavia

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in octavia (Ubuntu Jammy):
status: New → Confirmed
Changed in octavia (Ubuntu Noble):
status: New → Confirmed
Changed in octavia (Ubuntu):
status: New → Confirmed
Revision history for this message
Hua Zhang (zhhuabj) wrote :

Hi @james-page, I have addressed your comments above. This is the PR for oracular - https://code.launchpad.net/~zhhuabj/ubuntu/+source/octavia/+git/octavia/+merge/471140

Revision history for this message
Hua Zhang (zhhuabj) wrote :

The PR for oracular/development [1] has been overlapped by snapshots for dalmatian, so I continued with the following PRs for the other stable branches.

1, the PR [2] for UA noble with caracal(2024.1, 14.0.0)
2, the PR [3] for UA jammy with yoga(10.1.1)
3, the PR [4] for UCA jammy-antelope with antelope(12.0.0)
4, no PR for UA mantic with bobcat(13.0.0) because it is EOL
5, no PR for UCA jammy-bobcat either, because it is EOL
6, no PR for UCA jammy-caracal due to UA noble
7, no PR for UCA focal-yoga due to UA jammy

[1] https://code.launchpad.net/~zhhuabj/ubuntu/+source/octavia/+git/octavia/+merge/471140
[2] https://code.launchpad.net/~zhhuabj/ubuntu/+source/octavia/+git/octavia/+merge/471563
[3] https://code.launchpad.net/~zhhuabj/ubuntu/+source/octavia/+git/octavia/+merge/471564
[4] https://code.launchpad.net/~zhhuabj/ubuntu/+source/octavia/+git/octavia/+merge/471566

Revision history for this message
James Page (james-page) wrote :

Fix included in recent snapshot for oracular

Changed in octavia (Ubuntu Oracular):
status: Confirmed → Fix Released
Revision history for this message
James Page (james-page) wrote :

Sponsored uploads for Ubuntu targets and the Antelope UCA pocket.

no longer affects: cloud-archive/bobcat
no longer affects: cloud-archive/zed
Changed in cloud-archive:
status: New → Fix Released