upgrade 22.04 -> 24.04 won't start due to feature flags

Bug #2074309 reported by Hadmut Danisch
This bug affects 2 people
Affects                            Status         Importance  Assigned to  Milestone
Release Notes for Ubuntu           New            Undecided   Unassigned
rabbitmq-server (Ubuntu)           (status tracked in Oracular)
  Noble                            Confirmed      High        Unassigned
  Oracular                         Confirmed      High        Unassigned
ubuntu-release-upgrader (Ubuntu)   (status tracked in Oracular)
  Noble                            Fix Committed  Undecided   Unassigned
  Oracular                         New            Undecided   Unassigned

Bug Description

[Impact]

Upgrading systems with rabbitmq-server installed will leave the server in a bad state because upgrading directly from the version in Jammy to the version in Noble is not supported by upstream rabbitmq-server.

[Test Plan]

For now, we add an upgrade quirk to prevent upgrades from Jammy to Noble if rabbitmq-server is installed, and provide a brief explanation to the user.

To test:

1. Create a Jammy container

$ lxc launch ubuntu:jammy jammy

2. Install rabbitmq-server

$ apt install -y rabbitmq-server

3. Attempt the upgrade

$ do-release-upgrade -d

The upgrade should abort with a message explaining why.

[Where problems could occur]

This quirk uses a common pattern in ubuntu-release-upgrader for similar purposes. If the package name were mistyped, the quirk would not work correctly.

[Other information]

We may eventually need another SRU to revert this change if another solution is found.

[Original Description]

Hi,

I was just doing an upgrade from ubuntu 22.04 server to 24.04 server.

After upgrading, the rabbitmq-server cannot be started anymore.

Reason:

See
https://www.rabbitmq.com/blog/2022/07/20/required-feature-flags-in-rabbitmq-3.11

Higher versions of rabbitmq require feature flags to be set *before* the upgrade. In my rabbitmq installation these flags had not been set, therefore the rabbitmq server would not start. They ask you to downgrade to an earlier version, enable the flags there, and then do the upgrade.

I therefore used LXD to run an Ubuntu 22.04 machine with rabbitmq 3.9 to set the missing feature flags, with this result:

rabbitmqctl enable_feature_flag all

rabbitmqctl list_feature_flags
Listing feature flags ...
name state
implicit_default_bindings enabled
maintenance_mode_status enabled
quorum_queue enabled
stream_queue enabled
user_limits enabled
virtual_host_metadata enabled

But it still does not work, it still complains about one feature missing:

classic_mirrored_queue_version

Unfortunately, this flag is required by rabbitmq 3.12, which comes with Ubuntu 24.04, but it is not known to rabbitmq 3.9, which comes with Ubuntu 22.04, and thus cannot be set there.
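
A minimal sketch of how the flag state could be checked before an upgrade. It only parses the listing format shown above; the function name flags_not_enabled is hypothetical and not part of rabbitmqctl. As described in this report, an empty result on 3.9 does not guarantee that 3.12 will start, because 3.9 does not know classic_mirrored_queue_version at all.

```shell
#!/bin/sh
# Print the names of feature flags whose state is anything other than "enabled".
# Input: the plain-text output of `rabbitmqctl list_feature_flags` on stdin.
flags_not_enabled() {
    awk '
        /^Listing feature flags/       { next }  # skip the banner line
        $1 == "name" && $2 == "state"  { next }  # skip the column header
        NF >= 2 && $2 != "enabled"     { print $1 }
    '
}

# Intended use on a running node (not executed here):
#   rabbitmqctl list_feature_flags | flags_not_enabled
```

If the function prints nothing, every flag known to that rabbitmq version is already enabled.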

Now the problem is: there is no Ubuntu release that comes with 3.10 or 3.11. Ubuntu jumps directly from rabbitmq 3.9 to rabbitmq 3.12, although there is no upgrade path from 3.9 to 3.12.

Three nasty options:

- lose your data and configuration and start from scratch
- run an old version in either docker or LXD
- try some upgrade path with docker/podman and non-ubuntu versions of rabbitmq

Unfortunately,

https://hub.docker.com/_/rabbitmq

does not list versions older than 3.12, but they still seem to be available, so this might be a migration/workaround path.

Hadmut Danisch (hadmut) wrote :

OK, found a workaround. From the 24.04 system, make sure to

systemctl stop rabbitmq-server.service
systemctl stop epmd.service

make a safety copy/backup of /var/lib/rabbitmq

Repeat the following two podman commands for 3.9, 3.10, and 3.11:

In one shell, run the following. Replace 127 and 138 with the uid and gid of the rabbitmq user on your system, and make sure that $HOST is set to your hostname; it must be the same as your host's name for things to work. The -p ports are not needed, but they are a way to make sure no other daemon is running.

podman run -it --rm -v /var/lib/rabbitmq:/var/lib/rabbitmq --uidmap=0:0 --uidmap=u999:127 --gidmap=0:0 --gidmap=g999:138 -p 5672:5672 -p 15672:15672 --name $HOST -h $HOST docker.io/library/rabbitmq:3.9

Wait for it to come up, then run from a second shell:

podman exec -it -u rabbitmq $HOST rabbitmqctl enable_feature_flag all

(or run /bin/bash and check with
rabbitmqctl list_feature_flags
rabbitmqctl enable_feature_flag all
rabbitmqctl list_feature_flags
)

After doing this three times (with 3.9, 3.10, and 3.11), restart the services, epmd first and then rabbitmq-server, and things should work. Once happy, you can remove the podman images.

Hadmut Danisch (hadmut) wrote :

Forgot: Once finished with the second podman command, go back to the first shell, check the log about feature setting, and terminate with ctrl-c.
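
The repetition over 3.9, 3.10 and 3.11 can be sketched as a small helper that only prints the commands, so they can be reviewed and then run by hand in two shells as described. This is an illustration under assumptions, not a tested migration tool: the function name print_upgrade_steps is hypothetical, and the podman options are copied from the commands above.

```shell
#!/bin/sh
# Print, without running them, the two podman commands for each intermediate image tag.
# $1: uid of the rabbitmq user on the host, $2: its gid, $3: the host name.
print_upgrade_steps() {
    uid=$1 gid=$2 host=$3
    for tag in 3.9 3.10 3.11; do
        echo "# --- RabbitMQ $tag: first shell, stop with ctrl-c when step two is done ---"
        echo "podman run -it --rm -v /var/lib/rabbitmq:/var/lib/rabbitmq" \
             "--uidmap=0:0 --uidmap=u999:$uid --gidmap=0:0 --gidmap=g999:$gid" \
             "-p 5672:5672 -p 15672:15672 --name $host -h $host" \
             "docker.io/library/rabbitmq:$tag"
        echo "# --- RabbitMQ $tag: second shell, once the broker is up ---"
        echo "podman exec -it -u rabbitmq $host rabbitmqctl enable_feature_flag all"
    done
}

# Example call (looks up the ids instead of hard-coding 127 and 138):
#   print_upgrade_steps "$(id -u rabbitmq)" "$(id -g rabbitmq)" "$(hostname)"
```

Printing rather than executing keeps the interactive two-shell flow intact; each first-shell command still has to be stopped by hand before moving on to the next version.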

Mitchell Dzurick (mitchdz) wrote :

Thank you for reporting this bug and even following up on it Hadmut!

This sounds like something that could potentially be fixed in the Noble maintainer scripts.

I'd want to find the exact flags we need and enable just those, since you can't disable feature flags once they are enabled.

Hadmut Danisch (hadmut) wrote :

Problem 1: I have no idea how long docker-hub and the rabbitmq people will offer these outdated docker images for download. Maybe it is advisable to pull and save a copy.

Problem 2: This could drive people into severe trouble (as it did with me, but I chose to upgrade with the -d flag, so my own fault) if they upgrade their production server or desktop from 22.04 to 24.04.1 and thereby break rabbitmq-server and make their stored data unavailable. So it should be a blocker for do-release-upgrade, which should at least warn people before upgrading as long as this isn't fixed. 24.04.1 is announced for release in two weeks, and the .1 release usually enables LTS-to-LTS upgrades.

Problem 3: It is possible that this workaround does not smoothly run within LXD containers in current 24.04. Today I ran into the problem that after upgrading to 24.04 docker cannot be run within LXD containers anymore, because of some strange kernel/apparmor/runc problem, although there is a simple, but not obvious fix for that. Some people say that podman cannot be run either because of the same problem.

I was able to fix all these problems, but it took me some time, and it might overwhelm average users.

regards
Hadmut

Mitchell Dzurick (mitchdz) wrote :

Are you running rabbitmq-server on the host, or in some container?

I just ran a quick test in an LXD container doing do-release-upgrade and it seems to work fine to me after upgrading to Noble.

Were there any special configuration options you had enabled in Jammy before the upgrade?

Mitchell Dzurick (mitchdz) wrote :

To more explicitly say what I tested:

I ran the following commands to create a Jammy container and run the upgrade process:
lxc launch ubuntu:jammy j-n-upgrade
lxc shell j-n-upgrade
apt update -y
apt install -y rabbitmq-server
apt install -y ubuntu-release-upgrader-core
sed -i 's/Prompt=lts/Prompt=normal/' /etc/update-manager/release-upgrades
do-release-upgrade -f DistUpgradeViewNonInteractive

Then after the upgrade was done, I created a simple rabbitmq-server script which I attached.
# apt install -y python3-pika
# ./test.sh

Hadmut Danisch (hadmut) wrote :

I was running it directly on the host, but noticed the docker problem when checking why other services (not rabbitmq) that I'm running within an LXD container didn't start.

Your tests might not show the problem, because you've made a fresh install, and rabbitmq-server seems to automatically set flags when the database is newly created. What features had been set in your test environment?

My rabbitmq-installation was running for years and already went through several release upgrades, therefore, none of the feature flags had been set in my database. (Until yesterday, I did not even know that these feature flags exist.)

So testing with a freshly created database might not show the problem.

The best way to deal with the problem would be to put something in the do-release-upgrade script which checks if rabbitmq-server is installed. If not – do nothing, proceed.

But if so, issue a warning text and give the choice to abort or proceed.

The text should explain that there is no direct stable upgrade path from 3.9 to 3.12, and that under certain circumstances the server might not start after the upgrade. People should be urged to make a backup of /var/lib/rabbitmq and to set the recommended flags (or the upgrade procedure could set them as a third option). They should also be pointed to instructions on what to do if the service does not start after upgrading, or be offered the option to accept a complete loss and recreate the database from scratch (which could be acceptable in many use cases).
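
The backup step could look roughly like the following sketch. The helper name backup_dir and the destination directory are hypothetical, and the services should be stopped first, as in the workaround described earlier.

```shell
#!/bin/sh
# Archive a directory into a timestamped tarball and print the archive's path.
# $1: directory to back up (e.g. /var/lib/rabbitmq), $2: existing destination directory.
backup_dir() {
    src=${1%/}                                   # drop a trailing slash, if any
    archive=$2/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz
    tar -C "$(dirname "$src")" -czf "$archive" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}

# Intended use, as root and with rabbitmq-server and epmd stopped (not executed here):
#   backup_dir /var/lib/rabbitmq /root
```

The archive stores paths relative to the parent directory, so it can be unpacked elsewhere for inspection without overwriting the live /var/lib/rabbitmq.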

BTW, I just saw that there already are closely related bugs about the same problem: Bug #2038818 and Bug #2046665

Mitchell Dzurick (mitchdz) wrote :

Jammy:
root@j:~# rabbitmqctl list_feature_flags
Listing feature flags ...
name state
implicit_default_bindings enabled
maintenance_mode_status enabled
quorum_queue enabled
stream_queue enabled
user_limits enabled
virtual_host_metadata enabled

Noble (after upgrading from Jammy):
root@j-n-upgrade:~# rabbitmqctl list_feature_flags
Listing feature flags ...
name state
classic_mirrored_queue_version enabled
classic_queue_type_delivery_support enabled
direct_exchange_routing_v2 enabled
feature_flags_v2 enabled
implicit_default_bindings enabled
listener_records_in_ets enabled
maintenance_mode_status enabled
quorum_queue enabled
restart_streams enabled
stream_queue enabled
stream_sac_coordinator_unblock_group enabled
stream_single_active_consumer enabled
tracking_records_in_ets enabled
user_limits enabled
virtual_host_metadata enabled
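
To see which flags the newer release knows that the older one does not, the two listings can be compared. A minimal sketch, assuming each listing was saved to a file containing only the rabbitmqctl output (no shell prompt line); the function and file names are hypothetical.

```shell
#!/bin/sh
# Print the flag names found in one saved `rabbitmqctl list_feature_flags` listing, sorted.
flag_names() {
    awk '
        /^Listing feature flags/       { next }  # banner line
        $1 == "name" && $2 == "state"  { next }  # column header
        NF >= 2                        { print $1 }
    ' "$1" | LC_ALL=C sort -u
}

# Print the flags that appear in the second listing but not in the first.
# $1: listing saved on the old release, $2: listing saved on the new release.
new_flags() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || { rm -f "$old_list"; return 1; }
    flag_names "$1" > "$old_list"
    flag_names "$2" > "$new_list"
    LC_ALL=C comm -13 "$old_list" "$new_list"   # keep lines unique to the second file
    rm -f "$old_list" "$new_list"
}
```

With the two listings above saved as jammy.txt and noble.txt, `new_flags jammy.txt noble.txt` would print the nine flags that only the Noble node reports, classic_mirrored_queue_version among them.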

Mitchell Dzurick (mitchdz) wrote :

Thanks for posting the other bugs, I tracked 2046665 to rabbitmq-server.

tags: added: server-triage-discuss
Changed in rabbitmq-server (Ubuntu):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in rabbitmq-server (Ubuntu):
status: New → Confirmed
Andreas Hasenack (ahasenack) wrote :
tags: added: server-todo
removed: server-triage-discuss
Changed in rabbitmq-server (Ubuntu Noble):
milestone: none → ubuntu-24.04.1
status: New → Confirmed
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

Thanks Andreas,
I furthermore checked with "our" openstack team whether this is something they've already hit, but they have no jammy -> noble upgrade path for their use of rabbitmq yet. They did offer to help verify any fix we might propose by test-upgrading their rabbitmq node with it.

This flag handling is indeed common practice; for example, 3.11 needed all flags of 3.8.x in just the same way [1].

The bonus problem that might make this harder is that [2] states that 3.12 needs all flags of the 3.11.x series. But Jammy to Noble is 3.9.x -> 3.12, so we might need an interim 3.11?!? Unless e.g. a backport of the latest 3.9.x makes that possible too, the outcome will be quite complex.
On the other hand, some text says "before upgrade", and I do not see that in the OpenStack solution [4] Andreas found. To be fair, that solution is so abstracted that I'm not yet sure what exactly it does, or when.

I must admit that I think we need a bit more time to get a better understanding of this [3] in general.
And even then we might still want to avoid too many assumptions and just ask upstream if there is any 3.9->3.12 way without an interim 3.11 - and discuss a more detailed plan of action from there.

Either way, big thanks to Hadmut for bringing this to our attention!

[1]: https://www.rabbitmq.com/blog/2022/07/20/required-feature-flags-in-rabbitmq-3.11
[2]: https://github.com/rabbitmq/rabbitmq-server/discussions/8456
[3]: https://www.rabbitmq.com/docs/feature-flags
[4]: https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/919701

Bryce Harrington (bryce)
summary: - rabbitmq-server upgrade 22.04 -> 24.04 completely broken
+ rabbitmq-server upgrade 22.04 -> 24.04 won't start due to feature flags
summary: - rabbitmq-server upgrade 22.04 -> 24.04 won't start due to feature flags
+ upgrade 22.04 -> 24.04 won't start due to feature flags
Mitchell Dzurick (mitchdz) wrote :

There is a github issue mentioning this[0]. There are potential plans to improve feature flags[1], but they do not help us right now, and will not help users that encounter this situation.

It seems the properly supported upgrade path is indeed installing 3.10 -> 3.11.

[0] - https://github.com/rabbitmq/rabbitmq-server/discussions/11418
[1] - https://github.com/rabbitmq/rabbitmq-server/issues/9677

Mitchell Dzurick (mitchdz) wrote :

I made an upstream bug report directly asking for support doing these major version hops[0].

[0] - https://github.com/rabbitmq/rabbitmq-server/discussions/11938

Andreas Hasenack (ahasenack) wrote :

I added an ubuntu-release-upgrader task because we might consider blocking the release upgrade there if the conditions for this bug are found to be present.

Those conditions might be as simple as "is rabbitmq-server installed (and running)?". Or perhaps a more generic one, carefully checking the version number jump between jammy and noble, because in the future we might have a newer version in jammy exactly because of this bug.
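
Both conditions can be sketched in shell to make the idea concrete. This is not the quirk that was added to ubuntu-release-upgrader; the function names are hypothetical, and treating any jump of more than one minor release as unsupported is a deliberately conservative assumption rather than upstream's exact rule.

```shell
#!/bin/sh
# Succeed if dpkg reports the given package as fully installed.
is_installed() {
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}

# Reduce a Debian version string (optional epoch, revision) to "major.minor".
major_minor() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]+:)?([0-9]+)\.([0-9]+).*/\2.\3/'
}

# Succeed if going from version $1 to version $2 skips at least one minor release.
skips_minor() {
    from=$(major_minor "$1")
    to=$(major_minor "$2")
    # A change of major series is treated as a skip, to stay on the safe side.
    [ "${from%%.*}" = "${to%%.*}" ] || return 0
    [ $(( ${to#*.} - ${from#*.} )) -gt 1 ]
}

# Example decision (illustrative version strings, not executed here):
#   if is_installed rabbitmq-server && skips_minor 3.9.13-1 3.12.1-1ubuntu1; then
#       echo "rabbitmq-server cannot be upgraded directly; aborting" >&2
#   fi
```

Comparing versions instead of only checking for the package would let the block lift itself if Jammy ever receives a newer rabbitmq-server.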

Mitchell Dzurick (mitchdz) wrote :

I just want to update on my upgrade test.

First and foremost - a direct upgrade from Jammy to Noble is not possible.

It only works if you have rabbitmq-server installed but have never used it, so that no metadata has been created yet.

If you use the server in Jammy and then attempt to upgrade to Noble right now you will see:

Errors were encountered while processing:
 rabbitmq-server
needrestart is being skipped since dpkg has failed
packages have been installed but needrestart is suspended
packages have been installed but needrestart is suspended
Exception during pm.DoInstall(): E:Sub-process /usr/bin/dpkg returned an error code (1)
Setting up rabbitmq-server (3.12.1-1ubuntu1) ...
Job for rabbitmq-server.service failed because the control process exited with error code.
See "systemctl status rabbitmq-server.service" and "journalctl -xeu rabbitmq-server.service" for details.
invoke-rc.d: initscript rabbitmq-server, action "restart" failed.
● rabbitmq-server.service - RabbitMQ Messaging Server
     Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Fri 2024-08-09 18:08:11 UTC; 7ms ago
    Process: 34558 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server (code=exited, status=1/FAILURE)
   Main PID: 34558 (code=exited, status=1/FAILURE)
     Status: "Standing by"
        CPU: 1.631s
invoke-rc.d: release upgrade in progress, error is not fatal
root@j-n-upgrade2:~# lsb_Release -a
Command 'lsb_Release' not found, did you mean:
  command 'lsb_release' from deb lsb-release (12.0-2)
Try: apt install <deb name>
root@j-n-upgrade2:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04 LTS
Release: 24.04
Codename: noble
root@j-n-upgrade2:~# systemctl status rabbitmq-server
● rabbitmq-server.service - RabbitMQ Messaging Server
     Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Fri 2024-08-09 18:12:26 UTC; 7s ago
    Process: 36160 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server (code=exited, status=1/FAILURE)
   Main PID: 36160 (code=exited, status=1/FAILURE)
     Status: "Standing by"
        CPU: 1.566s

Mitchell Dzurick (mitchdz) wrote :

Hadmut, thanks again for bringing this to our attention.

With 24.04.1 coming out soon we may expect more upgrades from Jammy -> Noble. Therefore I am looking to block the upgrade (22.04 -> 24.04) if rabbitmq-server is installed.

This is unfortunate, but I think it is a fair way to avoid making rabbitmq-server unusable for users after the upgrade. This will also give us time to evaluate a proper upgrade path that can be relied upon, and allow users to back up data before attempting anything.

Likely it will be a manual process of upgrading interim releases (rabbitmq-server 3.9.X -> 3.10.X -> 3.11.X), or accepting that you will need to recreate your server's database, but that's something I will be working on.

Nick Rosbrook (enr0n)
Changed in ubuntu-release-upgrader (Ubuntu Noble):
status: New → In Progress
Nick Rosbrook (enr0n)
description: updated
Hadmut Danisch (hadmut) wrote :

You're welcome. :-)

As I commented in #1, the upgrade using podman was working well for me, but the question is, how long these docker images of old versions will be available.

When creating an upgrade strategy, you should also take into consideration how long this upgrade should work, i.e. whether the upgrade path should still work in 2027, when 22.04's support will end.

Another problem is that 24.04 is still stuck on rabbitmq 3.12, which is out of support and does not receive updates anymore. This could become a security problem and require an emergency update 3.12 -> 3.13 if a vulnerability is found.

Maybe it would be easier and less complex, to first update 24.04 from rabbitmq 3.12 to 3.13, which seems to be the last 3.x version (they are beta-testing 4.x), as a reasonable version for 24.04 LTS, and check whether there's a similar upgrade problem 3.12->3.13, and incorporate this into your solution.

And, keep in mind, you probably will have similar problems with future versions. The same problem will probably reoccur.

Maybe, one clean and simple solution could be to decouple rabbitmq-packages from Ubuntu releases and simply turn them into snaps with a channel for each version, i.e. do no upgrade at all and just replace ubuntu package 3.9 with snap channel 3.9 during update to keep things running and benefit from snap's revert mechanism in case something doesn't start.

Mitchell Dzurick (mitchdz) wrote :

Great points Hadmut.

It is unfortunate how far behind we are in versions. We follow debian, which hasn't been updated in a bit. I was thinking about updating the package there, but I need to fix the fires here first (and if I update debian to the latest without handling this bug's issue, I will break people's servers).

I am definitely interested in updating rabbitmq-server to 3.13, however we do not often update major versions because they are prone to breaking things (hence, this bug report). If anything, I would consider making a new binary `rabbitmq-server-3.13` similar to how say, gcc-13 does it. That way users can upgrade to the new version but it's more of an informed decision rather than things just breaking upon an `apt update && apt upgrade`.

Even if we stay on top of it, users might not `apt update && apt upgrade` for an extended period of time, and then we would have to deal with that.

You are on point, I was also thinking about the potential of snaps long term.

Hadmut Danisch (hadmut) wrote :

There's another option, although an ugly one.

If it is too bulky to be supported in a clean way, and too incompatible with Ubuntu's upgrade cycles, and if the problem with upgrading can't be properly solved in the Ubuntu universe, it could simply be dropped from Ubuntu (and Debian), and users be redirected to either use rabbitmq's own apt repository, or docker/podman/kubernetes containers.

After all, hasn't Ubuntu just invented a more secure way to use third party apt repositories for 24.04?

Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Hadmut, or anyone else affected,

Accepted ubuntu-release-upgrader into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ubuntu-release-upgrader/1:24.04.21 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ubuntu-release-upgrader (Ubuntu Noble):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-noble
Nick Rosbrook (enr0n) wrote :

I verified the upgrade quirk using ubuntu-release-upgrader from noble-proposed:

nr@six:~$ lxc launch ubuntu:jammy jammy
nr@six:~$ lxc exec jammy bash
root@jammy:~# apt install -y rabbitmq-server
[ ...SNIP... ]
root@jammy:~# do-release-upgrade --proposed
Checking for a new Ubuntu release
There is no development version of an LTS available.
To upgrade to the latest non-LTS development release
set Prompt=normal in /etc/update-manager/release-upgrades.
root@jammy:~# sed -i 's/Prompt=lts/Prompt=normal/g' /etc/update-manager/release-upgrades
root@jammy:~# do-release-upgrade --proposed
[ ...SNIP... ]

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done

Unable to upgrade to Ubuntu 24.04 LTS

Currently, you have RabbitMQ server installed, which is not directly
upgradable to the newer version. Upgrading may prevent the server
from starting due to missing feature flags.

For more information, please see
https://bugs.launchpad.net/bugs/2074309.

Restoring original system state

Aborting
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done

tags: added: verification-done verification-done-noble
removed: verification-needed verification-needed-noble
Łukasz Zemczak (sil2100) wrote :

Hello Hadmut, or anyone else affected,

Accepted ubuntu-release-upgrader into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ubuntu-release-upgrader/1:24.04.22 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed verification-needed-noble
removed: verification-done verification-done-noble
Nick Rosbrook (enr0n) wrote :

The most recent upload contains a new quirk for an unrelated bug, and fixes for another unrelated bug. The previous verification done with 1:24.04.21 stands.

tags: added: verification-done verification-done-noble
removed: verification-needed verification-needed-noble
Hadmut Danisch (hadmut) wrote :

Hi,

actually I do not have a problem left to fix, since the workaround procedure with podman described above completely solved my problem. (Improvement: replace 127 with $(id -u rabbitmq) and 138 with $(id -g rabbitmq).)

But to test the bug fix I've set up a fresh 22.04 in LXD, installed rabbitmq-server, enabled -proposed, copied the backup of my old server into /var/lib/rabbitmq and ran do-release-upgrade -p.

And yes, the upgrade was denied:

Unable to upgrade to Ubuntu 24.04 LTS

Currently, you have RabbitMQ server installed, which is not directly
upgradable to the newer version. Upgrading may prevent the server
from starting due to missing feature flags.

For more information, please see
https://bugs.launchpad.net/bugs/2074309.

Restoring original system state

Aborting

So yes, it kept me from breaking the server.

However, I'd suggest extending the message a bit, since a less experienced admin just wouldn't know what to do now.

I'd suggest listing the three options:

- stay safe, but outdated, on 22.04 (for an unknown time)

- lose your database and all your data: completely purge the rabbitmq-server package, do the upgrade, and do a fresh install from scratch

- stop the daemon, back up /var/lib/rabbitmq, remove (but do not purge) the package, do the upgrade, use the workaround procedure with podman as described above (with no warranty that it works and that the images are still available), and reinstall/start.

As far as I know, Ubuntu has a Wiki. Wouldn't it be better to make a wiki page with instructions instead of pointing to this bug?

Mitchell Dzurick (mitchdz) wrote :

Hi Hadmut, yes I agree. Unfortunately it's a bit easier to modify this bug than the ubuntu release upgrade message, which is why the message in the upgrader is so sparse. The hopeful plan is to have a nice wiki page to redirect to.

I'm working on a good alternative upgrade path that we can rely on for the future.

Personally, I'd rather have all the information in this bug rather than copying information in multiple places, so we have one source of truth to follow.

P.S. I really appreciate all your effort here Hadmut.
