rabbitmq-server upgrade 22.04 -> 24.04 completely broken

Bug #2074309 reported by Hadmut Danisch
This bug affects 2 people
Affects                    Status     Importance  Assigned to  Milestone
Release Notes for Ubuntu   New        Undecided   Unassigned
rabbitmq-server (Ubuntu)   (status tracked in Oracular)
  Noble                    Confirmed  High        Unassigned
  Oracular                 Confirmed  High        Unassigned

Bug Description

Hi,

I was just upgrading from Ubuntu 22.04 server to 24.04 server.

After upgrading, the rabbitmq-server cannot be started anymore.

Reason:

See
https://www.rabbitmq.com/blog/2022/07/20/required-feature-flags-in-rabbitmq-3.11

Higher versions of rabbitmq require feature flags to be set *before* the upgrade. In my rabbitmq installation these flags had not been set, therefore the rabbitmq server would not start. They ask you to downgrade to an earlier version, set the flags, and redo the upgrade.

I therefore used LXD to run a Ubuntu 22.04 machine with rabbitmq 3.9 to set the missing feature flags and achieved:

rabbitmqctl enable_feature_flag all

rabbitmqctl list_feature_flags
Listing feature flags ...
name state
implicit_default_bindings enabled
maintenance_mode_status enabled
quorum_queue enabled
stream_queue enabled
user_limits enabled
virtual_host_metadata enabled

But it still does not work; it still complains about one missing feature flag:

classic_mirrored_queue_version

Unfortunately, this is required by rabbitmq 3.12 coming with Ubuntu 24.04, but is not known to, and thus cannot be set by, rabbitmq 3.9 coming with Ubuntu 22.04.
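A quick way to check a listing for flags that are absent or not enabled is sketched below. This is only a minimal sketch: `missing_flags` is a hypothetical helper name, and the set of required flags passed to it is an assumption (only `classic_mirrored_queue_version` is named in this report).

```shell
#!/bin/sh
# Print every required flag that is absent from, or not "enabled" in, a listing.
# The listing (output of `rabbitmqctl list_feature_flags`) is read from stdin;
# the required flag names are passed as arguments.
# missing_flags is a hypothetical helper, not part of rabbitmqctl.
missing_flags() {
    awk -v required="$*" '
        $2 == "enabled" { enabled[$1] = 1 }
        END {
            n = split(required, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in enabled)) print names[i]
        }'
}

# Example using the 3.9 listing from this report.
# Prints: classic_mirrored_queue_version
missing_flags classic_mirrored_queue_version quorum_queue <<'EOF'
name state
implicit_default_bindings enabled
maintenance_mode_status enabled
quorum_queue enabled
stream_queue enabled
user_limits enabled
virtual_host_metadata enabled
EOF
```

On a live system the listing would be piped in instead, for example `rabbitmqctl list_feature_flags | missing_flags classic_mirrored_queue_version`.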

Now the problem is: there is no Ubuntu release coming with 3.10 or 3.11. Ubuntu jumps directly from rabbitmq 3.9 to rabbitmq 3.12, although there is no upgrade path from 3.9 to 3.12.

Three nasty options:

- lose your data and configuration and start from scratch
- run an old version in either docker or LXD
- try some upgrade path with docker/podman and non-ubuntu versions of rabbitmq

Unfortunately,

https://hub.docker.com/_/rabbitmq

does not list versions older than 3.12, but they still seem to be available, so this might be a migration/workaround path.

Tags: server-todo
Hadmut Danisch (hadmut) wrote :

OK, found a workaround. From the 24.04 system, make sure to

systemctl stop rabbitmq-server.service
systemctl stop epmd.service

make a safety copy/backup of /var/lib/rabbitmq

Repeat the following two podman commands for 3.9, 3.10, and 3.11:

In one shell run the following. Replace 127 and 138 with the uid and gid of rabbitmq on your system, and make sure that $HOST is set to your hostname; it must be the same as your host's name for things to work. The -p ports are not needed, but they are a way to make sure no other daemon is running.

podman run -it --rm -v /var/lib/rabbitmq:/var/lib/rabbitmq --uidmap=0:0 --uidmap=u999:127 --gidmap=0:0 --gidmap=g999:138 -p 5672:5672 -p 15672:15672 --name $HOST -h $HOST docker.io/library/rabbitmq:3.9

Wait for it to come up, then run from a second shell:

podman exec -it -u rabbitmq $HOST rabbitmqctl enable_feature_flag all

(or run /bin/bash and check with
rabbitmqctl list_feature_flags
rabbitmqctl enable_feature_flag all
rabbitmqctl list_feature_flags
)

After doing this three times (with 3.9, 3.10, and 3.11), restart the services epmd and then rabbitmq-server, and things should work now. Once happy, you can remove the podman images.

Hadmut Danisch (hadmut) wrote :

Forgot: Once finished with the second podman command, go back to the first shell, check the log about feature setting, and terminate with ctrl-c.
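The three rounds can be written down as a dry run that only prints the commands, so they can be reviewed before anything touches /var/lib/rabbitmq. This is a sketch under assumptions: `print_upgrade_plan`, `RABBIT_UID` and `RABBIT_GID` are hypothetical names, the podman options are copied from the commands above and not independently verified, and the optional -p ports are left out.

```shell
#!/bin/sh
# Dry run: prints the commands for each interim version instead of running them.
# RABBIT_UID/RABBIT_GID default to the example values 127/138 from this thread;
# look up the real ones with `id -u rabbitmq` and `id -g rabbitmq`.
RABBIT_UID=${RABBIT_UID:-127}
RABBIT_GID=${RABBIT_GID:-138}
HOST=${HOST:-$(uname -n)}

print_upgrade_plan() {
    for version in 3.9 3.10 3.11; do
        echo "# --- RabbitMQ $version ---"
        # Shell 1: start the broker on the existing data directory.
        echo "podman run -it --rm -v /var/lib/rabbitmq:/var/lib/rabbitmq" \
             "--uidmap=0:0 --uidmap=u999:$RABBIT_UID" \
             "--gidmap=0:0 --gidmap=g999:$RABBIT_GID" \
             "--name $HOST -h $HOST docker.io/library/rabbitmq:$version"
        # Shell 2: once it is up, enable every flag this version knows about.
        echo "podman exec -it -u rabbitmq $HOST rabbitmqctl enable_feature_flag all"
        # Then return to shell 1, check the log, and stop the broker with ctrl-c.
    done
}

print_upgrade_plan
```

Printing rather than executing is deliberate: each `podman run` stays in the foreground until it is stopped with ctrl-c, so the two commands of a round cannot simply be run back to back from one script.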

Mitchell Dzurick (mitchdz) wrote :

Thank you for reporting this bug and even following up on it Hadmut!

This sounds like something that could potentially be fixed in the Noble maintainer scripts.

I'd want to find the exact flags we need and enable just those, since you can't disable feature flags once they are enabled.

Hadmut Danisch (hadmut) wrote :

Problem 1: I have no idea how long docker-hub and the rabbitmq people will offer these outdated docker images for download. Maybe it is advisable to pull and save a copy.

Problem 2: This could (as it did with me, but I chose to update with the -d flag, so my own fault) drive people into severe trouble if they upgrade their production server or desktop from 22.04 to 24.04.1 and thus break rabbitmq-server and make their stored data unavailable. So it should be a stopper for do-release-upgrade, which should at least warn people before upgrading as long as this isn't fixed. 24.04.1 is announced to be released in two weeks, and .1 usually enables LTS-to-LTS upgrades.

Problem 3: It is possible that this workaround does not smoothly run within LXD containers in current 24.04. Today I ran into the problem that after upgrading to 24.04 docker cannot be run within LXD containers anymore, because of some strange kernel/apparmor/runc problem, although there is a simple, but not obvious fix for that. Some people say that podman cannot be run either because of the same problem.

I was able to fix all these problems, but it took me some time, and it might overstrain average users.

regards
Hadmut

Mitchell Dzurick (mitchdz) wrote :

Are you running rabbitmq-server on the host, or in some container?

I just ran a quick test in an LXD container doing do-release-upgrade and it seems to work fine to me after upgrading to Noble.

Were there any special configuration options you had enabled in Jammy before the upgrade?

Mitchell Dzurick (mitchdz) wrote :

To more explicitly say what I tested:

I ran the following commands to create a Jammy container and run the upgrade process:
lxc launch ubuntu:jammy j-n-upgrade
lxc shell j-n-upgrade
apt update -y
apt install -y rabbitmq-server
apt install -y ubuntu-release-upgrader-core
sed -i 's/Prompt=lts/Prompt=normal/' /etc/update-manager/release-upgrades
do-release-upgrade -f DistUpgradeViewNonInteractive

Then after the upgrade was done, I created a simple rabbitmq-server script which I attached.
# apt install -y python3-pika
# ./test.sh

Hadmut Danisch (hadmut) wrote :

I was running it directly on the host, but noticed the docker problem when checking why other services (not rabbitmq) that I'm running within an LXD container didn't start.

Your tests might not show the problem, because you've made a fresh install, and rabbitmq-server seems to automatically set flags when the database is newly created. What features had been set in your test environment?

My rabbitmq-installation was running for years and already went through several release upgrades, therefore, none of the feature flags had been set in my database. (Until yesterday, I did not even know that these feature flags exist.)

So testing with a freshly created database might not show the problem.

The best way to deal with the problem would be to put something in the do-release-upgrade script which checks if rabbitmq-server is installed. If not – do nothing, proceed.

But if so, issue a warning text and give the choice to abort or proceed.

The text should explain that there is no direct stable upgrade path from 3.9 to 3.12, and that under certain circumstances the server might not start after the upgrade. People should be urgently requested to make a backup of /var/lib/rabbitmq and to set the recommended flags (or the upgrade procedure could offer to set them as a third option). They should also be pointed to instructions about what to do if the service does not start after upgrading, or be offered the option to accept a complete loss and recreate the database from scratch (which could be acceptable in many use cases).
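A rough sketch of such a check follows. It is hypothetical and not part of do-release-upgrade: `rabbitmq_installed` and `pre_upgrade_warning` are made-up names, the warning text is only an example, and the abort/proceed prompt is left out so that the sketch runs non-interactively.

```shell
#!/bin/sh
# Hypothetical pre-upgrade check; not an existing do-release-upgrade hook.

# Succeeds only if dpkg reports rabbitmq-server as fully installed.
rabbitmq_installed() {
    command -v dpkg-query >/dev/null 2>&1 &&
        dpkg-query -W -f='${Status}' rabbitmq-server 2>/dev/null |
        grep -q 'install ok installed'
}

# Prints the warning when called with "yes", nothing for any other value.
pre_upgrade_warning() {
    [ "$1" = yes ] || return 0
    cat <<'EOF'
WARNING: rabbitmq-server is installed.
There is no direct upgrade path from RabbitMQ 3.9 to 3.12; the broker may
fail to start after the release upgrade.
Before proceeding: back up /var/lib/rabbitmq and enable all feature flags
(rabbitmqctl enable_feature_flag all).
EOF
}

if rabbitmq_installed; then
    pre_upgrade_warning yes
else
    pre_upgrade_warning no
fi
```

A real hook would follow the warning with a choice to abort or proceed, as described above.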

BTW, I just saw that there already are closely related bugs about the same problem: Bug #2038818 and Bug #2046665

Mitchell Dzurick (mitchdz) wrote :

Jammy:
root@j:~# rabbitmqctl list_feature_flags
Listing feature flags ...
name state
implicit_default_bindings enabled
maintenance_mode_status enabled
quorum_queue enabled
stream_queue enabled
user_limits enabled
virtual_host_metadata enabled

Noble (after upgrading from Jammy):
root@j-n-upgrade:~# rabbitmqctl list_feature_flags
Listing feature flags ...
name state
classic_mirrored_queue_version enabled
classic_queue_type_delivery_support enabled
direct_exchange_routing_v2 enabled
feature_flags_v2 enabled
implicit_default_bindings enabled
listener_records_in_ets enabled
maintenance_mode_status enabled
quorum_queue enabled
restart_streams enabled
stream_queue enabled
stream_sac_coordinator_unblock_group enabled
stream_single_active_consumer enabled
tracking_records_in_ets enabled
user_limits enabled
virtual_host_metadata enabled
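To see which flags exist only on the Noble side, the two listings can be compared mechanically. A minimal sketch, assuming each listing was saved to a file; `new_flags` and the file names are hypothetical, and the demo files hold only a shortened subset of the listings above.

```shell
#!/bin/sh
# Print flag names that appear in the second listing but not in the first.
# Both arguments are files holding `rabbitmqctl list_feature_flags` output.
# Header lines are identical in both files, so they drop out of the result.
new_flags() {
    awk 'NR == FNR { seen[$1] = 1; next }
         !($1 in seen) { print $1 }' "$1" "$2"
}

# Demo with shortened listings.
tmp=$(mktemp -d)
printf 'name state\nquorum_queue enabled\nstream_queue enabled\n' > "$tmp/jammy.txt"
printf 'name state\nfeature_flags_v2 enabled\nquorum_queue enabled\nstream_queue enabled\n' > "$tmp/noble.txt"
new_flags "$tmp/jammy.txt" "$tmp/noble.txt"   # prints: feature_flags_v2
rm -r "$tmp"
```

Run against the full listings, this would show the flags that a 3.9 broker cannot know about, such as `classic_mirrored_queue_version`.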

Mitchell Dzurick (mitchdz) wrote :

Thanks for posting the other bugs, I tracked 2046665 to rabbitmq-server.

tags: added: server-triage-discuss
Changed in rabbitmq-server (Ubuntu):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in rabbitmq-server (Ubuntu):
status: New → Confirmed
Andreas Hasenack (ahasenack) wrote :
tags: added: server-todo
removed: server-triage-discuss
Changed in rabbitmq-server (Ubuntu Noble):
milestone: none → ubuntu-24.04.1
status: New → Confirmed
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

Thanks Andreas,
I furthermore checked with "our" openstack team whether this is something they've already hit, but they have no jammy -> noble upgrade path for their use of rabbitmq yet. They did offer to help verify any fix we might propose by test-upgrading their rabbitmq node with it.

This flag handling is indeed common practice; for example, 3.11 needed all flags of 3.8.x in just the same way [1].

The bonus problem that might make this harder is that [2] states that 3.12 needs all flags of the 3.11.x series. But Jammy to Noble is 3.9.x -> 3.12, so we might need an interim 3.11?!? Unless e.g. a backport of the latest 3.9.x makes that possible too, the outcome will be quite complex.
On the other hand, some text says "before upgrade", yet I do not see that in the OpenStack solution [4] Andreas found. To be fair, that is so abstracted that I'm not yet sure what exactly it does, or when.

I must admit that I think we need a bit more time to get a better understanding of this [3] in general.
And even then we might still want to avoid too many assumptions and just ask upstream if there is any 3.9->3.12 way without an interim 3.11 - and discuss a more detailed plan of action from there.

Either way, big thanks to Hadmut for bringing this to our attention!

[1]: https://www.rabbitmq.com/blog/2022/07/20/required-feature-flags-in-rabbitmq-3.11
[2]: https://github.com/rabbitmq/rabbitmq-server/discussions/8456
[3]: https://www.rabbitmq.com/docs/feature-flags
[4]: https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/919701
