rabbitmq fails to install with OVN-enabled networking in LXD

Bug #2080895 reported by Jeff Hillman
This bug affects 1 person
Affects: OpenStack RabbitMQ Server Charm
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

juju 3.5 / jammy

rabbitmq-server charm channel 3.9/stable rev 188

When attempting to deploy rabbitmq in a MicroCloud environment (MicroCeph, MicroOVN, LXD), using a juju profile that enables the OVN network, the rabbitmq-server charm sits at "(install) Installing/upgrading RabbitMQ packages" indefinitely.

Inspecting the machine shows that it is failing to install the rabbitmq-server Debian package. The package eventually fails during its configure step with the following output:

```
BOOT FAILED
===========
Exception during startup:

error:{badmatch,{error,timeout}}

    rabbit_prelaunch_dist:dist_port_use_check_fail/2, line 127
    rabbit_prelaunch_dist:setup/1, line 22
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0>
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> BOOT FAILED
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> ===========
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> Exception during startup:
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0>
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> error:{badmatch,{error,timeout}}
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0>
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> rabbit_prelaunch_dist:dist_port_use_check_fail/2, line 127
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> rabbit_prelaunch_dist:setup/1, line 22
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> rabbit_prelaunch:do_run/0, line 115
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> rabbit_prelaunch:run_prelaunch_first_phase/0, line 32
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> supervisor:do_start_child_i/3, line 414
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> supervisor:do_start_child/2, line 400
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> supervisor:-start_children/2-fun-0-/3, line 384
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0> supervisor:children_map/4, line 1250
2024-09-16 19:46:48.733741+00:00 [erro] <0.130.0>
    rabbit_prelaunch:do_run/0, line 115
    rabbit_prelaunch:run_prelaunch_first_phase/0, line 32
    supervisor:do_start_child_i/3, line 414
    supervisor:do_start_child/2, line 400
    supervisor:-start_children/2-fun-0-/3, line 384
    supervisor:children_map/4, line 1250

2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> supervisor: {local,rabbit_prelaunch_sup}
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> errorContext: start_error
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> reason: {badmatch,{error,timeout}}
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> offender: [{pid,undefined},
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> {id,prelaunch},
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> {mfargs,{rabbit_prelaunch,run_prelaunch_first_phase,[]}},
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> {restart_type,transient},
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> {significant,false},
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> {shutdown,5000},
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0> {child_type,worker}]
2024-09-16 19:46:49.737111+00:00 [erro] <0.130.0>
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> crasher:
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> initial call: application_master:init/4
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> pid: <0.128.0>
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> registered_name: []
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> exception exit: {{shutdown,
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> {failed_to_start_child,prelaunch,
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> {badmatch,{error,timeout}}}},
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> {rabbit_prelaunch_app,start,[normal,[]]}}
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> in function application_master:init/4 (application_master.erl, line 142)
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> ancestors: [<0.127.0>]
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> message_queue_len: 1
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> messages: [{'EXIT',<0.129.0>,normal}]
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> links: [<0.127.0>,<0.44.0>]
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> dictionary: []
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> trap_exit: true
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> status: running
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> heap_size: 376
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> stack_size: 29
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> reductions: 168
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0> neighbours:
2024-09-16 19:46:49.737314+00:00 [erro] <0.128.0>
2024-09-16 19:46:49.741671+00:00 [noti] <0.44.0> Application rabbitmq_prelaunch exited with reason: {{shutdown,{failed_to_start_child,prelaunch,{badmatch,{error,timeout}}}},{rabbit_prelaunch_app,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{badmatch,{error,timeout}}}},{rabbit_prelaunch_app,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{badmatch,{error,timeout}}}},{rabbit_prelaunch_app,start,[normal,[]]}}})

Crash dump is being written to: erl_crash.dump...done
Job for rabbitmq-server.service failed because the control process exited with error code.
See "systemctl status rabbitmq-server.service" and "journalctl -xeu rabbitmq-server.service" for details.
invoke-rc.d: initscript rabbitmq-server, action "start" failed.
● rabbitmq-server.service - RabbitMQ Messaging Server
     Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Mon 2024-09-16 19:47:07 UTC; 5ms ago
    Process: 7331 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server (code=exited, status=1/FAILURE)
   Main PID: 7331 (code=exited, status=1/FAILURE)
     Status: "Standing by"
        CPU: 523ms
dpkg: error processing package rabbitmq-server (--configure):
 installed rabbitmq-server package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 rabbitmq-server
```

It should also be noted that /etc/hosts is never configured with the IP address / container name.
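
One way to confirm the missing entry from inside the machine is to parse /etc/hosts and look up the machine's own hostname. The following is only a minimal Python sketch and is not part of the charm; the helper name `hosts_entries` is hypothetical.

```python
import socket


def hosts_entries(text):
    """Map each hostname or alias in hosts-file text to its addresses."""
    entries = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are names and aliases.
        address, *names = line.split()
        for name in names:
            entries.setdefault(name, []).append(address)
    return entries


if __name__ == "__main__":
    hostname = socket.gethostname()
    try:
        with open("/etc/hosts") as hosts_file:
            entries = hosts_entries(hosts_file.read())
    except OSError as exc:
        print("could not read /etc/hosts:", exc)
    else:
        print(hostname, "->", entries.get(hostname, "no /etc/hosts entry"))
```

Run on an affected unit, this would be expected to report no entry for the container name, matching the observation above.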

This happens in both an LXD container and an LXD virtual machine.

If the same scenario is applied to a profile/model with local bridge networking enabled, the charm installs as expected. This is true for both containers and virtual machines; bridged networking does not exhibit this behavior.

Myles Penner (mylesjp)
Changed in charm-rabbitmq-server:
status: New → Triaged
importance: Undecided → High
Billy Olsen (billy-olsen) wrote :

This feels more like an issue in the rabbitmq-server Debian package than in the charm itself. The charm is simply performing the apt install command, and it's failing there.

Now from that, I'm wondering if the rabbitmq service is failing to start in a reasonable time but completes after the fact. The following line in the trace suggests that the rmq distribution port is already in use:

rabbit_prelaunch_dist:dist_port_use_check_fail/2, line 127
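
To test this hypothesis on the affected machine, one could check whether something is already bound to the relevant ports. The sketch below is an illustration only. It assumes RabbitMQ's default ports (4369 for epmd, 25672 for inter-node distribution), which a deployment may override, and the helper name `port_in_use` is hypothetical.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails because the address is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors are not evidence of a port conflict
        return False


if __name__ == "__main__":
    # Assumed defaults: epmd on 4369, Erlang distribution on 25672.
    for name, port in (("epmd", 4369), ("distribution", 25672)):
        print(f"{name} port {port} in use: {port_in_use(port)}")
```

Note that the logged reason is `{error,timeout}`, so if neither port turns out to be in use, the failure may lie elsewhere than a port conflict.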
