2023.2 - RabbitMQ Dual Stack System Crashing - Why?

Bug #2058295 reported by Noel Ashford
This bug affects 1 person
Affects        Status  Importance  Assigned to  Milestone
kolla-ansible  New     Undecided   Unassigned

Bug Description

(rabbitmq)[root@tunninet-server-noel /]# rabbitmqctl --node rabbit status
Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [%{name: :"rabbitmqcli-59-rabbit@tunninet-server-noel", supervisor: :net_sup_dynamic, net_tickintensity: 4, net_ticktime: 60, name_domain: :shortnames, clean_halt: false}]}, :permanent, false, 1000, :supervisor, [:erl_distribution]}}
(rabbitmq)[root@tunninet-server-noel /]# epmd -names
epmd: up and running on port 4369 with data:
name rabbit at port 25672
(rabbitmq)[root@tunninet-server-noel /]# exit
exit
root@tunninet-server-noel:~# cat /etc/kolla/rabbitmq/rabbitmq-env.conf
RABBITMQ_NODENAME=rabbit@tunninet-server-noel
RABBITMQ_LOG_BASE=/var/log/kolla/rabbitmq
RABBITMQ_DIST_PORT=25672
RABBITMQ_PID_FILE=/var/lib/rabbitmq/mnesia/rabbitmq.pid
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-kernel inetrc '/etc/rabbitmq/erl_inetrc' +S 2:2 +sbwt none +sbwtdcpu none +sbwtdio none"
RABBITMQ_CTL_ERL_ARGS=""

export ERL_EPMD_ADDRESS=192.168.5.1
export ERL_EPMD_PORT=4369
export ERL_INETRC=/etc/rabbitmq/erl_inetrc
root@tunninet-server-noel:~# sudo ss -tlnp | grep 4369
LISTEN 0 4096 192.168.5.1:4369 0.0.0.0:* users:(("epmd",pid=54136,fd=5))
LISTEN 0 4096 127.0.0.1:4369 0.0.0.0:* users:(("epmd",pid=54136,fd=3))
LISTEN 0 4096 [::1]:4369 [::]:* users:(("epmd",pid=54136,fd=4))
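The `ss` output above shows epmd listening on 192.168.5.1, 127.0.0.1 and ::1. A minimal sketch for pulling just those local addresses out of `ss` output, so they can be compared with what the node name resolves to. The helper name `listen_addrs` is hypothetical (not part of RabbitMQ or kolla-ansible), and the sketch only reports where the port is bound; it does not by itself explain the `nodistribution` failure.

```shell
# List the local addresses bound to a given TCP port, reading
# `ss -tln`-style output on stdin. Hypothetical helper, for illustration only.
listen_addrs() {
    port="${1:-4369}"    # defaults to the epmd port seen in this report
    awk -v port="$port" '
        $1 == "LISTEN" {
            # Column 4 is "address:port"; n is the length of the address part.
            n = length($4) - length(port) - 1
            if (n > 0 && substr($4, n + 1) == ":" port)
                print substr($4, 1, n)
        }'
}

# Usage inside the container (assumes ss is available there):
#   ss -tln | listen_addrs 4369
```

Fed the three LISTEN lines above, it prints 192.168.5.1, 127.0.0.1 and [::1], one per line.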

Noel Ashford (nashford77) wrote :

I have also tried with the exact node name, localhost, you name it. No go.

(rabbitmq)[root@tunninet-server-noel /]# rabbitmqctl --node rabbit@tunninet-server-noel status
Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [%{name: :"rabbitmqcli-999-rabbit@tunninet-server-noel", supervisor: :net_sup_dynamic, net_tickintensity: 4, net_ticktime: 60, name_domain: :shortnames, clean_halt: false}]}, :permanent, false, 1000, :supervisor, [:erl_distribution]}}

Noel Ashford (nashford77) wrote :

18:59:04.138 [error] rabbit_env: Failed to setup distribution (as rabbit_ctl_50@tunninet-server-noel) to query node rabbit@tunninet-server-noel: {:error,
 {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}},
  {:child, :undefined, :net_sup_dynamic,
   {:erl_distribution, :start_link,
    [
      %{
        name: :"rabbit_ctl_50@tunninet-server-noel",
        supervisor: :net_sup_dynamic,
        net_tickintensity: 4,
        net_ticktime: 60,
        name_domain: :shortnames,
        clean_halt: false
      }
    ]}, :permanent, false, 1000, :supervisor, [:erl_distribution]}}}

Noel Ashford (nashford77) wrote :

I see the cookie is there...?

(rabbitmq)[root@tunninet-server-noel /]# ls -la /var/lib/rabbitmq/.erlang.cookie
-r-------- 1 rabbitmq rabbitmq 41 Mar 18 16:59 /var/lib/rabbitmq/.erlang.cookie

# System Entries
127.0.0.1 localhost localhost localhost.localdomain localhost6 localhost6.localdomain6

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback localhost localhost.localdomain localhost6 localhost6.localdomain6
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

(rabbitmq)[root@tunninet-server-noel /]# grep tunninet-server-noel /etc/hosts
192.168.5.1 tunninet-server-noel
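In the hosts entries above, `localhost` appears on both the 127.0.0.1 and the ::1 lines, while `tunninet-server-noel` maps only to 192.168.5.1. A minimal sketch for listing every address a hosts-format file assigns to one name. The helper name `hosts_addrs` is hypothetical. Whether this IPv4/IPv6 difference is related to the failure is an open question, not something this report establishes.

```shell
# Print every address that a hosts-format file maps to a given name.
# Hypothetical helper, for illustration only.
hosts_addrs() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        NF >= 2 {
            # Field 1 is the address, the remaining fields are names.
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; break }
        }' "$file"
}

# Usage:
#   hosts_addrs localhost
#   hosts_addrs "$(hostname -s)"
```

Against the entries shown above, `hosts_addrs localhost` prints 127.0.0.1 and ::1, and `hosts_addrs tunninet-server-noel` prints 192.168.5.1.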

Error logs:

2024-03-18 20:11:16.413684-04:00 [error] <0.2133.0> Error on AMQP connection <0.2133.0> (192.168.5.1:41456 -> 192.168.5.1:5671 - neutron-server:742:bf3124b5-10ad-4260-b354-02647275c66d, vhost: '/', user: 'openstack', state: running), channel 0:
2024-03-18 20:11:16.413684-04:00 [error] <0.2133.0> operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

Noel Ashford (nashford77) wrote :

(rabbitmq)[root@tunninet-server-noel /]# rabbitmqctl -n "rabbit@$(hostname -s)" status
Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [%{name: :"rabbitmqcli-489-rabbit@tunninet-server-noel", supervisor: :net_sup_dynamic, net_tickintensity: 4, net_ticktime: 60, name_domain: :shortnames, clean_halt: false}]}, :permanent, false, 1000, :supervisor, [:erl_distribution]}}
(rabbitmq)[root@tunninet-server-noel /]# rabbitmqctl -n rabbit@localhost status
Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [%{name: :"rabbitmqcli-901-rabbit@tunninet-server-noel", supervisor: :net_sup_dynamic, net_tickintensity: 4, net_ticktime: 60, name_domain: :shortnames, clean_halt: false}]}, :permanent, false, 1000, :supervisor, [:erl_distribution]}}
(rabbitmq)[root@tunninet-server-noel /]# ping -c2 localhost
PING localhost(ip6-localhost (::1)) 56 data bytes
64 bytes from ip6-localhost (::1): icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from ip6-localhost (::1): icmp_seq=2 ttl=64 time=0.037 ms

--- localhost ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1029ms
rtt min/avg/max/mdev = 0.037/0.043/0.049/0.006 ms
(rabbitmq)[root@tunninet-server-noel /]# ping -c2 "$(hostname -s)"
PING tunninet-server-noel.ny5.lan.tunninet.com (192.168.5.1) 56(84) bytes of data.
64 bytes from tunninet-server-noel.ny5.lan.tunninet.com (192.168.5.1): icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from tunninet-server-noel.ny5.lan.tunninet.com (192.168.5.1): icmp_seq=2 ttl=64 time=0.054 ms

--- tunninet-server-noel.ny5.lan.tunninet.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.037/0.045/0.054/0.008 ms
(rabbitmq)[root@tunninet-server-noel /]# ping -c2 "$(hostname)"
PING tunninet-server-noel.ny5.lan.tunninet.com (192.168.5.1) 56(84) bytes of data.
64 bytes from tunninet-server-noel.ny5.lan.tunninet.com (192.168.5.1): icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from tunninet-server-noel.ny5.lan.tunninet.com (192.168.5.1): icmp_seq=2 ttl=64 time=0.031 ms

--- tunninet-server-noel.ny5.lan.tunninet.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1015ms
rtt min/avg/max/mdev = 0.031/0.034/0.038/0.003 ms
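The pings above show `localhost` resolving to ::1 first, while the hostname resolves to 192.168.5.1. A minimal sketch that reduces `getent ahosts` output to the address families it contains, in resolver order, which makes that difference easy to see for any name. The helper name `addr_families` is hypothetical, and `getent` being present in the container is an assumption. Note that Erlang has its own resolver settings (the `erl_inetrc` file referenced in `rabbitmq-env.conf`), so the system resolver's answer may not match what the Erlang VM uses.

```shell
# Reduce `getent ahosts NAME` output (read on stdin) to the address families
# it contains, in the order they first appear: "inet6" for addresses that
# contain a colon, "inet" otherwise. Hypothetical helper, for illustration only.
addr_families() {
    awk 'NF {
        fam = ($1 ~ /:/) ? "inet6" : "inet"
        if (!(fam in seen)) { seen[fam] = 1; print fam }
    }'
}

# Usage (assumes getent is available in the container):
#   getent ahosts localhost | addr_families
#   getent ahosts "$(hostname -s)" | addr_families
```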

Noel Ashford (nashford77) wrote :

(rabbitmq)[root@tunninet-server-noel /]# erl -A0 -noinput -boot start_clean -eval 'net_kernel:start([list_to_atom("rabbit-gethostname-" ++ os:getpid()), shortnames]), [_, H] = string:tokens(atom_to_list(node()), "@"), io:format("~s~n", [H]), init:stop().'
=INFO REPORT==== 18-Mar-2024::20:44:18.382717 ===
Protocol 'inet_tcp': register/listen error: epmd_close

nohost
=CRASH REPORT==== 18-Mar-2024::20:44:18.382860 ===
  crasher:
    initial call: net_kernel:init/1
    pid: <0.86.0>
    registered_name: []
    exception exit: {error,badarg}
      in function gen_server:init_it/6 (gen_server.erl, line 961)
    ancestors: [net_sup,kernel_sup,<0.47.0>]
    message_queue_len: 0
    messages: []
    links: [<0.83.0>,#Port<0.4>]
    dictionary: [{longnames,false}]
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 28
    reductions: 3229
  neighbours:

=SUPERVISOR REPORT==== 18-Mar-2024::20:44:18.384880 ===
    supervisor: {local,net_sup}
    errorContext: start_error
    reason: {'EXIT',nodistribution}
    offender: [{pid,undefined},
               {id,net_kernel},
               {mfargs,{net_kernel,start_link,
                                   [#{name => 'rabbit-gethostname-3390',
                                      supervisor => net_sup_dynamic,
                                      net_tickintensity => 4,
                                      net_ticktime => 60,
                                      name_domain => shortnames,
                                      clean_halt => false}]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,2000},
               {child_type,worker}]

Noel Ashford (nashford77) wrote :

Original error from kolla-ansible:

TASK [rabbitmq : Waiting for rabbitmq to start] ********************************************************************************************************************************************************************************************************************************************************
fatal: [tunninet-server-noel]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "rabbitmq", "rabbitmqctl", "wait", "/var/lib/rabbitmq/mnesia/rabbitmq.pid"], "delta": "0:00:00.709556", "end": "2024-03-18 20:22:33.454444", "msg": "non-zero return code", "rc": 78, "start": "2024-03-18 20:22:32.744888", "stderr": "Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [%{name: :\"rabbitmqcli-721-rabbit@tunninet-server-noel\", supervisor: :net_sup_dynamic, net_tickintensity: 4, net_ticktime: 60, name_domain: :shortnames, clean_halt: false}]}, :permanent, false, 1000, :supervisor, [:erl_distribution]}}", "stderr_lines": ["Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [%{name: :\"rabbitmqcli-721-rabbit@tunninet-server-noel\", supervisor: :net_sup_dynamic, net_tickintensity: 4, net_ticktime: 60, name_domain: :shortnames, clean_halt: false}]}, :permanent, false, 1000, :supervisor, [:erl_distribution]}}"], "stdout": "", "stdout_lines": []}
