OpenStack RabbitMQ Server Charm

RabbitMQ cluster member may fail to start after a cloud reboot

Bug #1915220 reported by Nikolay Vinogradov on 2021-02-10

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack RabbitMQ Server Charm	Triaged	Medium	Unassigned

Bug Description

Running a Bionic/Ussuri OpenStack cloud with hardware offloading enabled. During a reboot test, after all nodes were rebooted, one of the RabbitMQ cluster members failed to start, reporting the following error:

BOOT FAILED
===========

Error description:
   {could_not_start,rabbit,
       {{function_clause,
            [{rabbit_exchange,callback,
                 [undefined,remove_bindings,transaction,
                  [undefined,
                   [{binding,
                        {resource,<<"nagios-rabbitmq-server-0">>,exchange,
                            <<"test_exchange">>},
                        <<"test_mq">>,
                        {resource,<<"nagios-rabbitmq-server-0">>,queue,
                            <<"test_exchange_queue">>},
                        []}]]],
                 [{file,"src/rabbit_exchange.erl"},{line,122}]},
             {rabbit_binding,x_callback,4,
                 [{file,"src/rabbit_binding.erl"},{line,568}]},
             {rabbit_binding,'-process_deletions/1-fun-0-',2,
                 [{file,"src/rabbit_binding.erl"},{line,550}]},
             {dict,map_bucket,2,[{file,"dict.erl"},{line,481}]},
             {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
             {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
             {dict,map_seg_list,2,[{file,"dict.erl"},{line,472}]},
             {dict,map_dict,2,[{file,"dict.erl"},{line,467}]}]},
        {rabbit,start,[normal,[]]}}}

Log files (may contain more information):
/<email address hidden>
/<email address hidden>

Error: {could_not_start,rabbit,
           {{function_clause,
                [{rabbit_exchange,callback,
                     [undefined,remove_bindings,transaction,
                      [undefined,
                       [{binding,
                            {resource,<<"nagios-rabbitmq-server-0">>,
                                exchange,<<"test_exchange">>},
                            <<"test_mq">>,
                            {resource,<<"nagios-rabbitmq-server-0">>,queue,
                                <<"test_exchange_queue">>},
                            []}]]],
                     [{file,"src/rabbit_exchange.erl"},{line,122}]},
                 {rabbit_binding,x_callback,4,
                     [{file,"src/rabbit_binding.erl"},{line,568}]},
                 {rabbit_binding,'-process_deletions/1-fun-0-',2,
                     [{file,"src/rabbit_binding.erl"},{line,550}]},
                 {dict,map_bucket,2,[{file,"dict.erl"},{line,481}]},
                 {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
                 {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
                 {dict,map_seg_list,2,[{file,"dict.erl"},{line,472}]},
                 {dict,map_dict,2,[{file,"dict.erl"},{line,467}]}]},
            {rabbit,start,[normal,[]]}}}

The cluster itself was operational. Looking deeper into the RabbitMQ entities, it turned out that the binding in the nagios-rabbitmq-server-0 vhost existed but the corresponding queue was missing. Because the NRPE check the charm provides connects to RabbitMQ using the localhost address, it wasn't able to reinitialize the queue while the local member was down.
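The mismatch described above (a binding whose destination queue no longer exists) can be checked for mechanically. The following is a minimal sketch, not something the charm ships: the function name `find_dangling_bindings` is hypothetical, and the dictionary keys (`vhost`, `destination`, `destination_type`, `name`) are assumed to mirror the JSON that the RabbitMQ management API returns for bindings and queues. Fetching that JSON from a healthy cluster member is left out.

```python
def find_dangling_bindings(bindings, queues):
    """Return bindings whose destination queue is absent from `queues`.

    Both arguments are lists of dicts. The key names are an assumption
    modelled on the RabbitMQ management API's binding and queue objects.
    """
    # Queues are only unique per vhost, so key on (vhost, name).
    existing = {(q["vhost"], q["name"]) for q in queues}
    return [
        b for b in bindings
        if b.get("destination_type") == "queue"
        and (b["vhost"], b["destination"]) not in existing
    ]


# Example data shaped after the entities named in the error above.
bindings = [
    {
        "vhost": "nagios-rabbitmq-server-0",
        "source": "test_exchange",
        "destination": "test_exchange_queue",
        "destination_type": "queue",
        "routing_key": "test_mq",
    },
]
queues = []  # the queue is missing, as in this report

for b in find_dangling_bindings(bindings, queues):
    print(b["vhost"], b["source"], "->", b["destination"])
```

An empty result means every queue-type binding points at a queue that exists; any entry returned is a candidate for the inconsistency seen here.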

I tried to rebuild the failing member's Mnesia DB and re-add the member to the cluster, but it didn't help; most likely the problem was in the DB itself. What helped was re-running the Nagios NRPE check for the broken unit from the good unit: it recreated the queue and the binding, and the rabbitmq-server-0 member started successfully.
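For illustration, the effect of that workaround (declaring the exchange, the queue, and the binding again through a healthy member) could be sketched as below. This is an assumption-laden sketch, not the charm's NRPE check: `recreate_check_entities` is a hypothetical helper, the entity names are taken from the error output, and the exchange type and default properties are guesses. If they differ from what already exists on the broker, the broker will reject the declaration. The `channel` argument is expected to behave like a pika `BlockingChannel`.

```python
def recreate_check_entities(
    channel,
    exchange="test_exchange",
    queue="test_exchange_queue",
    routing_key="test_mq",
    exchange_type="direct",  # assumption: the real check may use another type
):
    """Declare the exchange and queue, then bind them.

    `channel` is any object exposing exchange_declare, queue_declare and
    queue_bind with pika-style keyword arguments.
    """
    channel.exchange_declare(exchange=exchange, exchange_type=exchange_type)
    channel.queue_declare(queue=queue)
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)


# Possible usage with pika against a healthy member. The host and the
# credentials are placeholders, not values from this report:
#
#   import pika
#   params = pika.ConnectionParameters(
#       host="<healthy-unit-address>",
#       virtual_host="nagios-rabbitmq-server-0",
#       credentials=pika.PlainCredentials("<user>", "<password>"),
#   )
#   connection = pika.BlockingConnection(params)
#   recreate_check_entities(connection.channel())
#   connection.close()
```

The key point is the connection target: the declarations go to a member that is running, in the vhost of the unit that is not, which is what running the check "from the good unit" achieved.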

Tags: cold-start

Alex Kavanagh (ajkavanagh) on 2021-02-26

tags:	added: cold-start
Changed in charm-rabbitmq-server:
status:	New → Triaged
importance:	Undecided → Medium
