Rabbitmq fails to start on two nodes on HA IPv6 configuration
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo |
Fix Released
|
Critical
|
Haïkel Guémar |
Bug Description
Ha ipv6 configuration take a lot of time to finish. After the deployment is complete a nova list on the overcloud succeeds. However, pcs status shows rabbitmq has not started on two of the tree overcloud nodes.
Cluster name: tripleo_cluster
Last updated: Mon Sep 26 12:26:00 2016 Last change: Mon Sep 26 09:11:16 2016 by root via cibadmin on overcloud-
Stack: corosync
Current DC: overcloud-
3 nodes and 19 resources configured
Online: [ overcloud-
Full list of resources:
ip-fd00.
Clone Set: haproxy-clone [haproxy]
Started: [ overcloud-
ip-192.0.2.8 (ocf::heartbeat
Master/Slave Set: galera-master [galera]
Masters: [ overcloud-
ip-2001.
ip-fd00.
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ overcloud-
Stopped: [ overcloud-
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-
Slaves: [ overcloud-
ip-fd00.
ip-fd00.
openstack-
Failed Actions:
* rabbitmq_start_0 on overcloud-
last-
* rabbitmq_start_0 on overcloud-
last-
log in /var/log/rabbitmq show a lot of crashes with this report
=CRASH REPORT==== 26-Sep-
crasher:
initial call: rabbit_
pid: <0.1570.0>
registered_
exception exit: {aborted,
in function mnesia:abort/1 (mnesia.erl, line 313)
in call from rabbit_
in call from rabbit_
in call from rabbit_
in call from rabbit_
in call from rabbit_
in call from rabbit_
in call from rabbit_reader:run/1 (src/rabbit_
ancestors: [<0.1568.
messages: [{'EXIT'
links: [<0.1568.0>]
dictionary: [{process_name,
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 1613
neighbours:
journalctl on one of the failing nodes show
Sep 26 12:56:19 overcloud-
look at http://
Changed in tripleo: | |
status: | New → Confirmed |
description: | updated |
Changed in tripleo: | |
milestone: | none → newton-rc2 |
importance: | Undecided → Critical |
Changed in tripleo: | |
status: | Confirmed → Triaged |
Changed in tripleo: | |
assignee: | nobody → Haïkel Guémar (hguemar) |
Changed in tripleo: | |
status: | Triaged → Fix Released |
Latest findings:
pacemaker reports rabbitmq has failed, but appears to be up and running, altought there are a lot of crash reports on the logs.
ocf::heartbeat: rabbitmq- cluster is implemented in /usr/lib/ ocf/resource. d/heartbeat/ rabbitmq- cluster which is owned by resource-agents package. But the package doesn't appear to be installed