rabbitmq-server exited during amqp-relation-changed hook

Bug #1668708 reported by Francis Ginther
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack RabbitMQ Server Charm
Expired
Undecided
Unassigned

Bug Description

This was found in an automated landscape autopilot deployment [1]. Versions in use:

OPENSTACK_RELEASE=newton
Landscape=17.01~bzr10885+jenkins3455-2
JUJU=1:2.1~rc1-0ubuntu1~16.04.1~juju1
MAAS=2.1.3+bzr5573-0ubuntu1
rabbitmq-server charm: cs:~openstack-charmers-next/xenial/rabbitmq-server-251

The default charm config options are used except:
 - min-cluster-size: 3

The best I can tell, the rabbitmq-server process exited, which led to the charm unable to complete it's amqp-relation-changed hook. The bug here is the unexpected exit of rabbitmq-server. I'm not able to determine the cause, but perhaps someone with more detailed knowledge can look at the attached logs and figure it out. What I can see is the exit:

[from landscape-0-inner-logs/rabbitmq-server-0/var/log/syslog]
Feb 14 06:30:51 juju-2a5917-4-lxd-1 systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=143/n/a

and the charm hitting an error very shortly after:

[from landscape-0-inner-logs/rabbitmq-server-0/var/log/juju/unit-rabbitmq-server-0.log]
2017-02-14 06:30:54 INFO amqp-relation-changed Error: unable to connect to node 'rabbit@juju-2a5917-4-lxd-1': nodedown
2017-02-14 06:30:54 INFO amqp-relation-changed
2017-02-14 06:30:54 INFO amqp-relation-changed DIAGNOSTICS
2017-02-14 06:30:54 INFO amqp-relation-changed ===========
2017-02-14 06:30:54 INFO amqp-relation-changed
2017-02-14 06:30:54 INFO amqp-relation-changed attempted to contact: ['rabbit@juju-2a5917-4-lxd-1']
2017-02-14 06:30:54 INFO amqp-relation-changed
2017-02-14 06:30:54 INFO amqp-relation-changed rabbit@juju-2a5917-4-lxd-1:
2017-02-14 06:30:54 INFO amqp-relation-changed * unable to connect to epmd (port 4369) on juju-2a5917-4-lxd-1: address (cannot connect to host/port)
2017-02-14 06:30:54 INFO amqp-relation-changed
2017-02-14 06:30:54 INFO amqp-relation-changed
2017-02-14 06:30:54 INFO amqp-relation-changed current node details:
2017-02-14 06:30:54 INFO amqp-relation-changed - node name: nonode@nohost
2017-02-14 06:30:54 INFO amqp-relation-changed - home dir: /var/lib/rabbitmq
2017-02-14 06:30:54 INFO amqp-relation-changed - cookie hash: 92vmrXD/iLl+OKWxh4sEhQ==
2017-02-14 06:30:54 INFO amqp-relation-changed
2017-02-14 06:30:54 DEBUG juju-log amqp:44: Command '['/usr/sbin/rabbitmqctl', 'list_vhosts']' returned non-zero exit status 2

My expectation is that the rabbitmq-server process is still running. Again, I'm unable to determine why it exited. I don't see any errors in <email address hidden> or anything obvious is syslog that would indicate a killed process.

I'll attach all of the logs collected from the rabbitmq-server/0 unit.

[1] - https://ci.lscape.net/job/landscape-system-tests/5281/

Tags: landscape
Revision history for this message
Francis Ginther (fginther) wrote :

Logs from the rabbitmq-server/0 unit.

description: updated
Revision history for this message
James Page (james-page) wrote :

Sniffs like a DNS issue due to the odd nodenaming in the diagnostic output.

Revision history for this message
David Ames (thedac) wrote :
Download full text (3.8 KiB)

Systemd status 143 is received a SIGTERM. Nothing in the charm would have done that at that point.

Any chance this was OOM? That is my suspicion.

The syslog shows rabbit receiving SIGTERM and stopping and then failing when rabbitmqctl is run:

Feb 14 06:30:51 juju-2a5917-4-lxd-1 systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=143/n/a
Feb 14 06:30:52 juju-2a5917-4-lxd-1 systemd[1]: Started Session c612 of user rabbitmq.
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: Stopping and halting node 'rabbit@juju-2a5917-4-lxd-1' ...

Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: Error: unable to connect to node 'rabbit@juju-2a5917-4-lxd-1': nodedown
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: DIAGNOSTICS
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: ===========
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: attempted to contact: ['rabbit@juju-2a5917-4-lxd-1']
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: rabbit@juju-2a5917-4-lxd-1:
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: * connected to epmd (port 4369) on juju-2a5917-4-lxd-1
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: * epmd reports: node 'rabbit' not running at all
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: no other nodes on juju-2a5917-4-lxd-1
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: * suggestion: start the node
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: current node details:
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: - node name: 'rabbitmq-cli-52704@juju-2a5917-4-lxd-1'
Feb 14 06:30:53 juju-2a5917-4-lxd-1 rabbitmq[52696]: - home dir: . ...

Read more...

Changed in charm-rabbitmq-server:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack rabbitmq-server charm because there has been no activity for 60 days.]

Changed in charm-rabbitmq-server:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.