'/usr/sbin/rabbitmqctl wait' results in 'Error: process_not_running' during amqp-relation-changed

Bug #1670439 reported by Francis Ginther
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
rabbitmq-server (Juju Charms Collection)
New
Undecided
Unassigned

Bug Description

This was found during CI testing of Landscape Openstack Autopilot [1].

It looks like rabbitmq restarts unexpectedly while the charm is checking that it is running. There are multiple successful cases of the 'wait' check earlier in this log, with the same pid of 12237. However, I can not find any evidence of rabbitmq restarting in any of the log files (syslog or the rabbitmq/*log files).

The result of this failure is an error from the amqp-relation-changed hook and the autopilot deployment fails.

[from landscape-0-inner-logs/rabbitmq-server-1/var/log/juju/unit-rabbitmq-server-1.log]
2017-03-06 07:38:25 DEBUG juju-log amqp:41: Checking for minimum of 3 peer units
2017-03-06 07:38:25 INFO juju-log amqp:41: Sufficient number of peer units to form cluster 3
2017-03-06 07:38:26 DEBUG juju-log amqp:41: Waiting for rabbitmq app to start: /<email address hidden>
2017-03-06 07:38:26 DEBUG juju-log amqp:41: Running ['timeout', '180', '/usr/sbin/rabbitmqctl', 'wait', '/<email address hidden>']
2017-03-06 07:38:26 INFO amqp-relation-changed Waiting for 'rabbit@juju-1623f7-3-lxd-2' ...
2017-03-06 07:38:26 INFO amqp-relation-changed pid is 12237 ...
2017-03-06 07:38:26 INFO amqp-relation-changed Error: process_not_running
2017-03-06 07:38:27 DEBUG juju-log amqp:41: Status of node 'rabbit@juju-1623f7-3-lxd-2' ...
[{pid,12414},
 {running_applications,[{rabbit,"RabbitMQ","3.5.7"},
                        {os_mon,"CPO CXC 138 46","2.4"},
                        {mnesia,"MNESIA CXC 138 12","4.13.3"},
                        {xmerl,"XML parser","1.3.10"},
                        {sasl,"SASL CXC 138 11","2.7"},
                        {stdlib,"ERTS CXC 138 10","2.8"},
                        {kernel,"ERTS CXC 138 10","4.2"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true]\n"},
 {memory,[{total,98368184},
          {connection_readers,493560},
          {connection_writers,105064},
          {connection_channels,532336},
          {connection_other,1311024},
          {queue_procs,18017136},
          {queue_slave_procs,7422928},
          {plugins,0},
          {other_proc,14345464},
          {mnesia,4006664},
          {mgmt_db,0},
          {msg_index,801496},
          {other_ets,2754760},
          {binary,18946320},
          {code,16972695},
          {atom,654217},
          {other_system,12004520}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,6712999936},
 {disk_free_limit,50000000},
 {disk_free,85393629184},
 {file_descriptors,[{total_limit,65436},
                    {total_used,47},
                    {sockets_limit,58890},
                    {sockets_used,45}]},
 {processes,[{limit,1048576},{used,4120}]},
 {run_queue,0},
 {uptime,2512}]

2017-03-06 07:38:27 INFO amqp-relation-changed Traceback (most recent call last):
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/amqp-relation-changed", line 677, in <module>
2017-03-06 07:38:27 INFO amqp-relation-changed rabbit.assess_status(rabbit.ConfigRenderer(rabbit.CONFIG_FILES))
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/rabbit_utils.py", line 820, in assess_status
2017-03-06 07:38:27 INFO amqp-relation-changed assess_status_func(configs)()
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/rabbit_utils.py", line 844, in _assess_status_func
2017-03-06 07:38:27 INFO amqp-relation-changed services=services(), ports=None)
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1225, in _determine_os_workload_status
2017-03-06 07:38:27 INFO amqp-relation-changed state, message, lambda: charm_func(configs))
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1353, in _ows_check_charm_func
2017-03-06 07:38:27 INFO amqp-relation-changed charm_state, charm_message = charm_func_with_configs()
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1225, in <lambda>
2017-03-06 07:38:27 INFO amqp-relation-changed state, message, lambda: charm_func(configs))
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/rabbit_utils.py", line 755, in assess_cluster_status
2017-03-06 07:38:27 INFO amqp-relation-changed ret = wait_app()
2017-03-06 07:38:27 INFO amqp-relation-changed File "/var/lib/juju/agents/unit-rabbitmq-server-1/charm/hooks/rabbit_utils.py", line 369, in wait_app
2017-03-06 07:38:27 INFO amqp-relation-changed raise ex
2017-03-06 07:38:27 INFO amqp-relation-changed subprocess.CalledProcessError: Command '['timeout', '180', '/usr/sbin/rabbitmqctl', 'wait', '/<email address hidden>']' returned non-zero exit status 2
2017-03-06 07:38:27 ERROR juju.worker.uniter.operation runhook.go:107 hook "amqp-relation-changed" failed: exit status 1

[1] - https://ci.lscape.net/job/landscape-system-tests/5448
(Note this is not accessible outside of the landscape team, it is included here for our reference)

Tags: landscape
Revision history for this message
Francis Ginther (fginther) wrote :

Attaching log files from all of the rabbitmq-server units from the CI run [1] and base-machine-2 (which contains var/log/ps-fauxww.txt and var/log/ps_mem.txt with processes listings and memory usage).

[1] - https://ci.lscape.net/job/landscape-system-tests/5448
(Note this is not accessible outside of the landscape team, it is included here for reference)

description: updated
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Apologies if I've missed this in the info already provided, but I'm not able to see what charm revision was deployed, and what config options were set on the Juju application. Can you please also attach that? Thank you.

Changed in rabbitmq-server (Juju Charms Collection):
status: New → Incomplete
Revision history for this message
Francis Ginther (fginther) wrote :

My apologies... I filed this and forgot to add the charm details:

rabbitmq-server charm: cs:xenial/rabbitmq-server-61

Charm configuration (non defaults):
rabbitmq-server
 - min-cluster-size: 3

Changed in rabbitmq-server (Juju Charms Collection):
status: Incomplete → New
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.