update-status hook fails on rabbitmqctl cluster_status check 'cannot fork'

Bug #1632043 reported by Matt Rae
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
rabbitmq-server (Juju Charms Collection)
Invalid
Undecided
Unassigned

Bug Description

After a period of time, I am seeing all units of rabbitmq in my environment fail on running the update-status hook.

When using 'juju resolved', the hook continues to fail every time in the juju agent log. When I run the update-status hook with debug-hooks, it works fine. Also when I run 'rabbitmq-ctl cluster_status' manually on the node, it executes fine.

After restarting the juju agent and resolving the hook, I see the same error

I'm wondering if some process limit of systemd or otherwise is getting hit when the hook is run by the juju agent

using juju 2.0 rc3, ubuntu 16.04

$ juju status rabbitmq
MODEL CONTROLLER CLOUD/REGION VERSION
default ash-juju-01 ash-maas-01 2.0-rc3.1

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES
rabbitmq error 3 rabbitmq-server jujucharms 6 ubuntu

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE
rabbitmq/0 error idle 0/lxd/7 10.110.201.3 5672/tcp hook failed: "update-status"
rabbitmq/1* error idle 1/lxd/7 10.110.201.18 5672/tcp hook failed: "update-status"
rabbitmq/2 error idle 2/lxd/8 10.110.201.14 5672/tcp hook failed: "update-status"

rabbitmq/0 unit log below:

2016-10-10 17:43:18 DEBUG juju.worker.uniter.remotestate watcher.go:308 got unit change
2016-10-10 17:43:18 INFO juju.worker.uniter resolver.go:100 awaiting error resolution for "update-status" hook
2016-10-10 17:43:18 DEBUG juju.worker.uniter.operation executor.go:69 running operation run update-status hook
2016-10-10 17:43:18 DEBUG juju.worker.uniter uniter.go:540 acquire lock "machine-lock" for uniter hook execution
2016-10-10 17:43:18 DEBUG juju.worker.uniter uniter.go:545 lock "machine-lock" acquired
2016-10-10 17:43:18 DEBUG juju.worker.uniter.operation executor.go:100 preparing operation "run update-status hook"
2016-10-10 17:43:18 DEBUG juju.worker.uniter.operation executor.go:100 executing operation "run update-status hook"
2016-10-10 17:43:18 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] executing: running update-status hook
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'install'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'upgrade_charm'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'config_changed'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'update_status'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG juju-log No hardening applied to 'update_status'
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 INFO juju-log Updating status.
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/0-update-status-7199682028079804052"; dir "/var/lib/juju/agents/unit-rabbitmq-0/charm"
2016-10-10 17:43:21 INFO update-status /usr/lib/rabbitmq/bin/rabbitmqctl: 93: /usr/lib/rabbitmq/bin/rabbitmq-env: Cannot fork
2016-10-10 17:43:21 INFO update-status /usr/lib/rabbitmq/bin/rabbitmqctl: 106: /usr/lib/rabbitmq/bin/rabbitmq-env: Cannot fork
2016-10-10 17:43:21 INFO update-status Traceback (most recent call last):
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/update-status", line 768, in <module>
2016-10-10 17:43:21 INFO update-status rabbit.assess_status(rabbit.ConfigRenderer(rabbit.CONFIG_FILES))
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/rabbit_utils.py", line 788, in assess_status
2016-10-10 17:43:21 INFO update-status assess_status_func(configs)()
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/rabbit_utils.py", line 809, in _assess_status_func
2016-10-10 17:43:21 INFO update-status services=services(), ports=None)
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1171, in _determine_os_workload_status
2016-10-10 17:43:21 INFO update-status state, message, lambda: charm_func(configs))
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1299, in _ows_check_charm_func
2016-10-10 17:43:21 INFO update-status charm_state, charm_message = charm_func_with_configs()
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1171, in <lambda>
2016-10-10 17:43:21 INFO update-status state, message, lambda: charm_func(configs))
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/rabbit_utils.py", line 720, in assess_cluster_status
2016-10-10 17:43:21 INFO update-status if not clustered():
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/charmhelpers/core/hookenv.py", line 72, in wrapper
2016-10-10 17:43:21 INFO update-status res = func(*args, **kwargs)
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/rabbit_utils.py", line 706, in clustered
2016-10-10 17:43:21 INFO update-status if len(running_nodes()) > 1:
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/charmhelpers/core/hookenv.py", line 72, in wrapper
2016-10-10 17:43:21 INFO update-status res = func(*args, **kwargs)
2016-10-10 17:43:21 INFO update-status File "/var/lib/juju/agents/unit-rabbitmq-0/charm/hooks/rabbit_utils.py", line 656, in running_nodes
2016-10-10 17:43:21 INFO update-status out = subprocess.check_output([RABBITMQ_CTL, 'cluster_status'])
2016-10-10 17:43:21 INFO update-status File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
2016-10-10 17:43:21 INFO update-status raise CalledProcessError(retcode, cmd, output=output)
2016-10-10 17:43:21 INFO update-status subprocess.CalledProcessError: Command '['/usr/sbin/rabbitmqctl', 'cluster_status']' returned non-zero exit status 2
2016-10-10 17:43:21 ERROR juju.worker.uniter.operation runhook.go:107 hook "update-status" failed: exit status 1
2016-10-10 17:43:21 DEBUG juju.worker.uniter.operation executor.go:84 lock released
2016-10-10 17:43:21 INFO juju.worker.uniter resolver.go:100 awaiting error resolution for "update-status" hook
2016-10-10 17:43:21 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] error: hook failed: "update-status"
2016-10-10 17:43:23 DEBUG juju.worker.uniter.remotestate watcher.go:308 got unit change
2016-10-10 17:43:23 INFO juju.worker.uniter resolver.go:100 awaiting error resolution for "update-status" hook

Tags: cpec
Matt Rae (mattrae)
tags: added: cpec
Revision history for this message
James Page (james-page) wrote :

'cannot fork' is normally a symptom of the machine running out of memory - the OS is basically stating 'no way' when the new process creation is attempted resulting in the fork failing in this way.

The 2.0 rc's did have some memory leaks so its possible that this has already been resolved, 2.1 is improving the situation further (as 2.0.2 also has some residual leaking).

I'm going to mark this as 'Invalid' as I don't think its a charm specific issue - either the machine is under specified or something is consuming all of the memory over time.

If you're still seeing this problem, please open a task on this bug report against the juju project.

Worth nothing that RMQ memory usage will also grow based on a) the number of clients connection and b) the number of instances on the cloud - the regular syncs between hypervisor and controller are done with a good level of parallism, so the more instances you have, the more connections created on the RMQ message bus.

Changed in rabbitmq-server (Juju Charms Collection):
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.