update-status hook fails on rabbitmqctl cluster_status check 'cannot fork'
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
rabbitmq-server (Juju Charms Collection) |
Invalid
|
Undecided
|
Unassigned |
Bug Description
After a period of time, I am seeing all units of rabbitmq in my environment fail on running the update-status hook.
When using 'juju resolved', the hook continues to fail every time in the juju agent log. When I run the update-status hook with debug-hooks, it works fine. Also when I run 'rabbitmq-ctl cluster_status' manually on the node, it executes fine.
After restarting the juju agent and resolving the hook, I see the same error
I'm wondering if some process limit of systemd or otherwise is getting hit when the hook is run by the juju agent
using juju 2.0 rc3, ubuntu 16.04
$ juju status rabbitmq
MODEL CONTROLLER CLOUD/REGION VERSION
default ash-juju-01 ash-maas-01 2.0-rc3.1
APP VERSION STATUS SCALE CHARM STORE REV OS NOTES
rabbitmq error 3 rabbitmq-server jujucharms 6 ubuntu
UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE
rabbitmq/0 error idle 0/lxd/7 10.110.201.3 5672/tcp hook failed: "update-status"
rabbitmq/1* error idle 1/lxd/7 10.110.201.18 5672/tcp hook failed: "update-status"
rabbitmq/2 error idle 2/lxd/8 10.110.201.14 5672/tcp hook failed: "update-status"
rabbitmq/0 unit log below:
2016-10-10 17:43:18 DEBUG juju.worker.
2016-10-10 17:43:18 INFO juju.worker.uniter resolver.go:100 awaiting error resolution for "update-status" hook
2016-10-10 17:43:18 DEBUG juju.worker.
2016-10-10 17:43:18 DEBUG juju.worker.uniter uniter.go:540 acquire lock "machine-lock" for uniter hook execution
2016-10-10 17:43:18 DEBUG juju.worker.uniter uniter.go:545 lock "machine-lock" acquired
2016-10-10 17:43:18 DEBUG juju.worker.
2016-10-10 17:43:18 DEBUG juju.worker.
2016-10-10 17:43:18 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] executing: running update-status hook
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'install'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'upgrade_charm'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'config_changed'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:18 DEBUG juju-log Hardening function 'update_status'
2016-10-10 17:43:18 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG juju-log No hardening applied to 'update_status'
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 INFO juju-log Updating status.
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:19 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:20 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 DEBUG worker.uniter.jujuc server.go:178 hook context id "rabbitmq/
2016-10-10 17:43:21 INFO update-status /usr/lib/
2016-10-10 17:43:21 INFO update-status /usr/lib/
2016-10-10 17:43:21 INFO update-status Traceback (most recent call last):
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status rabbit.
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status assess_
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status services=
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status state, message, lambda: charm_func(
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status charm_state, charm_message = charm_func_
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status state, message, lambda: charm_func(
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status if not clustered():
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status res = func(*args, **kwargs)
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status if len(running_
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status res = func(*args, **kwargs)
2016-10-10 17:43:21 INFO update-status File "/var/lib/
2016-10-10 17:43:21 INFO update-status out = subprocess.
2016-10-10 17:43:21 INFO update-status File "/usr/lib/
2016-10-10 17:43:21 INFO update-status raise CalledProcessEr
2016-10-10 17:43:21 INFO update-status subprocess.
2016-10-10 17:43:21 ERROR juju.worker.
2016-10-10 17:43:21 DEBUG juju.worker.
2016-10-10 17:43:21 INFO juju.worker.uniter resolver.go:100 awaiting error resolution for "update-status" hook
2016-10-10 17:43:21 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] error: hook failed: "update-status"
2016-10-10 17:43:23 DEBUG juju.worker.
2016-10-10 17:43:23 INFO juju.worker.uniter resolver.go:100 awaiting error resolution for "update-status" hook
'cannot fork' is normally a symptom of the machine running out of memory - the OS is basically stating 'no way' when the new process creation is attempted resulting in the fork failing in this way.
The 2.0 rc's did have some memory leaks so its possible that this has already been resolved, 2.1 is improving the situation further (as 2.0.2 also has some residual leaking).
I'm going to mark this as 'Invalid' as I don't think its a charm specific issue - either the machine is under specified or something is consuming all of the memory over time.
If you're still seeing this problem, please open a task on this bug report against the juju project.
Worth nothing that RMQ memory usage will also grow based on a) the number of clients connection and b) the number of instances on the cloud - the regular syncs between hypervisor and controller are done with a good level of parallism, so the more instances you have, the more connections created on the RMQ message bus.