Rabbit server runs out of memory

Bug #1863602 reported by Liam Young
Affects: OpenStack Charm Test Infra
Status: Fix Released
Importance: High
Assigned to: Liam Young
Milestone: (none listed)

Bug Description

It looks like the tests (mainly stable_to_next_ha) do not allocate machines with enough RAM for the rabbit servers. The main symptom is the rabbitmq-server charm's update-status hook failing.

From a recent Mojo run of stable_to_next_ha on xenial:

/var/log/juju/unit-rabbitmq-server-2.log

2020-02-13 16:52:25 DEBUG update-status Traceback (most recent call last):
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/update-status", line 984, in <module>
2020-02-13 16:52:25 DEBUG update-status rabbit.assess_status(rabbit.ConfigRenderer(rabbit.CONFIG_FILES))
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 920, in assess_status
2020-02-13 16:52:25 DEBUG update-status assess_status_func(configs)()
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 944, in _assess_status_func
2020-02-13 16:52:25 DEBUG update-status services=services(), ports=None)
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/contrib/openstack/utils.py", line 888, in _determine_os_workload_status
2020-02-13 16:52:25 DEBUG update-status state, message, lambda: charm_func(configs))
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/contrib/openstack/utils.py", line 1035, in _ows_check_charm_func
2020-02-13 16:52:25 DEBUG update-status charm_state, charm_message = charm_func_with_configs()
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/contrib/openstack/utils.py", line 888, in <lambda>
2020-02-13 16:52:25 DEBUG update-status state, message, lambda: charm_func(configs))
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 854, in assess_cluster_status
2020-02-13 16:52:25 DEBUG update-status if not clustered():
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/core/hookenv.py", line 84, in wrapper
2020-02-13 16:52:25 DEBUG update-status res = func(*args, **kwargs)
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 834, in clustered
2020-02-13 16:52:25 DEBUG update-status if len(running_nodes()) > 1:
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/core/hookenv.py", line 84, in wrapper
2020-02-13 16:52:25 DEBUG update-status res = func(*args, **kwargs)
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 790, in running_nodes
2020-02-13 16:52:25 DEBUG update-status return nodes(get_running=True)
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/core/hookenv.py", line 84, in wrapper
2020-02-13 16:52:25 DEBUG update-status res = func(*args, **kwargs)
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 773, in nodes
2020-02-13 16:52:25 DEBUG update-status out = rabbitmqctl_normalized_output('cluster_status')
2020-02-13 16:52:25 DEBUG update-status File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 401, in rabbitmqctl_normalized_output
2020-02-13 16:52:25 DEBUG update-status .check_output(cmd, stderr=subprocess.STDOUT)
2020-02-13 16:52:25 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
2020-02-13 16:52:25 DEBUG update-status **kwargs).stdout
2020-02-13 16:52:25 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 708, in run
2020-02-13 16:52:25 DEBUG update-status output=stdout, stderr=stderr)
2020-02-13 16:52:25 DEBUG update-status subprocess.CalledProcessError: Command '['/usr/sbin/rabbitmqctl', 'cluster_status']' returned non-zero exit status 2
2020-02-13 16:52:25 ERROR juju.worker.uniter.operation runhook.go:132 hook "update-status" failed: exit status 1
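
The traceback above ends in an unhandled subprocess.CalledProcessError, which is what turns a stopped broker into a failed hook. As a rough illustration only (this is not the charm's code, and the helper name run_status_command is hypothetical), a caller could catch the error and report the exit status instead of raising. The failing command here is a stand-in, not a real rabbitmqctl call:

```python
import subprocess
import sys


def run_status_command(cmd):
    """Run cmd and return (ok, output) instead of raising on a non-zero exit.

    Hypothetical helper for illustration; not part of the rabbitmq-server charm.
    """
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return True, out.decode("utf-8", errors="replace")
    except subprocess.CalledProcessError as exc:
        # exc.output holds whatever the command printed before it failed.
        output = (exc.output or b"").decode("utf-8", errors="replace")
        return False, "exit status {}: {}".format(exc.returncode, output.strip())


# Stand-in for a failing `rabbitmqctl cluster_status` (assumption: exit code 2,
# as in the traceback; the printed text is only a placeholder).
failing_cmd = [sys.executable, "-c", "import sys; print('nodedown'); sys.exit(2)"]
ok, message = run_status_command(failing_cmd)
print(ok, message)
```

This only changes how the failure is reported. The broker is still down, so the underlying memory problem would still need fixing.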

syslog:

Feb 13 16:47:20 juju-996685-auto-osci-sv07-39 systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: Stopping and halting node 'rabbit@juju-996685-auto-osci-sv07-39' ...
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: Error: unable to connect to node 'rabbit@juju-996685-auto-osci-sv07-39': nodedown
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: DIAGNOSTICS
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: ===========
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: attempted to contact: ['rabbit@juju-996685-auto-osci-sv07-39']
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: rabbit@juju-996685-auto-osci-sv07-39:
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: * connected to epmd (port 4369) on juju-996685-auto-osci-sv07-39
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: * epmd reports: node 'rabbit' not running at all
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: no other nodes on juju-996685-auto-osci-sv07-39
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: * suggestion: start the node
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: current node details:
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: - node name: 'rabbitmq-cli-24963@juju-996685-auto-osci-sv07-39'
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: - home dir: .
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 rabbitmq[24954]: - cookie hash: 0dCmc/7tK+Fk85CRNH25Ew==
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 systemd[1]: rabbitmq-server.service: Control process exited, code=exited status=2
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 systemd[1]: rabbitmq-server.service: Unit entered failed state.
Feb 13 16:47:21 juju-996685-auto-osci-sv07-39 systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.

/<email address hidden>

=INFO REPORT==== 13-Feb-2020::16:47:08 ===
vm_memory_high_watermark set. Memory used:863652176 allowed:838852608

=WARNING REPORT==== 13-Feb-2020::16:47:08 ===
memory resource limit alarm set on node 'rabbit@juju-996685-auto-osci-sv07-39'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=ERROR REPORT==== 13-Feb-2020::16:47:08 ===
Discarding message {'$gen_call',{<0.20887.0>,#Ref<0.0.90701825.68905>},stat} from <0.20887.0> to <0.24027.2> in an old incarnation (1) of this node (3)

=ERROR REPORT==== 13-Feb-2020::16:47:17 ===
closing AMQP connection <0.13201.3> (172.17.107.50:58946 -> 172.17.107.52:5672):
Missed heartbeats from client, timeout: 60s

# ls -l /var/log/rabbitmq/startup_err
-rw-r--r-- 1 rabbitmq rabbitmq 74 Feb 13 16:47 /var/log/rabbitmq/startup_err

# cat /var/log/rabbitmq/startup_err
eheap_alloc: Cannot allocate 762886488 bytes of memory (of type "heap").
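
For a rough sense of scale, the numbers in the logs above can be worked through as below. This is a sketch under one stated assumption: that vm_memory_high_watermark was left at RabbitMQ's default fraction of 0.4, which the logs do not confirm. Under that assumption the "allowed" figure implies roughly 2 GB of RAM visible to the node.

```python
import re

# Line copied from the rabbit log in this report.
LOG_LINE = "vm_memory_high_watermark set. Memory used:863652176 allowed:838852608"
# Size of the failed allocation from /var/log/rabbitmq/startup_err.
FAILED_ALLOC = 762886488
# Assumption: the watermark is RabbitMQ's default fraction (0.4); not confirmed by the logs.
ASSUMED_WATERMARK = 0.4

MIB = 1024 ** 2

match = re.search(r"Memory used:(\d+) allowed:(\d+)", LOG_LINE)
used, allowed = int(match.group(1)), int(match.group(2))

over_limit = used - allowed                        # bytes above the watermark
implied_total = round(allowed / ASSUMED_WATERMARK)  # RAM implied by the assumed fraction

print("over the watermark by {:.1f} MiB".format(over_limit / MIB))
print("implied total RAM: {:.0f} MiB".format(implied_total / MIB))
print("failed heap allocation: {:.1f} MiB".format(FAILED_ALLOC / MIB))
```

If the assumption holds, a single heap allocation of that size is a large share of the machine's memory, which fits the report that the test machines are undersized for rabbit.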

Liam Young (gnuoy)
Changed in charm-test-infra:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Liam Young (gnuoy)
Liam Young (gnuoy)
Changed in charm-test-infra:
status: Confirmed → Fix Released