2017-08-14 07:58:41 |
Peter Sabaini |
description |
When upgrading rabbitmq-server from 16.07 to 16.10, I'm getting an error in wait_app().
Due to the changes introduced in Change-Id: I105eb2684e61a553a52c5a944e8c562945e2a6eb (cf. Bug #1584902), the nodename of a rabbitmq node is expected to equal socket.gethostname().
However, the units' reverse DNS resolution yields another name, and in the cluster they're known by that name.
Hostnames and DNS resolution:
$ juju run --unit rabbitmq-server/3 'hostname ; unit-get private-address ; dig +short -x $( unit-get private-address )'
...
juju-machine-1-lxc-14
10.76.12.252
10-76-12-252.maas.
Rabbit nodes are known by the second, MAAS-generated name:
$ u=rabbitmq-server/3;r=cluster; juju run --unit $u "relation-ids $r| xargs -I_@ sh -c 'relation-list -r _@|xargs -I_U sh -c \"relation-get -r _@ - _U |sed s,^,_U:, 2>&1\"'" | grep clustered
rabbitmq-server/4:clustered: 10-76-12-236
rabbitmq-server/5:clustered: 10-76-12-245
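The comparison above can be replayed in Python. This is only a sketch: short_name() and candidate_node_names() are hypothetical helpers, not functions from the charm, and the lookups are injectable so the values from this report can be fed in without live DNS.

```python
import socket


def short_name(fqdn):
    """Return the first DNS label of a (possibly fully qualified) name."""
    return fqdn.rstrip(".").split(".")[0]


def candidate_node_names(ip, hostname=None, reverse_lookup=None):
    """Return (name from hostname, name from reverse DNS) for one unit.

    hostname and reverse_lookup default to live lookups but can be
    overridden to replay recorded values.
    """
    if hostname is None:
        hostname = socket.gethostname()
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    return short_name(hostname), short_name(reverse_lookup(ip))


# Values reported above for rabbitmq-server/3.
by_hostname, by_reverse_dns = candidate_node_names(
    "10.76.12.252",
    hostname="juju-machine-1-lxc-14",
    reverse_lookup=lambda addr: "10-76-12-252.maas.",
)
print(by_hostname, by_reverse_dns, by_hostname == by_reverse_dns)
```

With the recorded values the two names differ, which is the mismatch this bug is about.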
When running upgrade-charm, the wait_app function looks for the pid file in the wrong place because of this:
Reading package lists...
Waiting for 'rabbit@10-76-12-252' ...
pid is 13134 ...
Error: process_not_running
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/upgrade-charm", line 709, in <module>
    rabbit.assess_status(rabbit.ConfigRenderer(rabbit.CONFIG_FILES))
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/rabbit_utils.py", line 809, in assess_status
    assess_status_func(configs)()
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/rabbit_utils.py", line 833, in _assess_status_func
    services=services(), ports=None)
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1178, in _determine_os_workload_status
    state, message, lambda: charm_func(configs))
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1306, in _ows_check_charm_func
    charm_state, charm_message = charm_func_with_configs()
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1178, in <lambda>
    state, message, lambda: charm_func(configs))
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/rabbit_utils.py", line 744, in assess_cluster_status
    ret = wait_app()
  File "/var/lib/juju/agents/unit-rabbitmq-server-3/charm/hooks/rabbit_utils.py", line 361, in wait_app
    raise ex
subprocess.CalledProcessError: Command '['timeout', '180', '/usr/sbin/rabbitmqctl', 'wait', '/var/lib/rabbitmq/mnesia/rabbit@juju-machine-1-lxc-14.pid']' returned non-zero exit status 2
2017-08-11 10:54:39 ERROR juju.worker.uniter.operation runhook.go:107 hook "upgrade-charm" failed: exit status 1
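To make the path mismatch in the log explicit, here is a minimal sketch. It assumes the pid file is named after the node name, as the path in the traceback suggests; pid_file_for() is a hypothetical helper and not the charm's code.

```python
import os

# Directory taken from the rabbitmqctl command in the traceback.
MNESIA_DIR = "/var/lib/rabbitmq/mnesia"


def pid_file_for(short_host, mnesia_dir=MNESIA_DIR):
    """Build the pid file path for a node called rabbit@<short_host>."""
    return os.path.join(mnesia_dir, "rabbit@{}.pid".format(short_host))


# Path derived from socket.gethostname(), which wait_app passes to rabbitmqctl.
waited_on = pid_file_for("juju-machine-1-lxc-14")
# Path that would match the node name rabbitmqctl reports it is waiting for.
running_as = pid_file_for("10-76-12-252")

print(waited_on)
print(running_as)
print(waited_on == running_as)
```

Under that assumption the two paths differ, so rabbitmqctl wait is pointed at a pid file that does not belong to the running node.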
Other functions that depend on the cluster node name being equal to socket.gethostname() will likely fail too, e.g. is_leader().
Juju: 1.25.10 |
|