Upgrade 16.07 -> 16.10 breaks on node name
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack RabbitMQ Server Charm | Triaged | Medium | Unassigned |
Bug Description
When upgrading rabbitmq-server from 16.07 to 16.10 I'm getting an error in wait_app(), due to the changes introduced in Change-Id: I105eb2684e61a5
The units' reverse DNS resolves to a name other than their hostname, and in the cluster they are known by that name.
Hostnames and DNS resolution:
$ juju run --unit rabbitmq-server/3 'hostname ; unit-get private-address ; dig +short -x $( unit-get private-address )'
...
juju-machine-
10.76.12.252
10-76-12-252.maas.
Rabbit nodes are known by the second, MAAS-generated name:
$ u=rabbitmq-
rabbitmq-
rabbitmq-
Because of this, when running upgrade-charm the wait_app function expects the pid file in the wrong place:
Reading package lists...
Waiting for 'rabbit@
pid is 13134 ...
Error: process_not_running
Traceback (most recent call last):
File "/var/lib/
rabbit.
File "/var/lib/
assess_
File "/var/lib/
services=
File "/var/lib/
state, message, lambda: charm_func(
File "/var/lib/
charm_state, charm_message = charm_func_
File "/var/lib/
state, message, lambda: charm_func(
File "/var/lib/
ret = wait_app()
File "/var/lib/
raise ex
subprocess.
2017-08-11 10:54:39 ERROR juju.worker.
Other functions that depend on the clustername to equal socket.
Juju: 1.25.10
Changed in charm-rabbitmq-server:
status: New → Triaged
importance: Undecided → Medium
tags: added: charm-upgrade
I found that the charm upgrade (and other hooks that call the status check) would complete after creating a symlink /<email address hidden> to <email address hidden> files on all rabbit units. Obviously this is not a scalable solution.
How best can we handle this corner case, where there are multiple reverse DNS entries, in a repeatable manner both before and after 16.10? I also checked the 17.02 code, and skipping a revision won't avoid this issue. It seems odd to look up the hostname to build a pid filename instead of checking config files or rabbitmqctl command output. For instance, the rabbitmqctl wait <pidfile> command shows "Waiting for 'rabbit@ip-ad-dr-es'" in the log file (and when run manually), as you can see in Peter's log.
# rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbit\@10-76-13-12.pid
Waiting for 'rabbit@10-76-13-12' ...
pid is 16537 ...
(exit code 0)
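As a rough illustration of reading the node name from command output rather than deriving it from the hostname, here is a minimal Python sketch. The function name node_from_wait_output is hypothetical and not part of the charm; it only assumes the "Waiting for '<node>' ..." line format shown in the output above.

```python
import re

# Matches the "Waiting for 'rabbit@<name>' ..." line printed by rabbitmqctl wait.
_WAITING_RE = re.compile(r"Waiting for '([^']+)'")


def node_from_wait_output(output):
    """Return the node name reported in rabbitmqctl wait output, or None.

    Hypothetical helper: it relies only on the line format seen in the
    log above, not on any documented rabbitmqctl contract.
    """
    match = _WAITING_RE.search(output)
    return match.group(1) if match else None
```

A caller would still need a pid file path to run rabbitmqctl wait in the first place, so this only helps to confirm which name the broker is actually using.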
From what I can tell following the code:
- In 16.07, wait_app uses get_local_nodename() to determine the PID filename:
ip(unit_get('private-address'))
which in turn calls node_hostname, which either uses get_hostname(ip_addr) (coming from .contrib.openstack.utils) or falls back to socket.gethostname().
contrib.openstack.utils.get_hostname calls .contrib.network.ip.get_hostname, which in turn either uses .from_address(address) or falls back to gethostbyaddr(address) [0]
which in turn calls get_host_
get_
charmhelpers
- charmhelpers.
charmhelpers
runs dns.reversename
socket.
- Note from lp:1710247, which references lp:1484902, that this is intentional, for MAAS 2 support.
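To make the mismatch concrete, the following Python sketch lists the node names that the two lookup strategies described above would produce for one address. It is not the charmhelpers code: candidate_node_names is a hypothetical helper, the lookups are injectable so the example can run without DNS, and taking the short (first-label) name from the reverse lookup is an assumption based on the node names seen in the cluster.

```python
import socket


def candidate_node_names(ip_addr,
                         reverse_lookup=socket.gethostbyaddr,
                         local_hostname=socket.gethostname):
    """Return possible rabbit node names, reverse-DNS name first.

    Hypothetical helper for illustration only.
    """
    names = []
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        fqdn = reverse_lookup(ip_addr)[0]
        # Assumption: the node uses the short name (first DNS label).
        names.append(fqdn.rstrip('.').split('.')[0])
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        pass
    host = local_hostname()
    if host not in names:
        names.append(host)
    return ['rabbit@' + name for name in names]
```

With a reverse lookup returning 10-76-12-252.maas. and a placeholder local hostname juju-machine-x, this yields rabbit@10-76-12-252 followed by rabbit@juju-machine-x, which are the two names the old and new code disagree on.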
Perhaps in upgrade-charm, if the wait on the hostname-derived pid file fails, the return code should be checked and the command output used to find the previous pid file name. A name-change routine could then re-configure the server and cluster relationships.
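One possible shape for such a fallback, sketched in Python under stated assumptions: find_pid_file is a hypothetical helper rather than anything in the charm, and the default directory is taken from the path in the manual rabbitmqctl wait command earlier in this report. It prefers the pid file for the expected node name and otherwise accepts an existing rabbit@*.pid file only when exactly one is present.

```python
import glob
import os


def find_pid_file(expected_node, mnesia_dir='/var/lib/rabbitmq/mnesia'):
    """Return the pid file to wait on, or None if it cannot be determined.

    Hypothetical helper: expected_node is a name such as 'rabbit@<host>'.
    """
    expected = os.path.join(mnesia_dir, expected_node + '.pid')
    if os.path.exists(expected):
        return expected
    # Fall back to whatever pid file a previous charm revision left behind.
    candidates = sorted(glob.glob(os.path.join(mnesia_dir, 'rabbit@*.pid')))
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several candidates: do not guess.
    return None
```

Returning None when the choice is ambiguous leaves the decision to the caller, which seems safer than waiting on a stale pid file from an old node name.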