regression: rabbitmq-server HA setup no longer works with juju-core 1.20

Bug #1378263 reported by Nobuto Murata
This bug affects 1 person
Affects: rabbitmq-server (Juju Charms Collection)
Status: Fix Released
Importance: High
Assigned to: Jorge Niedbalski

Bug Description

maas:
  Installed: 1.5.4+bzr2294-0ubuntu1.1
juju-core:
  Installed: 1.20.9-0ubuntu1~14.04.1~juju1

rabbitmq-server charm(trusty) revno: 63
The last known working revision is revno 61.

MAAS 1.5 + juju-core 1.20 + revno 63 = fail
MAAS 1.5 + juju-core 1.20 + revno 62 = fail
MAAS 1.5 + juju-core 1.20 + revno 61 = ok

/var/log/juju/all-machines.log:
unit-rabbitmq-server-1: 2014-10-07 09:41:51 INFO juju-log cluster:0: Clustering with remote rabbit host (maas-sailor-rabbitmq-server-yu1g).
unit-rabbitmq-server-1: 2014-10-07 09:41:51 INFO cluster-relation-changed Stopping node 'rabbit@10-81-0-104' ...
unit-rabbitmq-server-1: 2014-10-07 09:41:52 INFO cluster-relation-changed ...done.
unit-rabbitmq-server-1: 2014-10-07 09:41:52 INFO cluster-relation-changed Clustering node 'rabbit@10-81-0-104' with 'rabbit@maas-sailor-rabbitmq-server-yu1g' ...
unit-rabbitmq-server-1: 2014-10-07 09:41:52 INFO cluster-relation-changed Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
unit-rabbitmq-server-1: 2014-10-07 09:41:52 INFO worker.uniter.jujuc server.go:102 running hook tool "juju-log" ["Failed to cluster with maas-sailor-rabbitmq-server-yu1g."]

Tags: cts

Nobuto Murata (nobuto) wrote :

Tested with this juju bundle.

Edward Hope-Morley (hopem) wrote :

Is this possibly related to https://bugs.launchpad.net/charms/+source/rabbitmq-server/+bug/1342539 which was fixed in r62?

Janghoon-Paul Sim (janghoon) wrote :

@Edward,

This issue is not fixed yet.
With r63 on MAAS 1.6 and juju 1.18, the following error still occurs:

unit-rabbitmq-server-1: 2014-10-07 07:20:24 INFO juju-log cluster:20: /etc/rabbitmq/rabbitmq-env.conf does not exist, creating.
unit-rabbitmq-server-1: 2014-10-07 07:20:24 INFO cluster-relation-joined * Restarting message broker rabbitmq-server
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined Traceback (most recent call last):
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/cluster-relation-joined", line 600, in <module>
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined hooks.execute(sys.argv)
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/charmhelpers/core/hookenv.py", line 502, in execute
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined self._hooks[hook_name]()
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/cluster-relation-joined", line 167, in cluster_joined
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined fqdn=False)
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 446, in get_hostname
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined result = ns_query(rev)
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 416, in ns_query
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined answers = dns.resolver.query(address, rtype)
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 974, in query
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined raise_on_no_answer, source_port)
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 903, in query
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined raise NXDOMAIN
unit-rabbitmq-server-0: 2014-10-07 07:20:21 INFO cluster-relation-joined dns.resolver.NXDOMAIN
unit-rabbitmq-server-0: 2014-10-07 07:20:21 ERROR juju.worker.uniter uniter.go:482 hook failed: exit status 1

When trying r61, it works fine.

Nobuto Murata (nobuto)
description: updated
Nobuto Murata (nobuto) wrote :

MAAS 1.5 + juju-core 1.20 + revno 63 = fail
MAAS 1.5 + juju-core 1.20 + revno 62 = fail
MAAS 1.5 + juju-core 1.20 + revno 61 = ok

so I believe revno 62 causes this regression.

Janghoon-Paul Sim (janghoon) wrote :

Please disregard my earlier comment.
I think the error in my comment #4 is a different one from Nobuto's.

I was able to reproduce the issue in my test environment.
This error happens when the DNS reverse lookup fails for some reason.
In my case, the rabbitmq-server charm can't get a hostname for 192.168.128.113, so cluster-relation-joined fails.

Line 167 in cluster-relation-joined:
nodename = get_hostname(get_host_ip(unit_get('private-address')),
                        fqdn=False)
The error occurs in get_hostname() when the reverse lookup fails.
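
To check whether an address has a usable reverse record before relating units, something like the following could be used. It is a minimal sketch using only the Python standard library, not the charm's code: ptr_query_name and has_reverse_record are hypothetical helper names. Note that socket.gethostbyaddr also consults /etc/hosts, so it is not a pure DNS PTR query like the dns.resolver call in the traceback above.

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Return the name a resolver would query for the PTR record of `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer


def has_reverse_record(ip, lookup=socket.gethostbyaddr):
    """Return True if `lookup` can map `ip` back to a hostname.

    `lookup` is injectable (an assumption of this sketch) so the check
    can be exercised without a real resolver.
    """
    try:
        lookup(ip)
    except (socket.herror, socket.gaierror):
        return False
    return True
```

For the address in this comment, ptr_query_name("192.168.128.113") gives 113.128.168.192.in-addr.arpa, and has_reverse_record("192.168.128.113") would presumably return False on the affected unit.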

Nobuto Murata (nobuto)
tags: added: cts
Nobuto Murata (nobuto) wrote :

FYI, I'm trying to verify that the latest revision (r71) works with all possible combinations of MAAS and Juju, using the bundle below.

====
rabbitmq:
  services:
    rabbitmq-server:
      charm: cs:trusty/rabbitmq-server
      num_units: 2
  series: trusty

Nobuto Murata (nobuto) wrote :

maas 1.5.4+bzr2294-0ubuntu1.1 (trusty-updates)
  + juju-core 1.18.4+dfsg-0ubuntu0.14.04.1 (trusty-updates) -> OK
  + juju-core 1.20.11-0ubuntu1~14.04.1~juju1 (ppa:juju/stable) -> Fail

Jorge Niedbalski (niedbalski) wrote :

A few clarifications on this case:

1) MAAS 1.5 treated containers as second-class citizens and did not assign them
a full DNS entry (PTR records were excluded). For that reason the charm's get_hostname method fails,
because it cannot find the PTR entry on the MAAS DNS server.

This changed in the latest MAAS development releases, where containers are
treated as first-class citizens, just like machines, with full DNS resolution.

2) We added a fallback to the charm that uses socket.gethostname() if no PTR resolution is possible.
In that case, the nodename is set to the current machine's hostname.
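
The fallback described in 2) can be sketched roughly as follows. This is an illustration and not the charm's actual implementation: node_hostname and its parameters are hypothetical names, and the standard library's socket.gethostbyaddr stands in for the charm's dnspython-based lookup.

```python
import socket


def node_hostname(ip, reverse_lookup=socket.gethostbyaddr,
                  local_name=socket.gethostname):
    """Return a short hostname for `ip`.

    Tries a reverse lookup first; if that fails, falls back to the
    local machine's hostname. Both callables are injectable
    (an assumption of this sketch) to keep the logic testable.
    """
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        name = reverse_lookup(ip)[0]
    except (socket.herror, socket.gaierror):
        name = local_name()
    # Mirror fqdn=False from the charm call: keep only the short name.
    return name.split(".")[0]
```

The fallback only makes sense when `ip` belongs to the local unit, as with unit_get('private-address') in the hook.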

Please note that in _any_ case, name resolution must be available for the other cluster nodes, via /etc/hosts
or by setting up a DNS server or dnsmasq on the local network, since RabbitMQ doesn't support clustering by IP address.
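
One way to verify that requirement on a unit is to try resolving each peer's node name before clustering. The sketch below is only an illustration under assumptions: unresolvable_peers is a hypothetical helper, and the peer names would have to come from the cluster relation data.

```python
import socket


def unresolvable_peers(peers, resolve=socket.gethostbyname):
    """Return the peer hostnames that `resolve` cannot map to an address.

    `resolve` is injectable (an assumption of this sketch) so the check
    can be exercised without touching the real resolver.
    """
    missing = []
    for name in peers:
        try:
            resolve(name)
        except (socket.gaierror, socket.herror):
            missing.append(name)
    return missing
```

For example, calling unresolvable_peers(["maas-sailor-rabbitmq-server-yu1g"]) on the failing unit would show whether the host name from the log above resolves there.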

I hope this clarifies the issue a bit more.

Nobuto Murata (nobuto) wrote :

@Jorge,
I'm confused now.

> 1) Maas 1.5 treated containers as second class citizens

If you mean LXC containers on top of MAAS nodes, this is not the case. I'm using normal MAAS bare-metal nodes for rabbitmq-server.

I'm only talking about the regression which happened in a particular revision of the charm.
> MAAS 1.5 + juju-core 1.20 + revno 62 = fail
> MAAS 1.5 + juju-core 1.20 + revno 61 = ok

Changed in rabbitmq-server (Juju Charms Collection):
assignee: nobody → Jorge Niedbalski (niedbalski)
importance: Undecided → High
Nobuto Murata (nobuto) wrote :

Now that juju-core 1.20.x is available for trusty (LP: #1386144), this issue might affect more users. Personally I'm not affected though.

Changed in rabbitmq-server (Juju Charms Collection):
status: New → Fix Released