rabbitmq-server charm does not set up HA mode with MAAS provider

Bug #1342539 reported by Yaguang Tang
This bug affects 5 people
Affects: rabbitmq-server (Juju Charms Collection)
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

1. Description of the problem:

The rabbitmq-server charm behaves inconsistently between the local provider and the MAAS provider. With the MAAS provider, the rabbitmq cluster is not set up properly.

2. Ubuntu release, software version, Release Number and Architecture of the selected components.
Ubuntu 14.04 LTS
maas: 1.5.1+bzr2269-0ubuntu0.1
juju-core: 1.18.1-0ubuntu1
rabbitmq-server charm: http://bazaar.launchpad.net/~charmers/charms/trusty/rabbitmq-server/trunk/revision/54

3. How reproducible is the problem?
  ( easily with the test case, intermittent, on every boot, etc)
always

4. Steps to Reproduce:
 ( detailed enough for someone else to reproduce easily, scripts are welcome )

* create maas and local provider environment
* juju deploy -n2 cs:trusty/rabbitmq-server
* juju ssh rabbitmq-server/0 'sudo rabbitmqctl cluster_status'

 a. Actual Results:

1 node in cluster_status
====
ubuntu@node-fbde4b-2:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-fbde4b-2' ...
[{nodes,[{disc,['rabbit@node-fbde4b-2']}]},
 {running_nodes,['rabbit@node-fbde4b-2']},
 {partitions,[]}]
...done.
====

 b. Expected Results:

2 nodes in cluster_status.
====
Cluster status of node 'rabbit@ubuntu-local-machine-1' ...
[{nodes,[{disc,['rabbit@ubuntu-local-machine-1',
                'rabbit@ubuntu-local-machine-2']}]},
 {running_nodes,['rabbit@ubuntu-local-machine-2',
                 'rabbit@ubuntu-local-machine-1']},
 {partitions,[]}]
...done.
====

Nicholas Pasqua (npasqua) wrote :

I am also wondering which behavior is correct. If HA was correct and running, should we see (a. Actual Results:) or (b. Expected Results:)?

What commands can we run to check that rabbitmq-server is correctly running in HA? On which machines should we run these commands?

Jorge Niedbalski (niedbalski) wrote :

@yaguang,

Could you share your all-machines.log file?

Yaguang Tang (heut2008) wrote :

Jorge, this bug was reported by Nobuto, not me. After discussing it with others, we finally confirmed that it is caused by a MaaS bug, https://bugs.launchpad.net/maas/+bug/1250435, so we will wait for MaaS 1.6 to become available in Trusty and possibly Precise instead of fixing it in the rabbitmq-server charm. Sorry for not updating this bug's status.

Nobuto Murata (nobuto) wrote :

Hey, in my testbed I always fail to set up HA with MAAS 1.5, but I always succeed with MAAS 1.6. Therefore I thought Bug #1250435 was the key.

However, in one specific environment with MAAS 1.5, HA setup still succeeds. I cannot post the log publicly since it contains sensitive info, so let me share it privately with Jorge.

Nobuto Murata (nobuto) wrote :

@npasqua,

> I am also wondering which behavior is correct. If HA was correct and running, should we see (a. Actual Results:) or (b. Expected Results:)?

If HA is correctly set up, you can see (b. Expected Results:).

> What commands can we run to check that rabbitmq-server is correctly running in HA? On which machines should we run these commands?

rabbitmqctl cluster_status should work for you. If HA is correctly set up, each node lists the other nodes in its cluster_status output.
https://www.rabbitmq.com/clustering.html
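
A minimal sketch of automating that check, assuming the Erlang-term output format shown in this report (other RabbitMQ releases may print a different layout). The helper names running_nodes and is_clustered are hypothetical and not part of the charm; the sample text is copied from the expected results in the bug description.

```python
import re


def running_nodes(cluster_status_output):
    """Extract node names from the running_nodes tuple of cluster_status output."""
    match = re.search(r"\{running_nodes,\[(.*?)\]\}", cluster_status_output, re.DOTALL)
    if match is None:
        return []
    # Node names are single-quoted Erlang atoms such as 'rabbit@host'.
    return re.findall(r"'([^']+)'", match.group(1))


def is_clustered(cluster_status_output, expected_units=2):
    """True when at least expected_units nodes are reported as running."""
    return len(running_nodes(cluster_status_output)) >= expected_units


SAMPLE = """Cluster status of node 'rabbit@ubuntu-local-machine-1' ...
[{nodes,[{disc,['rabbit@ubuntu-local-machine-1',
                'rabbit@ubuntu-local-machine-2']}]},
 {running_nodes,['rabbit@ubuntu-local-machine-2',
                 'rabbit@ubuntu-local-machine-1']},
 {partitions,[]}]
...done."""

print(running_nodes(SAMPLE))
print(is_clustered(SAMPLE))
```

To use it, capture the output of "sudo rabbitmqctl cluster_status" on each rabbitmq-server unit and pass the text to is_clustered with the number of deployed units.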

Nobuto Murata (nobuto) wrote :

logs of failure case:
juju-core:
  Installed: 1.20.5-0ubuntu1~14.04.1~juju1
  Candidate: 1.20.5-0ubuntu1~14.04.1~juju1
  Version table:
 *** 1.20.5-0ubuntu1~14.04.1~juju1 0
        500 http://ppa.launchpad.net/juju/stable/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     1.18.1-0ubuntu1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
maas:
  Installed: 1.5.2+bzr2282-0ubuntu0.2
  Candidate: 1.5.2+bzr2282-0ubuntu0.2
  Version table:
 *** 1.5.2+bzr2282-0ubuntu0.2 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1.5+bzr2252-0ubuntu1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

Nobuto Murata (nobuto) wrote :

Oh wait, succeeded... hmm...

$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@maas-sailor-rabbitmq-server-a4dh' ...
[{nodes,[{disc,['rabbit@maas-sailor-rabbitmq-server-9d2i',
                'rabbit@maas-sailor-rabbitmq-server-a4dh']}]},
 {running_nodes,['rabbit@maas-sailor-rabbitmq-server-9d2i',
                 'rabbit@maas-sailor-rabbitmq-server-a4dh']},
 {partitions,[]}]
...done.

unit-rabbitmq-server-1: 2014-08-28 02:36:44 INFO juju-log cluster:0: Clustering with remote rabbit host (maas-sailor-rabbitmq-server-a4dh).
unit-rabbitmq-server-1: 2014-08-28 02:36:44 INFO cluster-relation-changed Stopping node 'rabbit@maas-sailor-rabbitmq-server-9d2i' ...
unit-rabbitmq-server-1: 2014-08-28 02:36:44 INFO cluster-relation-changed ...done.
unit-rabbitmq-server-1: 2014-08-28 02:36:44 INFO cluster-relation-changed Clustering node 'rabbit@maas-sailor-rabbitmq-server-9d2i' with 'rabbit@maas-sailor-rabbitmq-server-a4dh' ...
unit-rabbitmq-server-1: 2014-08-28 02:36:48 INFO cluster-relation-changed ...done.
unit-rabbitmq-server-1: 2014-08-28 02:36:48 INFO cluster-relation-changed Starting node 'rabbit@maas-sailor-rabbitmq-server-9d2i' ...
unit-rabbitmq-server-1: 2014-08-28 02:36:50 INFO cluster-relation-changed ...done.

Jorge Niedbalski (niedbalski) wrote :

@nobuto.

I think @yaguang's fix should temporarily fix the issue. It is equivalent to performing a
dig +short -t CNAME hostname.

However, you can also apply a manual workaround:

Edit /etc/network/interfaces on all your rabbitmq-server units and add your qualified domain to the 'eth0'
interface stanza as "dns-search your.domain.name", then run "service networking restart".

Then manually retrigger the relation:

$ juju-run rabbitmq-server/0 ./hooks/cluster-relation-changed

Nobuto Murata (nobuto) wrote :

Let me summarize the situation.

I can see a clear difference between juju-core 1.18.1 (does not work) and 1.20.5 (works), but I cannot see any significant difference in /etc/hosts or /etc/resolv.conf. Let me try yaguang's branch with juju-core 1.18.1.

trusty default
==============
maas-sailor --debug \
    -- -c rabbitmq.yaml

maas:
  Installed: 1.5.2+bzr2282-0ubuntu0.2
juju-core:
  Installed: 1.18.1-0ubuntu1

cluster_status
--------------
$ juju ssh rabbitmq-server/0 sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@maas-sailor-rabbitmq-server-4aed' ...
[{nodes,[{disc,['rabbit@maas-sailor-rabbitmq-server-4aed']}]},
 {running_nodes,['rabbit@maas-sailor-rabbitmq-server-4aed']},
 {partitions,[]}]
...done.

-> just one node, not HA

/etc/hosts
----------
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
127.0.1.1 maas-sailor-rabbitmq-server-4aed.maas maas-sailor-rabbitmq-server-4aed

/etc/resolv.conf
----------------
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.81.0.2
search maas

trusty + juju PPA
=================
maas-sailor --debug \
    --add-repo ppa:juju/stable \
    -- -c rabbitmq.yaml

maas:
  Installed: 1.5.2+bzr2282-0ubuntu0.2
juju-core:
  Installed: 1.20.5-0ubuntu1~14.04.1~juju1

cluster_status
--------------
$ juju ssh rabbitmq-server/0 sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@maas-sailor-rabbitmq-server-3x61' ...
[{nodes,[{disc,['rabbit@maas-sailor-rabbitmq-server-3x61',
                'rabbit@maas-sailor-rabbitmq-server-y2t6']}]},
 {running_nodes,['rabbit@maas-sailor-rabbitmq-server-y2t6',
                 'rabbit@maas-sailor-rabbitmq-server-3x61']},
 {partitions,[]}]
...done.

-> has 2 nodes, HA

/etc/hosts
----------
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
127.0.1.1 maas-sailor-rabbitmq-server-3x61.maas maas-sailor-rabbitmq-server-3x61

/etc/resolv.conf
----------------
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.81.0.2
search maas

Nobuto Murata (nobuto) wrote :

I tried with juju-core 1.18.1 and lp:~heut2008/charms/precise/rabbitmq-server/hostname-issue, no luck.

I'm a bit lost, let me know if you need additional info from me.

====
unit-rabbitmq-server-1: 2014-08-28 10:44:25 INFO juju-log cluster:0: Clustering with remote rabbit host (10-81-0-103).
unit-rabbitmq-server-1: 2014-08-28 10:44:25 INFO cluster-relation-changed Stopping node 'rabbit@maas-sailor-rabbitmq-server-9czy' ...
unit-rabbitmq-server-1: 2014-08-28 10:44:25 INFO cluster-relation-changed ...done.
unit-rabbitmq-server-1: 2014-08-28 10:44:26 INFO cluster-relation-changed Clustering node 'rabbit@maas-sailor-rabbitmq-server-9czy' with 'rabbit@10-81-0-103' ...
unit-rabbitmq-server-1: 2014-08-28 10:44:26 INFO cluster-relation-changed Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
unit-rabbitmq-server-1: 2014-08-28 10:44:26 INFO worker.uniter.jujuc server.go:103 running hook tool "juju-log" ["Failed to cluster with 10-81-0-103."]
unit-rabbitmq-server-1: 2014-08-28 10:44:26 DEBUG worker.uniter.jujuc server.go:104 hook context id "rabbitmq-server/1:cluster-relation-changed:190320048867853421"; dir "/var/lib/juju/agents/unit-rabbitmq-server-1/charm"
unit-rabbitmq-server-1: 2014-08-28 10:44:26 INFO juju-log cluster:0: Failed to cluster with 10-81-0-103.

Nobuto Murata (nobuto) wrote :

I think I found something. The value returned for private-address differs between juju-core 1.18.1 and 1.20.5.

1.20.5 returns a full hostname, which I think is fine for the rabbitmq charm.
1.18.1 returns an IP address, and I don't think the charm can convert it to a hostname without the MAAS fix for Bug #1250435.

with juju-core 1.20.5-0ubuntu1~14.04.1~juju1
====================================
juju deploy --repository ~/charms local:trusty/rabbitmq-server

juju debug-hooks rabbitmq-server/0

juju add-unit rabbitmq-server

root@maas-sailor-rabbitmq-server-h3k3:/var/lib/juju/agents/unit-rabbitmq-server-0/charm# env | grep JUJU
JUJU_API_ADDRESSES=maas-sailor-bootstrap-1.maas:17070
JUJU_UNIT_NAME=rabbitmq-server/0
JUJU_REMOTE_UNIT=rabbitmq-server/1
JUJU_RELATION=cluster
JUJU_ENV_NAME=maas
JUJU_ENV_UUID=3b4b2336-9a61-46bf-8d4f-6c607d0cc70c
JUJU_CONTEXT_ID=rabbitmq-server/0:cluster-relation-joined:7671545854207501306
JUJU_HOOK_NAME=cluster-relation-joined
JUJU_RELATION_ID=cluster:0
JUJU_AGENT_SOCKET=@/var/lib/juju/agents/unit-rabbitmq-server-0/agent.socket
JUJU_DEBUG=/tmp/tmp.BM9O5yOlEX

root@maas-sailor-rabbitmq-server-h3k3:/var/lib/juju/agents/unit-rabbitmq-server-0/charm# relation-get
private-address: maas-sailor-rabbitmq-server-irg4.maas

  -> full hostname

with juju-core 1.18.1-0ubuntu1
==================================

juju deploy --repository ~/charms local:trusty/rabbitmq-server

juju debug-hooks rabbitmq-server/0

juju add-unit rabbitmq-server

rabbitmq-server/0:cluster-relation-joined % env | grep JUJU
JUJU_API_ADDRESSES=10.81.0.102:17070
JUJU_UNIT_NAME=rabbitmq-server/0
JUJU_REMOTE_UNIT=rabbitmq-server/1
JUJU_RELATION=cluster
JUJU_ENV_NAME=maas
JUJU_ENV_UUID=96e65a99-561f-4f95-8727-b14949f91f43
JUJU_CONTEXT_ID=rabbitmq-server/0:cluster-relation-joined:3635058044155482399
JUJU_HOOK_NAME=cluster-relation-joined
JUJU_RELATION_ID=cluster:0
JUJU_AGENT_SOCKET=@/var/lib/juju/agents/unit-rabbitmq-server-0/agent.socket
JUJU_DEBUG=/tmp/tmp.YTtRL3SDg3

rabbitmq-server/0:cluster-relation-joined % relation-get
private-address: 10.81.0.104

-> ip address not hostname
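
The difference matters because RabbitMQ node names are built from hostnames (rabbit@host). Below is a rough sketch of the normalization a hook would need, assuming reverse DNS (PTR) lookups work for the peer address, which this thread suggests may not hold on MAAS 1.5. The function names and the injectable reverse_lookup parameter are hypothetical and are not the charm's actual code.

```python
import ipaddress
import socket


def is_ip(address):
    """Return True if address parses as an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False


def short_hostname(private_address, reverse_lookup=socket.gethostbyaddr):
    """Map a private-address value to a short hostname, or None if unresolvable.

    reverse_lookup is injectable so the logic can be exercised without DNS.
    """
    if is_ip(private_address):
        try:
            # gethostbyaddr returns (hostname, aliases, addresses).
            fqdn = reverse_lookup(private_address)[0]
        except (socket.herror, socket.gaierror):
            return None
    else:
        fqdn = private_address
    return fqdn.split(".")[0]


# Hostname case, as returned by juju-core 1.20.5 in this report (no DNS needed).
print(short_hostname("maas-sailor-rabbitmq-server-irg4.maas"))
```

With an IP address such as the 10.81.0.104 returned by juju-core 1.18.1, the result depends entirely on what the PTR record says, so a None or an A-B-C-D style name is possible.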

JuanJo Ciarlante (jjo)
tags: added: canonical-is
JuanJo Ciarlante (jjo) wrote :

I'm seeing the same symptoms with rabbit over MaaS, but afaics it's a different issue:
* peers can resolve themselves as A-B-C-D.maas <-> A.B.C.D, _but_ the rabbitmq nodename is set to e.g. juju-machine-X-lxc-Y:
- rabbitmq joining the cluster connects to A-B-C-D:5672 ok, but then doesn't find a rabbitmq@A-B-C-D node there, only juju-machine-X-lxc-Y
- brute force /etc/hosts proof: http://paste.ubuntu.com/8428426/
(note: these are LXCs over MaaS'd hosts)

* I crafted a branch that forces the nodename to be DNS-resolvable by its peers, at lp:~jjo/charms/trusty/rabbitmq-server/fix-nodename-to-host-dns-PTR, which made it work for my setup: http://paste.ubuntu.com/8428450/. You may want to give it a try and report back here.
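
The mismatch described in this comment can be illustrated with a small sketch, assuming nodes are named rabbit@<short hostname> as in the logs in this report. The helper names and the sample hostnames (10-81-0-50.maas, juju-machine-1-lxc-0) are hypothetical stand-ins for A-B-C-D.maas and juju-machine-X-lxc-Y.

```python
def node_name_for(hostname, prefix="rabbit"):
    """Node name a joining peer would look for, given the hostname it resolved."""
    return "{}@{}".format(prefix, hostname.split(".")[0])


def names_agree(resolved_peer_hostname, peer_actual_node):
    """True when the DNS-derived node name matches the name the peer runs under."""
    return node_name_for(resolved_peer_hostname) == peer_actual_node


# Peer resolves as 10-81-0-50.maas but runs as rabbit@juju-machine-1-lxc-0.
print(names_agree("10-81-0-50.maas", "rabbit@juju-machine-1-lxc-0"))  # False
# After forcing the nodename to the DNS-resolvable name, the two agree.
print(names_agree("10-81-0-50.maas", "rabbit@10-81-0-50"))  # True
```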

Nobuto Murata (nobuto) wrote :

The current revision of lp:charms/trusty/rabbitmq-server seems to cause a regression with juju-core 1.20.x.
maas:
  Installed: 1.5.4+bzr2294-0ubuntu1.1
juju-core:
  Installed: 1.20.9-0ubuntu1~14.04.1~juju1

2014-10-02 16:53:05 INFO worker.uniter.jujuc server.go:102 running hook tool "juju-log" ["Clustering with remote rabbit host (maas-sailor-rabbitmq-server-jx0a)."]
2014-10-02 16:53:05 DEBUG worker.uniter.jujuc server.go:103 hook context id "rabbitmq-server/1:cluster-relation-changed:8365762691290411126"; dir "/var/lib/juju/agents/unit-rabbitmq-server-1/charm"
2014-10-02 16:53:05 INFO juju-log cluster:15: Clustering with remote rabbit host (maas-sailor-rabbitmq-server-jx0a).
2014-10-02 16:53:05 INFO cluster-relation-changed Stopping node 'rabbit@10-81-0-115' ...
2014-10-02 16:53:05 INFO cluster-relation-changed ...done.
2014-10-02 16:53:06 INFO cluster-relation-changed Clustering node 'rabbit@10-81-0-115' with 'rabbit@maas-sailor-rabbitmq-server-jx0a' ...
2014-10-02 16:53:06 INFO cluster-relation-changed Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
2014-10-02 16:53:06 INFO worker.uniter.jujuc server.go:102 running hook tool "juju-log" ["Failed to cluster with maas-sailor-rabbitmq-server-jx0a."]

Nobuto Murata (nobuto) wrote :

Regarding the regression, I have filed it separately as bug #1378263.

Changed in rabbitmq-server (Juju Charms Collection):
status: New → Fix Committed
importance: Undecided → Critical
tags: added: cts
Changed in rabbitmq-server (Juju Charms Collection):
status: Fix Committed → Fix Released