Cannot add more than one unit on openstack

Bug #1450699 reported by Michael Nelson
This bug affects 3 people
Affects                                   Status   Importance  Assigned to  Milestone
OpenStack RabbitMQ Server Charm           Expired  Medium      Unassigned
rabbitmq-server (Juju Charms Collection)  Invalid  Medium      Unassigned

Bug Description

This is possibly specific to the Canonical PS4 OpenStack environment (or, more specifically, its DNS resolution), but it may be better if rabbitmq defaulted to IP addresses for RABBITMQ_NODENAME.

Steps to reproduce:
 1) Deploy rabbitmq-server into a fresh openstack environment
 2) Add a second unit
 3) Wait and check the status.

Expected result: new node has joined the cluster.
Actual result: a cluster-relation-changed hook error, because the hostname has been set to a single number, which is invalid (e.g. 'ERROR: epmd error for host "6": badarg (unknown POSIX error)').

I initially saw this with r82 of the charm and have just reproduced it there again [1] (example plus full details of the error). I also checked and verified the same issue with r99 [2].

pjdc says that the bad hostname likely comes from the reverse DNS.

[1] http://paste.ubuntu.com/10958485/
[2] http://paste.ubuntu.com/10958620/

description: updated
Edward Hope-Morley (hopem) wrote :

Michael, I've had a quick go at deploying lp:charms/trusty/rabbitmq-server using the local provider and the steps you specify, and I am not seeing this problem. I have also deployed regularly to OpenStack in the last few days and am not seeing an issue there either. Could I ask you to provide your rabbitmq juju logs (e.g. /var/log/juju/unit-rabbitmq-server-0.log) from all your units, and also give me the service name of your deployed rabbitmq service and the hostnames of the nodes it is deployed to? Thanks.

Changed in rabbitmq-server (Juju Charms Collection):
status: New → Incomplete
Michael Nelson (michael.nelson) wrote : Re: [Bug 1450699] Re: Cannot add more than one unit on openstack

On Fri, May 1, 2015 at 5:20 PM, Edward Hope-Morley
<email address hidden> wrote:
> Michael, i've had a quick go at deploying lp:charms/trusty/rabbitmq-
> server using the local_provider and the steps you specify and I am not
> seeing this problem.

I'm not able to reproduce it with the local provider either (which is
where I test our deployments); that's why this only hit us when we
actually deployed to PS4. So far I've only reproduced the issue there,
as per the above pastes.

> I have also regularly deployed to openstack in the
> last few days and am not seeing an issue there either. Could i ask you
> to provide your rabbitmq juju logs e.g. /var/log/juju/unit-rabbitmq-
> server-0.log from all your units and also give me the service name of
> your deployed rabbitmq service and the hostnames of the nodes deployed
> to. Thanks.

Yep, I'll re-deploy the test environment from our staging devops
deployment box on Monday and get those logs for you (or, if you have
access to a PS4 environment, you can reproduce it there). As I
mentioned, it may be specific to the (lack of?) reverse DNS on PS4;
I'm not sure.

Thanks.
-Michael

Michael Nelson (michael.nelson) wrote :
Michael Nelson (michael.nelson) wrote :
Michael Nelson (michael.nelson) wrote :

I've attached the requested logs.

As pjdc pointed out, the issue is the hostnames that the units' addresses reverse-resolve to:

[prod-u1-sca] prod-u1-sca@wendigo:/srv/mojo/mojo-prod-u1-sca/trusty/production/charms$ juju status | grep dns
    dns-name: 10.35.128.11
    dns-name: 10.35.128.12
    dns-name: 10.35.128.13
[prod-u1-sca] prod-u1-sca@wendigo:/srv/mojo/mojo-prod-u1-sca/trusty/production/charms$ host 10.35.128.12
12.128.35.10.in-addr.arpa domain name pointer 12.128.35.10.instance.prodstack4.internal.
[prod-u1-sca] prod-u1-sca@wendigo:/srv/mojo/mojo-prod-u1-sca/trusty/production/charms$ host 10.35.128.13
13.128.35.10.in-addr.arpa domain name pointer 13.128.35.10.instance.prodstack4.internal.

and the rabbitmq charm (well,
charmhelpers.contrib.openstack.utils.get_hostname) assumes that it can
derive the short hostname from that reverse-DNS result by just doing:

return result.split('.')[0]

hence the rabbitmq hostname being set to 12 and 13 respectively:

$ juju run --service rabbitmq-server "cat /etc/rabbitmq/rabbitmq-env.conf"
- MachineId: "1"
  Stdout: 'RABBITMQ_NODENAME=rabbit@12'
  UnitId: rabbitmq-server/0
- MachineId: "2"
  Stdout: 'RABBITMQ_NODENAME=rabbit@13'
  UnitId: rabbitmq-server/1
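To make the failure concrete, here is a small Python sketch that mimics only the final step quoted above (`result.split('.')[0]`), applied to the PS4 PTR names from the `host` output. `short_name` and `node_name` are hypothetical helpers for illustration, not charmhelpers functions, and the DNS lookup itself is replaced by the literal PTR strings.

```python
def short_name(ptr_name: str) -> str:
    """Mimic the short-hostname step: keep only the first DNS label."""
    return ptr_name.split('.')[0]


def node_name(ptr_name: str, prefix: str = "rabbit") -> str:
    """Build a RabbitMQ-style node name of the form <prefix>@<host>."""
    return f"{prefix}@{short_name(ptr_name)}"


# PTR records returned by `host` on PS4 (trailing root dot included, as printed).
ptr_records = {
    "10.35.128.12": "12.128.35.10.instance.prodstack4.internal.",
    "10.35.128.13": "13.128.35.10.instance.prodstack4.internal.",
}

for ip, ptr in ptr_records.items():
    # A PTR name that embeds the reversed IP has a purely numeric first label.
    print(ip, "->", node_name(ptr))
```

This prints `10.35.128.12 -> rabbit@12` and `10.35.128.13 -> rabbit@13`, matching the RABBITMQ_NODENAME values in rabbitmq-env.conf.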

I don't know the requirements on rabbitmq's nodename; I'm assuming
fqdn=True (the default) doesn't work?

nodename = get_hostname(ip_addr, fqdn=True)

Otherwise, that would make fewer assumptions about the DNS entry.
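For comparison, a sketch of the fqdn=True idea: keep the whole PTR name and drop only the trailing root dot. `fqdn_node_name` is a hypothetical helper, not the charm's code. One caveat, stated as an assumption rather than tested here: a host part containing dots is an Erlang "long name", so RabbitMQ would presumably also need its long-name mode (the `RABBITMQ_USE_LONGNAME` setting) for such a node name to be accepted.

```python
def fqdn_node_name(ptr_name: str, prefix: str = "rabbit") -> str:
    """Build a node name from the full PTR name instead of its first label."""
    # PTR answers are usually printed with a trailing root dot; drop it.
    host = ptr_name.rstrip('.')
    # Assumption: a dotted host like this needs RabbitMQ's long-name mode.
    return f"{prefix}@{host}"


print(fqdn_node_name("12.128.35.10.instance.prodstack4.internal."))
```

This prints `rabbit@12.128.35.10.instance.prodstack4.internal`, which, unlike `rabbit@12`, still identifies a name that the PS4 DNS actually published.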

James Westby (james-w) wrote :

It seems to me that using fqdn=True would be right here if rabbit expects
to be able to resolve it as an address. I don't think it's universally true that,
if an IP reverse-resolves to "foo.something", then "foo" will resolve back to the
same machine, as evidenced here.
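The concern above can be checked mechanically: a name derived from reverse DNS is only safe to use if it resolves forward to the same address. Below is a minimal Python sketch, assuming the resolver is passed in as a function. `resolves_back`, `fake_resolve`, and the `fake_dns` table are hypothetical; the table merely mirrors the PS4 PTR record quoted earlier and is not a real lookup. The default resolver, `socket.gethostbyname`, returns a single IPv4 address, so hosts with several addresses would need `socket.getaddrinfo` instead.

```python
import socket
from typing import Callable


def resolves_back(name: str, expected_ip: str,
                  resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if `name` resolves (forward) to `expected_ip`."""
    try:
        return resolve(name) == expected_ip
    except OSError:
        # socket.gaierror is a subclass of OSError; unresolvable means "no".
        return False


# Hypothetical forward-DNS table standing in for a real resolver.
fake_dns = {"12.128.35.10.instance.prodstack4.internal": "10.35.128.12"}


def fake_resolve(name: str) -> str:
    try:
        return fake_dns[name]
    except KeyError:
        raise OSError(f"cannot resolve {name!r}") from None


print(resolves_back("12.128.35.10.instance.prodstack4.internal", "10.35.128.12", fake_resolve))
print(resolves_back("12", "10.35.128.12", fake_resolve))
```

With the fake table, the full name maps back (`True`) while the truncated first label does not (`False`).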

This comment on an old bug report about this issue suggests that there may
be some concerns about changing this value for existing instances:

https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/653405/comments/22

Haw Loeung (hloeung)
Changed in rabbitmq-server (Juju Charms Collection):
status: Incomplete → Confirmed
Liam Young (gnuoy) wrote :

Is this still an issue for you? dames has done work to untangle some of the clustering mess in the charm, and as part of that the setting of RABBITMQ_NODENAME was removed, as it is only pertinent when deploying multiple rabbits on the same server.
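A rough illustration of why dropping the explicit setting sidesteps the problem: when RABBITMQ_NODENAME is unset, RabbitMQ typically derives the node name from the machine's own hostname, as `rabbit@<short hostname>`, which is consistent with the `rabbit@juju-afa48f-rabbit-0` names in the cluster_status output. The sketch only approximates that default; `default_node_name` is a hypothetical helper, and the real logic lives in RabbitMQ's startup scripts and may differ in detail.

```python
import socket
from typing import Optional


def default_node_name(hostname: Optional[str] = None) -> str:
    """Approximate RabbitMQ's default node name: rabbit@<short local hostname>."""
    # The hostname comes from the machine itself, not from a reverse-DNS lookup.
    host = hostname if hostname is not None else socket.gethostname()
    return "rabbit@" + host.split('.')[0]


print(default_node_name("juju-afa48f-rabbit-0"))
```

This prints `rabbit@juju-afa48f-rabbit-0`. Because the input is the local hostname that Juju assigned, the first label is a meaningful name rather than an octet of a reversed IP.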

I've deployed the latest rabbit charm using the OpenStack provider, then scaled it out, and it seems to be fine.

⟫ juju status
Model   Controller  Cloud/Region             Version
rabbit  dev         serverstack/serverstack  2.0.2

App              Version  Status  Scale  Charm            Store       Rev  OS      Notes
rabbitmq-server  3.5.7    active      2  rabbitmq-server  jujucharms   57  ubuntu

Unit                Workload  Agent  Machine  Public address  Ports     Message
rabbitmq-server/0*  active    idle   0        10.5.35.94      5672/tcp  Unit is ready and clustered
rabbitmq-server/1   active    idle   1        10.5.35.95      5672/tcp  Unit is ready and clustered

Machine  State    DNS         Inst id                               Series  AZ
0        started  10.5.35.94  a6ee435b-2225-40ef-ab5d-844c4f5946fa  xenial  nova
1        started  10.5.35.95  438c9848-e77e-4788-ba1d-f43f42e869cf  xenial  nova

Relation  Provides         Consumes         Type
cluster   rabbitmq-server  rabbitmq-server  peer

⟫ juju run --application rabbitmq-server "sudo rabbitmqctl cluster_status"
- Stdout: |
    Cluster status of node 'rabbit@juju-afa48f-rabbit-0' ...
    [{nodes,[{disc,['rabbit@juju-afa48f-rabbit-0',
                    'rabbit@juju-afa48f-rabbit-1']}]},
     {running_nodes,['rabbit@juju-afa48f-rabbit-1','rabbit@juju-afa48f-rabbit-0']},
     {cluster_name,<<"rabbit@juju-afa48f-rabbit-0">>},
     {partitions,[]}]
  UnitId: rabbitmq-server/0
- Stdout: |
    Cluster status of node 'rabbit@juju-afa48f-rabbit-1' ...
    [{nodes,[{disc,['rabbit@juju-afa48f-rabbit-0',
                    'rabbit@juju-afa48f-rabbit-1']}]},
     {running_nodes,['rabbit@juju-afa48f-rabbit-0','rabbit@juju-afa48f-rabbit-1']},
     {cluster_name,<<"rabbit@juju-afa48f-rabbit-0">>},
     {partitions,[]}]
  UnitId: rabbitmq-server/1

Changed in rabbitmq-server (Juju Charms Collection):
importance: Undecided → Medium
status: Confirmed → Incomplete
James Page (james-page)
Changed in charm-rabbitmq-server:
importance: Undecided → Medium
status: New → Incomplete
Changed in rabbitmq-server (Juju Charms Collection):
status: Incomplete → Invalid
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack rabbitmq-server charm because there has been no activity for 60 days.]

Changed in charm-rabbitmq-server:
status: Incomplete → Expired