Bug #1450699 “Cannot add more than one unit on openstack” : Bugs : OpenStack RabbitMQ Server Charm

Michael Nelson (michael.nelson) on 2015-05-01

description:	updated

Edward Hope-Morley (hopem) wrote on 2015-05-01:

#1

Michael, I've had a quick go at deploying lp:charms/trusty/rabbitmq-server using the local_provider and the steps you specify, and I am not seeing this problem. I have also regularly deployed to OpenStack in the last few days and am not seeing an issue there either. Could I ask you to provide your rabbitmq juju logs (e.g. /var/log/juju/unit-rabbitmq-server-0.log) from all your units, and also give me the service name of your deployed rabbitmq service and the hostnames of the nodes deployed to? Thanks.

Changed in rabbitmq-server (Juju Charms Collection):
status:	New → Incomplete

Michael Nelson (michael.nelson) wrote on 2015-05-01: Re: [Bug 1450699] Re: Cannot add more than one unit on openstack

#2

On Fri, May 1, 2015 at 5:20 PM, Edward Hope-Morley
<email address hidden> wrote:
> Michael, i've had a quick go at deploying lp:charms/trusty/rabbitmq-
> server using the local_provider and the steps you specify and I am not
> seeing this problem.

I'm not able to reproduce it with the local provider either (where I
test our deployments), which is why this only hit us when we actually
deployed to PS4. So far, I've only reproduced the issue there on PS4,
as per the above pastes.

> I have also regularly deployed to openstack in the
> last few days and am not seeing an issue there either. Could i ask you
> to provide your rabbitmq juju logs e.g. /var/log/juju/unit-rabbitmq-
> server-0.log from all your units and also give me the service name of
> your deployed rabbitmq service and the hostnames of the nodes deployed
> to. Thanks.

Yep, I'll re-deploy the test environment from our staging devops
deployment box on Monday and get those logs for you (or if you have
access to a PS4 environment, you can reproduce it there). As I
mentioned, it may be specific to the (lack of?) reverse dns on PS4 -
not sure.

Thanks.
-Michael

Michael Nelson (michael.nelson) wrote on 2015-05-04:

#3

unit-rabbitmq-server-0.log (75.2 KiB, text/plain)

Michael Nelson (michael.nelson) wrote on 2015-05-04:

#4

unit-rabbitmq-server-1.log (79.9 KiB, text/plain)

Michael Nelson (michael.nelson) wrote on 2015-05-04:

#5

I've attached the requested logs.

As pjdc pointed out, the issue is that the hostnames are:

[prod-u1-sca] prod-u1-sca@wendigo:/srv/mojo/mojo-prod-u1-sca/trusty/production/charms$ juju status | grep dns
    dns-name: 10.35.128.11
    dns-name: 10.35.128.12
    dns-name: 10.35.128.13
[prod-u1-sca] prod-u1-sca@wendigo:/srv/mojo/mojo-prod-u1-sca/trusty/production/charms$ host 10.35.128.12
12.128.35.10.in-addr.arpa domain name pointer 12.128.35.10.instance.prodstack4.internal.
[prod-u1-sca] prod-u1-sca@wendigo:/srv/mojo/mojo-prod-u1-sca/trusty/production/charms$ host 10.35.128.13
13.128.35.10.in-addr.arpa domain name pointer 13.128.35.10.instance.prodstack4.internal.

and the rabbitmq charm (well,
charmhelpers.contrib.openstack.utils.get_hostname) assumes that you
can just do:

return result.split('.')[0]

with that - hence the rabbitmq hostname being set to 12 and 13 respectively:

$ juju run --service rabbitmq-server "cat /etc/rabbitmq/rabbitmq-env.conf"
- MachineId: "1"
  Stdout: 'RABBITMQ_NODENAME=rabbit@12'
  UnitId: rabbitmq-server/0
- MachineId: "2"
  Stdout: 'RABBITMQ_NODENAME=rabbit@13'
  UnitId: rabbitmq-server/1

I don't know the requirements on rabbitmq's nodename - I'm assuming
fqdn=True (the default) doesn't work?

nodename = get_hostname(ip_addr, fqdn=True)

Otherwise, that would make fewer assumptions about the DNS entry.
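
For illustration only, here is a minimal Python sketch of why the short-name step turns the PS4 reverse-DNS names into bare numbers. It reproduces just the split('.')[0] line quoted above, not the rest of charmhelpers' get_hostname. The helper names short_name and full_name are made up for this example. Whether RabbitMQ would accept the full form as a node name is the open question raised above, and the sketch does not answer it.

```python
def short_name(rdns_name):
    """Mimic the `result.split('.')[0]` step quoted in this comment."""
    return rdns_name.split('.')[0]


def full_name(rdns_name):
    """Keep the whole name, dropping only the trailing root dot."""
    return rdns_name.rstrip('.')


# PTR targets copied from the `host` output in this comment.
ptr_records = [
    "12.128.35.10.instance.prodstack4.internal.",
    "13.128.35.10.instance.prodstack4.internal.",
]

for ptr in ptr_records:
    print("rabbit@" + short_name(ptr), "vs", "rabbit@" + full_name(ptr))
# rabbit@12 vs rabbit@12.128.35.10.instance.prodstack4.internal
# rabbit@13 vs rabbit@13.128.35.10.instance.prodstack4.internal
```

The short form only identifies a machine when the first DNS label is a real host label, which is not the case for these IP-derived names.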

On Fri, May 1, 2015 at 5:20 PM, Edward Hope-Morley
<edward.hope-morley@canonical.com> wrote:
> Michael, i've had a quick go at deploying lp:charms/trusty/rabbitmq-
> server using the local_provider and the steps you specify and I am not
> seeing this problem. I have also regularly deployed to openstack in the
> last few days and am not seeing an issue there either. Could i ask you
> to provide your rabbitmq juju logs e.g. /var/log/juju/unit-rabbitmq-
> server-0.log from all your units and also give me the service name of
> your deployed rabbitmq service and the hostnames of the nodes deployed
> to. Thanks.
>
> ** Changed in: rabbitmq-server (Juju Charms Collection)
>        Status: New => Incomplete
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1450699
>
> Title:
>   Cannot add more than one unit on openstack
>
> Status in rabbitmq-server package in Juju Charms Collection:
>   Incomplete
>
> Bug description:
>   This is possibly specific to the Canonical PS4 openstack environment
>   (or specifically, the dns resolution there), but it may be better if
>   rabbitmq defaulted to IP addresses for RABBITMQ_NODENAME.
>
>   Steps to reproduce:
>    1) Deploy rabbitmq-server into a fresh openstack environment
>    2) Add a second unit
>    3) Wait and check the status.
>
>   Expected result: new node has joined the cluster.
>   Actual result: cluster-relation-changed error because an invalid hostname of a single number has been set (eg. 'ERROR: epmd error for host "6": badarg (unknown POSIX error')
>
>   I initially saw it with r82 of the charm - so reproduced it again just
>   now there [1] (example plus full details of error), but also checked
>   and verified the same issue with r99 [2]
>
>   pjdc says that the bad hostname is likely from the reverse dns,
>
>   [1] http://paste.ubuntu.com/10958485/
>   [2] http://paste.ubuntu.com/10958620/
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/charms/+source/rabbitmq-server/+bug/1450699/+subscriptions

James Westby (james-w) wrote on 2015-06-12:

#6

It seems to me that using fqdn=True would be right here if rabbit expects
to be able to resolve it as an address. I don't think it's universally true that
if you RDNS an IP to "foo.something" that "foo" will resolve back to the
same machine, as evidenced here.
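
To make the round-trip point concrete, here is a rough Python sketch of a check that asks whether the short form of an address's reverse-DNS name resolves back to the same address. It is not taken from the charm. The function name short_name_round_trips, the stub resolvers and the host-a / 10.0.0.5 data are all hypothetical, and the stubs stand in for real DNS so the example runs offline.

```python
import socket


def short_name_round_trips(ip_addr, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname):
    """Return True if the short form of ip_addr's reverse-DNS name
    resolves back to ip_addr.

    `reverse` and `forward` default to the standard-library resolvers and
    can be replaced with stubs for offline use.
    """
    try:
        fqdn = reverse(ip_addr)[0]
        short = fqdn.rstrip('.').split('.')[0]
        return forward(short) == ip_addr
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False


# Stub resolver data so the example needs no network access.
FAKE_PTR = {
    "10.35.128.12": "12.128.35.10.instance.prodstack4.internal",  # from this bug
    "10.0.0.5": "host-a.example.internal",                        # made up
}
FAKE_A = {"host-a": "10.0.0.5"}  # made up


def fake_reverse(ip_addr):
    return (FAKE_PTR[ip_addr], [], [ip_addr])


def fake_forward(name):
    try:
        return FAKE_A[name]
    except KeyError:
        raise socket.gaierror("name not known: " + name)


print(short_name_round_trips("10.0.0.5", fake_reverse, fake_forward))      # True
print(short_name_round_trips("10.35.128.12", fake_reverse, fake_forward))  # False
```

Against a real resolver the second case may behave differently from the stub, so treat the printed results as a property of the stub data only.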

This comment on an old bug report about this issue suggests that there may
be some concerns about changing this value for existing instances:

https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/653405/comments/22

Haw Loeung (hloeung) on 2015-06-18

Changed in rabbitmq-server (Juju Charms Collection):
status:	Incomplete → Confirmed

Liam Young (gnuoy) wrote on 2017-01-04:

#7

Is this still an issue for you? dames has done work to untangle some of the clustering mess in the charm, and as part of that the setting of RABBITMQ_NODENAME was removed, as it is only pertinent when deploying multiple rabbits on the same server.

I've deployed the latest rabbit charm using the OpenStack provider, then scaled it out, and it seems to be fine.

⟫ juju status
Model   Controller  Cloud/Region             Version
rabbit  dev         serverstack/serverstack  2.0.2

App              Version  Status  Scale  Charm            Store       Rev  OS      Notes
rabbitmq-server  3.5.7    active      2  rabbitmq-server  jujucharms   57  ubuntu

Unit                Workload  Agent  Machine  Public address  Ports     Message
rabbitmq-server/0*  active    idle   0        10.5.35.94      5672/tcp  Unit is ready and clustered
rabbitmq-server/1   active    idle   1        10.5.35.95      5672/tcp  Unit is ready and clustered

Machine  State    DNS         Inst id                               Series  AZ
0        started  10.5.35.94  a6ee435b-2225-40ef-ab5d-844c4f5946fa  xenial  nova
1        started  10.5.35.95  438c9848-e77e-4788-ba1d-f43f42e869cf  xenial  nova

Relation  Provides         Consumes         Type
cluster   rabbitmq-server  rabbitmq-server  peer

⟫ juju run --application rabbitmq-server "sudo rabbitmqctl cluster_status"
- Stdout: |
    Cluster status of node 'rabbit@juju-afa48f-rabbit-0' ...
    [{nodes,[{disc,['rabbit@juju-afa48f-rabbit-0',
                    'rabbit@juju-afa48f-rabbit-1']}]},
     {running_nodes,['rabbit@juju-afa48f-rabbit-1','rabbit@juju-afa48f-rabbit-0']},
     {cluster_name,<<"rabbit@juju-afa48f-rabbit-0">>},
     {partitions,[]}]
  UnitId: rabbitmq-server/0
- Stdout: |
    Cluster status of node 'rabbit@juju-afa48f-rabbit-1' ...
    [{nodes,[{disc,['rabbit@juju-afa48f-rabbit-0',
                    'rabbit@juju-afa48f-rabbit-1']}]},
     {running_nodes,['rabbit@juju-afa48f-rabbit-0','rabbit@juju-afa48f-rabbit-1']},
     {cluster_name,<<"rabbit@juju-afa48f-rabbit-0">>},
     {partitions,[]}]
  UnitId: rabbitmq-server/1

Changed in rabbitmq-server (Juju Charms Collection):
importance:	Undecided → Medium
status:	Confirmed → Incomplete

James Page (james-page) on 2017-02-23

Changed in charm-rabbitmq-server:
importance:	Undecided → Medium
status:	New → Incomplete
Changed in rabbitmq-server (Juju Charms Collection):
status:	Incomplete → Invalid

Launchpad Janitor (janitor) wrote on 2017-04-25:

#8

[Expired for OpenStack rabbitmq-server charm because there has been no activity for 60 days.]

Changed in charm-rabbitmq-server:
status:	Incomplete → Expired

Affects                                   Status   Importance  Assigned to  Milestone
OpenStack RabbitMQ Server Charm           Expired  Medium      Unassigned
rabbitmq-server (Juju Charms Collection)  Invalid  Medium      Unassigned