Bug #1855935 “Kolla-ansible deploy fails at rabbitmq” : Bugs : kolla-ansible

Alex Jackson (xelaot) on 2019-12-10

tags:	added: kolla-ansible rabbitmq
tags:	added: stein train

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2019-12-11:

#1

You are most likely hit by: https://bugs.launchpad.net/kolla-ansible/+bug/1853578

Could you verify that?

I don't understand what you mean by [ip address of neutron network_interface], neutron external interface has generally no address set.

As a side note, please upgrade the host to avoid other issues due to old kernel, the releases you mentioned are tested against bionic, not xenial (the release in images as well).

Changed in kolla-ansible:
status:	New → Incomplete

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2019-12-11:

#2

Also:

is this MAAS?

Did you run prechecks and did they pass?

Could you include the output of:
cat /etc/hosts

getent hosts $(hostname)
getent hosts $(hostname -s)
getent hosts $(getent hosts $(hostname))
getent hosts $(getent hosts $(hostname -s))

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2019-12-11:

#3

(output on deployed node, not deployer)

Revision history for this message

Taisto Qvist (theque42) wrote on 2019-12-11:

#4

I think I might be the second "someone else also has this problem", where I got the question if I was using MAAS. (in the ask-question)
The answer is no. This is a private setup to a bunch of preconfigured servers with already working hosts files.

The hosts files starts of as:
-----
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.103.100 ctrl1.lab3.stack ctrl1
172.16.103.100 controller1.lab3.stack controller1
172.16.103.109 ctrl2.lab3.stack ctrl2
172.16.103.109 controller2.lab3.stack controller2
172.16.103.101 compute1.lab3.stack compute1
172.16.103.102 compute2.lab3.stack compute2
172.16.103.103 compute3.lab3.stack compute3
172.16.103.104 neutron1.lab3.stack neutron1
172.16.103.105 neutron2.lab3.stack neutron2
172.16.103.106 storage1.lab3.stack storage1
172.16.103.107 storage2.lab3.stack storage2
172.16.103.108 storage3.lab3.stack storage3
172.16.103.111 int.lab3.stack int haint lbi
10.10.103.111 ext.lab3.stack ext haext lbx
-----
but after deploy it becomes:
-----
127.0.0.1 localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.103.100 ctrl1.lab3.stack ctrl1
172.16.103.100 controller1.lab3.stack controller1
172.16.103.109 ctrl2.lab3.stack ctrl2
172.16.103.109 controller2.lab3.stack controller2
172.16.103.101 compute1.lab3.stack compute1
172.16.103.102 compute2.lab3.stack compute2
172.16.103.103 compute3.lab3.stack compute3
172.16.103.104 neutron1.lab3.stack neutron1
172.16.103.105 neutron2.lab3.stack neutron2
172.16.103.106 storage1.lab3.stack storage1
172.16.103.107 storage2.lab3.stack storage2
172.16.103.108 storage3.lab3.stack storage3
# BEGIN ANSIBLE GENERATED HOSTS
172.16.103.100 ctrl1.lab3.stack ctrl1
172.16.103.109 ctrl2.lab3.stack ctrl2
172.16.103.104 neutron1.lab3.stack neutron1
172.16.103.105 neutron2.lab3.stack neutron2
172.16.103.101 compute1.lab3.stack compute1
172.16.103.102 compute2.lab3.stack compute2
172.16.103.106 storage1.lab3.stack storage1
172.16.103.107 storage2.lab3.stack storage2
# END ANSIBLE GENERATED HOSTS
-----

I've come to realize that its stupid/wrong/etc to have multiple lines with the same IP in hosts, but since kolla simply appends, I'd get the same error even if I clean up that issue.
(I will though, and see if that helps)

I think I might be the second "someone else also has this problem", where I got the question if I was using MAAS. (in the ask-question)
The answer is no. This is a private setup to a bunch of preconfigured servers with already working hosts files.

The hosts files starts of as:
-----
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.103.100    ctrl1.lab3.stack ctrl1
172.16.103.100    controller1.lab3.stack controller1
172.16.103.109    ctrl2.lab3.stack ctrl2
172.16.103.109    controller2.lab3.stack controller2
172.16.103.101    compute1.lab3.stack compute1
172.16.103.102    compute2.lab3.stack compute2
172.16.103.103    compute3.lab3.stack compute3
172.16.103.104    neutron1.lab3.stack neutron1
172.16.103.105    neutron2.lab3.stack neutron2
172.16.103.106    storage1.lab3.stack storage1
172.16.103.107    storage2.lab3.stack storage2
172.16.103.108    storage3.lab3.stack storage3
172.16.103.111    int.lab3.stack int haint lbi
10.10.103.111    ext.lab3.stack ext haext lbx
-----
but after deploy it becomes:
-----
127.0.0.1 localhost
::1       localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.103.100    ctrl1.lab3.stack ctrl1
172.16.103.100    controller1.lab3.stack controller1
172.16.103.109    ctrl2.lab3.stack ctrl2
172.16.103.109    controller2.lab3.stack controller2
172.16.103.101    compute1.lab3.stack compute1
172.16.103.102    compute2.lab3.stack compute2
172.16.103.103    compute3.lab3.stack compute3
172.16.103.104    neutron1.lab3.stack neutron1
172.16.103.105    neutron2.lab3.stack neutron2
172.16.103.106    storage1.lab3.stack storage1
172.16.103.107    storage2.lab3.stack storage2
172.16.103.108    storage3.lab3.stack storage3
# BEGIN ANSIBLE GENERATED HOSTS
172.16.103.100 ctrl1.lab3.stack ctrl1
172.16.103.109 ctrl2.lab3.stack ctrl2
172.16.103.104 neutron1.lab3.stack neutron1
172.16.103.105 neutron2.lab3.stack neutron2
172.16.103.101 compute1.lab3.stack compute1
172.16.103.102 compute2.lab3.stack compute2
172.16.103.106 storage1.lab3.stack storage1
172.16.103.107 storage2.lab3.stack storage2
# END ANSIBLE GENERATED HOSTS
-----

I've come to realize that its stupid/wrong/etc to have multiple lines with the same IP in hosts, but since kolla simply appends, I'd get the same error even if I clean up that issue.
(I will though, and see if that helps)

Revision history for this message

Alex Jackson (xelaot) wrote on 2019-12-11:

#5

Download full text (3.8 KiB)

"Could you verify that?"

Just like above, I am in a private environment and I also am running an all-in-one deployment so I have not touched /etc/hosts

Either way, here's the /etc/hosts file:
127.0.0.1 localhost
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# BEGIN ANSIBLE GENERATED HOSTS
10.111.203.5 openStack
# END ANSIBLE GENERATED HOSTS

"I don't understand what you mean by [ip address of neutron network_interface]"

In the globals.yml file for kolla-ansible, there's a section to set the name of the network interface used for api services. This interface must contain an ip address.

Here's the globals.yml explanation followed by it's configuration:

# This interface is what all your api services will be bound to by default.
# Additionally, all vxlan/tunnel and storage network traffic will go over this
# interface by default. This interface must contain an IP address.
# It is possible for hosts to have non-matching names of interfaces - these can
# be set in an inventory file per host or per group or stored separately, see
# http://docs.ansible.com/ansible/intro_inventory.html
# Yet another way to workaround the naming problem is to create a bond for the
# interface on all hosts and give the bond name here. Similar strategy can be
# followed for other types of interfaces.
network_interface: "eno1"

This interface has the ip address of 10.111.203.5 which I substitute in for the ERL_EMPD_ADDRESS within the ./kolla-ansible/ansible/roles/rabbitmq/templates/rabbitmq-env.conf.j2 to hardcode the value since jinja or ansible is not substituting the address in the rabbitmq docker image.

"is this MAAS?"

Nope, just a single machine

output of those various commands:

getent hosts $(hostname)
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5 openStack.wsfdindl.metronetinc.net openStack openStack

getent hosts $(hostname -s)
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5 openStack.wsfdindl.metronetinc.net openStack openStack

getent hosts $(getent hosts $(hostname))
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5 openStack.wsfdindl.metronetinc.net openStack openStack
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5 openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5 openStack
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5 openStack.wsfdindl.metronetinc.net openStack openStack
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5 openStack.wsfdindl.metronetinc.net openStack openStack

getent hosts $(getent hosts $(hostname -s))
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack
127.0.1.1...

"Could you verify that?"

Just like above, I am in a private environment and I also am running an all-in-one deployment so I have not touched /etc/hosts

Either way, here's the /etc/hosts file:
127.0.0.1 localhost
127.0.1.1       openStack.wsfdindl.metronetinc.net      openStack

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# BEGIN ANSIBLE GENERATED HOSTS
10.111.203.5 openStack
# END ANSIBLE GENERATED HOSTS

"I don't understand what you mean by [ip address of neutron network_interface]"

In the globals.yml file for kolla-ansible, there's a section to set the name of the network interface used for api services. This interface must contain an ip address.

Here's the globals.yml explanation followed by it's configuration:

# This interface is what all your api services will be bound to by default.
# Additionally, all vxlan/tunnel and storage network traffic will go over this
# interface by default. This interface must contain an IP address.
# It is possible for hosts to have non-matching names of interfaces - these can
# be set in an inventory file per host or per group or stored separately, see
#     http://docs.ansible.com/ansible/intro_inventory.html
# Yet another way to workaround the naming problem is to create a bond for the
# interface on all hosts and give the bond name here. Similar strategy can be
# followed for other types of interfaces.
network_interface: "eno1"

This interface has the ip address of 10.111.203.5 which I substitute in for the ERL_EMPD_ADDRESS within the ./kolla-ansible/ansible/roles/rabbitmq/templates/rabbitmq-env.conf.j2 to hardcode the value since jinja or ansible is not substituting the address in the rabbitmq docker image.

"is this MAAS?"

Nope, just a single machine

output of those various commands:

getent hosts $(hostname)
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack

getent hosts $(hostname -s)
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack

getent hosts $(getent hosts $(hostname))
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack

getent hosts $(getent hosts $(hostname -s))
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack
127.0.1.1       openStack.wsfdindl.metronetinc.net openStack openStack
10.111.203.5    openStack.wsfdindl.metronetinc.net openStack openStack

NOTE: all of these commands were run with openstack deployed

Revision history for this message

Taisto Qvist (theque42) wrote on 2019-12-11:

#6

I tried cleaning out the bad, duplicate line in etc hosts (with the spelled out controller-name), and that didnt help, maybe because I had duplicates anyway thanks to kolla-ansible simply appending to the file.

I didnt mention it, but I hope it was obvious from my hosts that I am running a multinode install, and this hosts file has worked in mitaka and rocky...dont remember if I had time to try stein.

I lost the output from my getent-calls, since I restarted the cloud deploy, but I did notice that similar to above, I also got multiple entries for the same address. In my case ctrl1.lab1.stack.

Radosław Piliszek (yoctozepto) on 2019-12-12

Changed in kolla-ansible:
status:	Incomplete → New

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2019-12-12:

#7

Well, it might be the case that you are running into two different issues...
The bad news is I tried to reproduce both and could not either. :-)

In the xelaot's case, the breaking line is:
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack
and we remove that since https://review.opendev.org/685233
(also backported to stein).
I don't see how hardcoding would help since both ways it templates out to the very same file...
Also, if api_interface_address failed to template out, there would be total havoc, not only in rmq.

theque42 - exact duplicates change nothing because then all programs tend to behave the same irrespective of the resolver being used (unless some very nasty one, but doubt it would be erlang).
Since your issue is a different one, could you create a separate bug report?
Might be easier to coordinate this way.

xelaot, please try using latest stein commit and see whether the bad line stays in the file, and what is the result of "Ensure hostname does not point to loopback in /etc/hosts" task (it is run by the bootstrap).

Revision history for this message

Taisto Qvist (theque42) wrote on 2019-12-12:

#8

I'll try to create a new case later today, but I just wanna let you know that I reran today, with the following setup, and with the same problem.

[root@ctrl1 ~(admin)]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

172.16.102.100 ctrl1.lab2.stack controller1 ctrl1
172.16.102.109 ctrl2.lab2.stack controller2 ctrl2
172.16.102.101 compute1.lab2.stack compute1
172.16.102.102 compute2.lab2.stack compute2
172.16.102.103 compute3.lab2.stack compute3
172.16.102.104 neutron1.lab2.stack neutron1
172.16.102.105 neutron2.lab2.stack neutron2
172.16.102.106 storage1.lab2.stack storage1
172.16.102.107 storage2.lab2.stack storage2
172.16.102.108 storage3.lab2.stack storage3
172.16.102.111 int.lab2.stack int haint lbi
10.10.102.111 ext.lab2.stack ext haext lbx

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(hostname)
172.16.102.100 ctrl1.lab2.stack controller1 ctrl1

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(hostname -s)
172.16.102.100 ctrl1.lab2.stack controller1 ctrl1

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(getent hosts $(hostname))
172.16.102.100 ctrl1.lab2.stack controller1 ctrl1
172.16.102.100 ctrl1.lab2.stack controller1 ctrl1
172.16.102.100 ctrl1.lab2.stack controller1 ctrl1
172.16.102.100 ctrl1.lab2.stack controller1 ctrl1

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(getent hosts $(hostname -s))^C

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(hostname)
172.16.102.100 ctrl1.lab2.stack controller1 ctrl1

[root@ctrl1 ~(admin)]# docker logs rabbitmq 2>&1| tail -10
++ [[ -n '' ]]
++ [[ ! -d /var/log/kolla/rabbitmq ]]
+++ stat -c %a /var/log/kolla/rabbitmq
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/rabbitmq
Running command: '/usr/sbin/rabbitmq-server'
+ echo 'Running command: '\''/usr/sbin/rabbitmq-server'\'''
+ exec /usr/sbin/rabbitmq-server
econnrefused

I'll try to create a new case later today, but I just wanna let you know that I reran today, with the following setup, and with the same problem.

[root@ctrl1 ~(admin)]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.16.102.100    ctrl1.lab2.stack controller1 ctrl1
172.16.102.109    ctrl2.lab2.stack controller2 ctrl2
172.16.102.101    compute1.lab2.stack compute1
172.16.102.102    compute2.lab2.stack compute2
172.16.102.103    compute3.lab2.stack compute3
172.16.102.104    neutron1.lab2.stack neutron1
172.16.102.105    neutron2.lab2.stack neutron2
172.16.102.106    storage1.lab2.stack storage1
172.16.102.107    storage2.lab2.stack storage2
172.16.102.108    storage3.lab2.stack storage3
172.16.102.111    int.lab2.stack int haint lbi
10.10.102.111    ext.lab2.stack ext haext lbx

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(hostname)
172.16.102.100  ctrl1.lab2.stack controller1 ctrl1

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(hostname -s)
172.16.102.100  ctrl1.lab2.stack controller1 ctrl1

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(getent hosts $(hostname))
172.16.102.100  ctrl1.lab2.stack controller1 ctrl1
172.16.102.100  ctrl1.lab2.stack controller1 ctrl1
172.16.102.100  ctrl1.lab2.stack controller1 ctrl1
172.16.102.100  ctrl1.lab2.stack controller1 ctrl1

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(getent hosts $(hostname -s))^C

[lab2]:admin@admin
[root@ctrl1 ~(admin)]# getent hosts $(hostname)
172.16.102.100  ctrl1.lab2.stack controller1 ctrl1

[root@ctrl1 ~(admin)]# docker logs rabbitmq 2>&1|  tail -10
++ [[ -n '' ]]
++ [[ ! -d /var/log/kolla/rabbitmq ]]
+++ stat -c %a /var/log/kolla/rabbitmq
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/rabbitmq
Running command: '/usr/sbin/rabbitmq-server'
+ echo 'Running command: '\''/usr/sbin/rabbitmq-server'\'''
+ exec /usr/sbin/rabbitmq-server
econnrefused

Revision history for this message

Taisto Qvist (theque42) wrote on 2019-12-13:

#9

Created: #1856281

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2019-12-15:

#10

Reviving the original issue:

In the xelaot's case, the breaking line is:
127.0.1.1 openStack.wsfdindl.metronetinc.net openStack
and we remove that since https://review.opendev.org/685233
(also backported to stein).
I don't see how hardcoding would help since both ways it templates out to the very same file...
Also, if api_interface_address failed to template out, there would be total havoc, not only in rmq.

xelaot, please try using latest stein commit and see whether the bad line stays in the file, and what is the result of "Ensure hostname does not point to loopback in /etc/hosts" task (it is run by the bootstrap).

Changed in kolla-ansible:
status:	New → Incomplete

Revision history for this message

Launchpad Janitor (janitor) wrote on 2020-02-14:

#11

[Expired for kolla-ansible because there has been no activity for 60 days.]

Changed in kolla-ansible:
status:	Incomplete → Expired

Revision history for this message

Magnus Lööf (magnus-loof) wrote on 2020-03-31:

#12

This seems to be related to the fact that when deploying RabbitMQ, it needs the following sysctl:

```
- net.ipv4.ip_nonlocal_bind: 1
- net.ipv6.ip_nonlocal_bind: 1
```

those are set in the HAProxy deployment - but if you separated Control and Network nodes you would run into this problem.

Revision history for this message

Lianhao Lu (lianhao-lu) wrote on 2020-03-31:

#13

@Magnus Lööf,

Thanks your workaround works. After apply those sysctl, it works.

I met the same issue with kolla-ansible 9.0.1 with kolla/ubuntu-source-rabbitmq:train keeps restarting in my multinode environment. I've checked that the /etc/hosts are all clean.

Magnus Lööf (magnus-loof) on 2020-03-31

Changed in kolla-ansible:
status:	Expired → Confirmed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-03-31: Fix proposed to kolla-ansible (master)

#14

Fix proposed to branch: master
Review: https://review.opendev.org/716207

Changed in kolla-ansible:
assignee:	nobody → Magnus Lööf (magnus-loof)
status:	Confirmed → In Progress

Revision history for this message

Magnus Lööf (magnus-loof) wrote on 2020-04-01:

#15

OK so here is some testing:

With `# export ERL_EPMD_ADDRESS=10.99.30.13` in `rabbitmq-env.conf`:

```
sudo ss -tlnp | grep 4369
LISTEN 0 128 *:4369 *:* users:(("epmd",pid=12391,fd=3))
LISTEN 0 128 :::4369 :::* users:(("epmd",pid=12391,fd=4))
```

With `export ERL_EPMD_ADDRESS=10.99.30.13` in `rabbitmq-env.conf`:

```
sudo docker logs rabbitmq
...
+ echo 'Running command: '\''/usr/sbin/rabbitmq-server'\'''
+ exec /usr/sbin/rabbitmq-server
econnrefused
```

With `sysctl net.ipv6.ip_nonlocal_bind=1` and `sysctl net.ipv4.ip_nonlocal_bind=1`

```
sudo ss -tlnp | grep 4369
LISTEN 0 128 10.99.30.13:4369 *:* users:(("epmd",pid=15886,fd=5))
LISTEN 0 128 127.0.0.1:4369 *:* users:(("epmd",pid=15886,fd=3))
LISTEN 0 128 ::1:4369 :::* users:(("epmd",pid=15886,fd=4))
```

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2020-04-01:

#16

Does it need both sysctls?

Revision history for this message

Magnus Lööf (magnus-loof) wrote on 2020-04-01:

#17

I did not try with only ipv6, but it did not work with only ipv4.

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2020-04-01:

#18

Please do try it with IPv6 only.

Revision history for this message

Magnus Lööf (magnus-loof) wrote on 2020-04-02:

#19

It was only `sysctl net.ipv6.ip_nonlocal_bind=1` that was required.

Checking some more, I found that

```
- net.ipv6.conf.all.disable_ipv6: 1
- net.ipv6.conf.default.disable_ipv6: 1
```

were also set. Setting those to `0` made it work without either `nonlocal_bind`

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2020-04-02:

#20

Ah, splendid. Just like discussed in the bug marked as duplicate now.

I was wondering there if rmq does not try to bind to 'localhost'-resolved addresses.
Could you make sure localhost points only to 127.0.0.1 in /etc/hosts and retry the failure scenario?

Revision history for this message

Magnus Lööf (magnus-loof) wrote on 2020-04-02:

#21

Every 2,0s: cat /etc/hosts Thu Apr 2 17:51:17 2020

127.0.0.1 localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

# BEGIN ANSIBLE GENERATED HOSTS
10.99.30.7 n-25484-bpc1 n-25484-bpc1.vms.basalt.se
10.99.30.6 n-25484-bpc2 n-25484-bpc2.vms.basalt.se
10.99.30.16 n-25484-bpc3 n-25484-bpc3.vms.basalt.se
10.99.30.5 n-25484-bpc4 n-25484-bpc4.vms.basalt.se
# END ANSIBLE GENERATED HOSTS

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2020-04-02:

#22

Thanks, now try removing that ::1 line

Radosław Piliszek (yoctozepto) on 2020-06-02

Changed in kolla-ansible:
assignee:	Magnus Lööf (magnus-loof) → Radosław Piliszek (yoctozepto)
importance:	Undecided → Low

Revision history for this message

Mark Goddard (mgoddard) wrote on 2020-06-02:

#23

EPMD docs say it will listen on localhost and addresses specified: https://erlang.org/doc/man/epmd.html

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2020-06-14:

#24

OK, I validated that modifying /etc/hosts is of no help here but it seems that one should generally never disable IPv6 on the loopback (lo) as many programs may depend on its presence whenever IPv6 is enabled. If you want to disable IPv6, then please do it via ipv6.disable=1 on kernel cmdline so that it never registers its AF and all IPv6-enabled software is aware not to try working with IPv6 sockets (rmq included here).
What kolla-ansible can do is to add a precheck that validates whether this is really the case.

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2021-08-13:

#25

Invalidating because the issue was with broken IPv6 stack. Please don't break your IPv6 or bad things may happen (TM).

Changed in kolla-ansible:
status:	In Progress → Invalid

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2022-12-08: Change abandoned on kolla-ansible (master)

#26

Change abandoned by "Magnus Lööf <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/716207
Reason: Bug invalid

kolla-ansible

Kolla-ansible deploy fails at rabbitmq

Bug Description

Duplicates of this bug

Other bug subscribers

Remote bug watches