juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it.

Bug #1574844 reported by David Britton
Affects                                      Status         Importance   Assigned to        Milestone
Canonical Juju                               Triaged        Critical     Richard Harding    2.0-rc2
rabbitmq-server (Juju Charms Collection)     Fix Released   High         James Page         16.07

Bug Description

Text of all this with formatting: http://paste.ubuntu.com/16055928/

14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:df:5c:af brd ff:ff:ff:ff:ff:ff
    inet 10.73.33.31/24 brd 10.73.33.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd51:f9f5:d5f:860b:216:3eff:fedf:5caf/64 scope global dynamic
       valid_lft 3432sec preferred_lft 3432sec
    inet6 fe80::216:3eff:fedf:5caf/64 scope link
       valid_lft forever preferred_lft forever

dpb@helo:~[0]$ juju status
[Services]
NAME STATUS EXPOSED CHARM
haproxy unknown true cs:trusty/haproxy-16
landscape-server unknown false cs:trusty/landscape-server-14
postgresql active false cs:trusty/postgresql-40
rabbitmq-server error false cs:trusty/rabbitmq-server-43

[Relations]
SERVICE1 SERVICE2 RELATION TYPE
haproxy haproxy peer peer
haproxy landscape-server website regular
landscape-server postgresql db-admin regular
landscape-server rabbitmq-server amqp regular
postgresql postgresql replication peer
rabbitmq-server rabbitmq-server cluster peer

[Units]
ID WORKLOAD-STATUS JUJU-STATUS VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
haproxy/0 unknown idle 2.0-beta5 0 80/tcp,443/tcp,10000/tcp 10.73.33.146
landscape-server/0 unknown idle 2.0-beta5 1 10.73.33.163
postgresql/0 active idle 2.0-beta5 2 5432/tcp 10.73.33.112 Live master
rabbitmq-server/0 error idle 2.0-beta5 3 fd51:f9f5:d5f:860b:216:3eff:fedf:5caf hook failed: "config-changed"

[Machines]
ID STATE DNS INS-ID SERIES AZ
0 started 10.73.33.146 juju-85e8600d-a1a6-4bc1-8f49-b4536d6a9825-machine-0 trusty
1 started 10.73.33.163 juju-85e8600d-a1a6-4bc1-8f49-b4536d6a9825-machine-1 trusty
2 started 10.73.33.112 juju-85e8600d-a1a6-4bc1-8f49-b4536d6a9825-machine-2 trusty
3 started fd51:f9f5:d5f:860b:216:3eff:fedf:5caf juju-85e8600d-a1a6-4bc1-8f49-b4536d6a9825-machine-3 trusty

Then the error in rabbit:

2016-04-25 20:06:25 INFO config-changed python-amqplib is already the newest version.
2016-04-25 20:06:25 INFO config-changed rabbitmq-server is already the newest version.
2016-04-25 20:06:25 INFO config-changed The following packages were automatically installed and are no longer required:
2016-04-25 20:06:25 INFO config-changed libfreetype6 os-prober
2016-04-25 20:06:25 INFO config-changed Use 'apt-get autoremove' to remove them.
2016-04-25 20:06:25 INFO config-changed 0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
2016-04-25 20:06:25 INFO worker.uniter.jujuc server.go:173 running hook tool "open-port" ["5672/TCP"]
2016-04-25 20:06:25 INFO worker.uniter.jujuc server.go:173 running hook tool "juju-log" ["Changing ownership of path /var/lib/rabbitmq to rabbitmq:rabbitmq"]
2016-04-25 20:06:25 INFO juju-log Changing ownership of path /var/lib/rabbitmq to rabbitmq:rabbitmq
2016-04-25 20:06:25 INFO worker.uniter.jujuc server.go:173 running hook tool "juju-log" ["Changing perms of path /var/lib/rabbitmq "]
2016-04-25 20:06:25 INFO juju-log Changing perms of path /var/lib/rabbitmq
2016-04-25 20:06:25 INFO worker.uniter.jujuc server.go:173 running hook tool "unit-get" ["--format=json" "private-address"]
2016-04-25 20:06:25 INFO config-changed Traceback (most recent call last):
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/config-changed", line 724, in <module>
2016-04-25 20:06:25 INFO config-changed hooks.execute(sys.argv)
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/charmhelpers/core/hookenv.py", line 717, in execute
2016-04-25 20:06:25 INFO config-changed self._hooks[hook_name]()
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/rabbit_utils.py", line 734, in wrapped_f
2016-04-25 20:06:25 INFO config-changed f(*args, **kwargs)
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/config-changed", line 657, in config_changed
2016-04-25 20:06:25 INFO config-changed configure_nodename()
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/config-changed", line 113, in configure_nodename
2016-04-25 20:06:25 INFO config-changed nodename = rabbit.get_local_nodename()
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/rabbit_utils.py", line 672, in get_local_nodename
2016-04-25 20:06:25 INFO config-changed ip_addr = get_host_ip(unit_get('private-address'))
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 417, in get_host_ip
2016-04-25 20:06:25 INFO config-changed ip_addr = ns_query(hostname)
2016-04-25 20:06:25 INFO config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 403, in ns_query
2016-04-25 20:06:25 INFO config-changed answers = dns.resolver.query(address, rtype)
2016-04-25 20:06:25 INFO config-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 974, in query
2016-04-25 20:06:25 INFO config-changed raise_on_no_answer, source_port)
2016-04-25 20:06:25 INFO config-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 903, in query
2016-04-25 20:06:25 INFO config-changed raise NXDOMAIN
2016-04-25 20:06:25 INFO config-changed dns.resolver.NXDOMAIN
2016-04-25 20:06:25 ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
2016-04-25 20:06:25 INFO juju.worker.uniter resolver.go:107 awaiting error resolution for "config-changed" hook
2016-04-25 20:06:35 INFO juju.worker.leadership tracker.go:182 rabbitmq-server/0 will renew rabbitmq-server leadership at 2016-04-25 20:07:05.577774247 +0000 UTC
2016-04-25 20:07:05 INFO juju.worker.leadership tracker.go:182 rabbitmq-server/0 will renew rabbitmq-server leadership at 2016-04-25 20:07:35.578009654 +0000 UTC
2016-04-25 20:07:35 INFO juju.worker.leadership tracker.go:182 rabbitmq-server/0 will renew rabbitmq-server leadership at 2016-04-25 20:08:05.578249258 +0000 UTC
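
For context on the traceback above: charm-helpers' get_host_ip() only recognizes IPv4 literals, so the IPv6 private-address falls through to ns_query(), which asks the DNS resolver for an A record of the literal and gets NXDOMAIN. A minimal sketch of that failure mode (simplified, not the actual charm-helpers code from the paths in the traceback):

    # Sketch of the pre-fix behaviour; needs the python-dnspython package.
    import socket
    import dns.resolver

    def is_ip(address):
        # Pre-fix check: only IPv4 literals count as IP addresses.
        try:
            socket.inet_pton(socket.AF_INET, address)
            return True
        except socket.error:
            return False

    def get_host_ip(hostname):
        # An IPv6 literal fails the IPv4-only check and is treated as a
        # hostname, so it is sent to the resolver as an A-record query,
        # which raises dns.resolver.NXDOMAIN for an address literal.
        if is_ip(hostname):
            return hostname
        answers = dns.resolver.query(hostname, 'A')
        return str(answers[0])

    get_host_ip('fd51:f9f5:d5f:860b:216:3eff:fedf:5caf')  # raises NXDOMAIN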

David Britton (dpb)
tags: added: landscape
description: updated
David Britton (dpb) wrote :

Juju debug-log file

David Britton (dpb) wrote :

Note: all the machines have both IPv6 and IPv4 addresses, but machine 3 is somehow different. Its IPv6 address shows up in juju status and interferes with rabbitmq-server (which is very sensitive to hostname/IP issues).

Cheryl Jennings (cherylj) wrote :

I can see in the log that the machine does have an IPv4 and IPv6 address:

machine-3: 2016-04-25 16:38:35 INFO juju.worker.machiner machiner.go:142 setting addresses for machine-3 to ["local-machine:127.0.0.1" "local-cloud:10.73.33.31" "local-machine:::1" "local-cloud:fd51:f9f5:d5f:860b:216:3eff:fedf:5caf"]

I don't know why juju is selecting the IPv6 one as the public IP address.

tags: added: juju-release-support lxd-provider
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-beta7
David Britton (dpb) wrote :

Note -- I seem to be able to repro this by just doing

juju deploy ubuntu -n 15

on the lxd local provider. One of the machines usually shows an IPv6 address, while the others show IPv4.

Adam Stokes (adam-stokes) wrote :

You can easily reproduce this with conjure-up as well, using juju beta6:

conjure-up openstack

Things like neutron and rabbitmq will pull an IPv6 address. The workaround is disabling IPv6 altogether:

https://askubuntu.com/questions/440649/how-to-disable-ipv6-in-ubuntu-14-04
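
For reference, the approach behind that link boils down to disabling IPv6 via sysctl (shown here as an example; whether to apply it on the host, in the containers, or both depends on the setup):

    # /etc/sysctl.conf -- then apply with: sudo sysctl -p
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1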

tags: added: conjure
Adam Stokes (adam-stokes) wrote :

This is a huge issue for us as many more people are starting to use `conjure-up`. Roughly 9 out of 10 people run into this.

Dimiter Naydenov (dimitern) wrote :

I did manage to reproduce the issue twice on 2.0 (master tip) on lxd with xenial, as suggested in #4.

Unfortunately, I'm still looking for the root cause; thus far it seems more related to lxd-bridge and/or cloud-init than to juju.

I can see a dhclient process that appears stuck in some of the containers (only it and the init process are running, but otherwise the lxd container appears accessible and able to resolve DNS queries and reach the internet via the lxdbr0 NAT settings).

In both tests it was always 1 of the 15 containers that had the issue (#12 in the first test, #13 in the second). Otherwise there were no observable differences in system config, logs, etc. between the container with the IPv6 address and all the others.

It's interesting that with add-unit ubuntu -n 30 (after the deploy with -n 15), most of the new containers get stuck in pending, with virtually only that dhclient process hanging, e.g. http://paste.ubuntu.com/16318617/

A possible workaround would be to restart the jujud process on the container that comes up with an IPv6 address (NOTE: lxc list actually shows all containers RUNNING and having *both* IPv4 and IPv6 addresses). Will dig in some more.

Dimiter Naydenov (dimitern) wrote :

I'm looking into this again today, after focusing on other critical issues. Will post updates as I go.

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Dimiter Naydenov (dimitern)
Dimiter Naydenov (dimitern) wrote :

I can confirm the issue most commonly happens with the first or second machine in any given model, including the admin (controller) model, provided there is some delay between adding the model and deploying 2 units, e.g.

juju add-model foo && sleep 3 && juju deploy -m foo ubuntu -n 2

is sufficient to reproduce the issue (about 80% of the time) for machine-1 in the model foo.

Also, watching `lxc list juju-<model-uuid-prefix>` as containers start, I can see that every time the issue appears, the container's IPv6 address shows up before its IPv4 address; after a few seconds, both addresses appear.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta7 → 2.0-beta8
Dimiter Naydenov (dimitern) wrote :

I can confirm the root cause. For the affected containers, the IPv6 address comes up first from the lxd provider (the only other addresses at that point are loopback addresses), and it is set permanently as both the "preferredPrivate" and "preferredPublicAddress" of the machine. Working on a fix.

Dimiter Naydenov (dimitern) wrote :

The correct fix should store preferred private and public IPv4 and IPv6 addresses separately for any given machine, rather than keeping a single preferred private or public address regardless of IP version.

I'm working on a PR to introduce the described behavior.
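
To illustrate the described behavior (a hypothetical sketch in Python for brevity; Juju itself is written in Go, and none of these names are Juju's): track one preferred address per scope and per IP version, so a later IPv4 address is not masked by an IPv6 address that arrived first.

    # Hypothetical illustration only.
    import ipaddress

    def preferred_addresses(addresses):
        """Map (scope, ip_version) -> the first address seen for that slot."""
        prefs = {}
        for scope, addr in addresses:
            version = ipaddress.ip_address(addr).version  # 4 or 6
            prefs.setdefault((scope, version), addr)
        return prefs

    # The IPv6 address arrives first (as observed above), but it no longer
    # permanently occupies a single shared "preferred private" slot:
    seen = [
        ('local-cloud', 'fd51:f9f5:d5f:860b:216:3eff:fedf:5caf'),
        ('local-cloud', '10.73.33.31'),
    ]
    prefs = preferred_addresses(seen)
    private = prefs.get(('local-cloud', 4)) or prefs.get(('local-cloud', 6))
    # private == '10.73.33.31'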

Dimiter Naydenov (dimitern) wrote :

Fix proposed for preliminary review: https://github.com/juju/juju/pull/5435 (it still needs a few extra tests I'm working on, but live tests work as expected).

Dimiter Naydenov (dimitern) wrote :

Another simpler, less intrusive, and more focused fix proposed: https://github.com/juju/juju/pull/5471

James Page (james-page)
Changed in rabbitmq-server (Juju Charms Collection):
assignee: nobody → James Page (james-page)
status: New → In Progress
importance: Undecided → High
Dimiter Naydenov (dimitern) wrote :

Closing this as the underlying issue is with the rabbitmq charm and charm-helpers not supporting IPv6 properly, not with upstream rabbitmq. Also, James Page has a fix in progress for the IPv6 issue. More context can be found in bug 1584902.

Changed in juju-core:
status: In Progress → Won't Fix
assignee: Dimiter Naydenov (dimitern) → nobody
milestone: 2.0-beta8 → none
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (master)

Fix proposed to branch: master
Review: https://review.openstack.org/322035

Dimiter Naydenov (dimitern) wrote :

Verified the fix works with Juju on LXD: 8 units of rabbitmq, one with an IPv6 address, all happily clustered and working: http://paste.ubuntu.com/16728135/

James Page (james-page)
Changed in rabbitmq-server (Juju Charms Collection):
milestone: none → 16.07
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.openstack.org/322035
Committed: https://git.openstack.org/cgit/openstack/charm-rabbitmq-server/commit/?id=701fb3c7b68f16185bdf0c400a78e60067551ab3
Submitter: Jenkins
Branch: master

commit 701fb3c7b68f16185bdf0c400a78e60067551ab3
Author: James Page <email address hidden>
Date: Fri May 27 10:31:21 2016 +0100

    Resync charm helpers

    This brings in a few changes, but specifically of interest is the
    change to is_ip which correctly detects both IPv4 and IPv6 addresses,
    resolving problems when Juju presents an IPv6 based private-address
    back to the charm.

    Change-Id: I3b6391053db83c8b3f1662f010783703b1f16d0a
    Closes-Bug: 1574844
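
The is_ip change mentioned in the commit amounts to accepting both address families; a minimal sketch of the fixed check (the actual charm-helpers implementation may differ):

    import socket

    def is_ip(address):
        # Post-fix behaviour: recognize both IPv4 and IPv6 literals, so
        # get_host_ip() can return them directly instead of querying DNS.
        for family in (socket.AF_INET, socket.AF_INET6):
            try:
                socket.inet_pton(family, address)
                return True
            except socket.error:
                pass
        return False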

Changed in rabbitmq-server (Juju Charms Collection):
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (stable/16.04)

Fix proposed to branch: stable/16.04
Review: https://review.openstack.org/323264

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (stable/16.04)

Reviewed: https://review.openstack.org/323264
Committed: https://git.openstack.org/cgit/openstack/charm-rabbitmq-server/commit/?id=06c8adf05c0dff40165a8e0c801cb5c35628b01f
Submitter: Jenkins
Branch: stable/16.04

commit 06c8adf05c0dff40165a8e0c801cb5c35628b01f
Author: James Page <email address hidden>
Date: Tue May 31 10:59:48 2016 +0100

    Resync stable charm helpers

    Change to is_ip to correctly detect both IPv4 and IPv6 addresses,
    resolving problems when Juju presents an IPv6 based private-address
    back to the charm.

    Change-Id: I22360c7e2338a76021d3947816ff2e8a92fe814a
    Closes-Bug: 1574844

James Page (james-page)
Changed in rabbitmq-server (Juju Charms Collection):
status: Fix Committed → Fix Released
Martin Packman (gz) wrote :

This came up again under bug 1626097. I really think the lxd provider needs to consistently use either IPv4 or IPv6, not mix and match what it shows.

no longer affects: juju-core
tags: added: cpe-sa usability
Changed in juju:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.0-rc2
Changed in juju:
assignee: nobody → Richard Harding (rharding)
Michael Foord (mfoord) wrote :

I'm pretty sure this is the same bug as bug #1624495: PreferredPublicAddress (where dns-name comes from) getting set to an IPv6 address.
