Relies on DNS to resolve own hostname

Bug #1538812 reported by Florian Haas on 2016-01-28
This bug affects 1 person
Affects                                    Importance  Assigned to
Charm Helpers                              Medium      Unassigned
OpenStack ceph charm                       Medium      Unassigned
OpenStack cinder charm                     Medium      Unassigned
OpenStack neutron-gateway charm            Medium      Unassigned
OpenStack nova-compute charm               Medium      Unassigned
ceph (Juju Charms Collection)              Medium      Unassigned
cinder (Juju Charms Collection)            Medium      Unassigned
neutron-gateway (Juju Charms Collection)   Medium      Unassigned
nova-compute (Juju Charms Collection)      Medium      Unassigned

Bug Description

In charm/hooks/utils.py, the get_host_ip method seems to rely on DNS to resolve host names:

@cached
def get_host_ip(hostname=None):
    if config('prefer-ipv6'):
        return get_ipv6_addr()[0]

    hostname = hostname or unit_get('private-address')
    try:
        # Test to see if already an IPv4 address
        socket.inet_aton(hostname)
        return hostname
    except socket.error:
        # This may throw an NXDOMAIN exception; in which case
        # things are badly broken so just let it kill the hook
        answers = dns.resolver.query(hostname, 'A')
        if answers:
            return answers[0].address

Firstly, the dns.resolver.query call strikes me as incredibly silly. What if the other node is not resolvable via DNS, but its name is in /etc/hosts? What if the other node *is* in DNS, but is a CNAME? And, bottom line, why not simply use socket.gethostbyname() here?

Secondly, this currently (as of today) breaks a Ceph deployment. It definitely didn't do so a month ago, so whatever it was that changed in the interim, this is a regression. I don't know if this code path wasn't there earlier, or whether it was just never hit. But it definitely used to work, and no longer does.

Florian Haas (fghaas) wrote :

So, this works just fine:

@cached
def get_host_ip(hostname=None):
    if config('prefer-ipv6'):
        return get_ipv6_addr()[0]

    hostname = hostname or unit_get('private-address')
    try:
        # Test to see if already an IPv4 address
        socket.inet_aton(hostname)
        return hostname
    except socket.error:
        return socket.gethostbyname(hostname)

Florian Haas (fghaas) wrote :

If my analysis here is correct, then it doesn't just break Ceph; it would break all of the affected charms whenever hosts are not DNS-resolvable. That never happens when using MAAS with the MAAS host acting as one's DNS server, but it would probably be the case with most (all?) non-MAAS Juju providers.

James Page (james-page) wrote :

This function is used to resolve IP addresses from hostnames; I'll take a look and see what's changed recently, but this approach was put together over a few cycles of testing against MAAS and OpenStack providers and provided the most generally applicable approach to resolving an IP address.

James Page (james-page) wrote :

AFAICT this code has not changed in quite some time; so I'm not sure what's causing the change in behaviour that you are seeing.

Which provider are you using? It would be helpful to understand so we can see the full context.

Having a final attempt using gethostbyname might be a sensible improvement; from memory, this might return 127.0.1.1 on quite a few of the Juju providers, but that's worth testing to confirm.
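
A minimal sketch of the ordering suggested above: DNS first, then gethostbyname as a final attempt, discarding loopback answers such as 127.0.1.1. The name resolve_host_ip and the dns_lookup parameter are hypothetical (dns_lookup stands in for the dns.resolver.query call so the sketch runs without dnspython); this is not the charm-helpers implementation.

```python
import ipaddress
import socket
from typing import Callable, Optional


def resolve_host_ip(hostname: str,
                    dns_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return an IPv4 address for hostname, or None if nothing usable is found.

    Hypothetical helper: dns_lookup is a DNS-only query that may raise
    (e.g. on NXDOMAIN) or return None.
    """
    # Already an IPv4 literal? Return it unchanged.
    try:
        ipaddress.IPv4Address(hostname)
        return hostname
    except ValueError:
        pass

    # 1. DNS-only lookup, as the current code does.
    try:
        answer = dns_lookup(hostname)
        if answer:
            return answer
    except Exception:
        # Any DNS failure falls through to the system resolver below.
        pass

    # 2. Final attempt via the system resolver, which also reads /etc/hosts.
    try:
        address = socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None

    # Some providers map the hostname to 127.0.1.1; treat loopback as unusable.
    if ipaddress.IPv4Address(address).is_loopback:
        return None
    return address
```

Whether 127.0.1.1 actually shows up depends on the provider; rejecting loopback results is one assumed way to handle it, and a caller could instead fall back to the unit's private-address.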

James Page (james-page) wrote :

"What if the other node *is* in DNS, but is a CNAME?"

The way that query is done means that a CNAME record gets resolved down into an underlying A record automatically.

MAAS used to structure DNS this way, with the actual hostname being a CNAME that pointed to an auto-generated A record.
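
To make the CNAME behaviour concrete, here is a toy resolver over an in-memory zone that follows CNAME records until it reaches an A record, with a hop limit to guard against loops. The zone layout, the names node1.maas and node-1a2b.maas, and resolve_a are all hypothetical; in practice the recursive resolver does this chasing for the client, which is why the A query returns the final address without extra work.

```python
from typing import Dict, List, Tuple

# Hypothetical zone format: name -> (record type, value)
Zone = Dict[str, Tuple[str, str]]


def resolve_a(zone: Zone, name: str, max_hops: int = 8) -> str:
    """Follow CNAME records in a toy zone until an A record is found."""
    chain: List[str] = []
    for _ in range(max_hops):
        if name not in zone:
            raise LookupError(f"NXDOMAIN: {name}")
        rtype, value = zone[name]
        if rtype == "A":
            return value
        if rtype == "CNAME":
            chain.append(name)
            name = value  # continue with the canonical name
            continue
        raise LookupError(f"unsupported record type {rtype} for {name}")
    raise LookupError(f"CNAME chain too long: {' -> '.join(chain)}")


# MAAS-style layout: friendly hostname is a CNAME to an auto-generated A record.
zone = {
    "node1.maas": ("CNAME", "node-1a2b.maas"),
    "node-1a2b.maas": ("A", "10.0.0.21"),
}
```

Here resolve_a(zone, "node1.maas") follows one CNAME hop and returns the address of node-1a2b.maas.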

Florian Haas (fghaas) wrote :

The problem pops up when using the "manual" provider. I've just spun up a simplified test environment to confirm the problem; the exact Juju config is here if it helps: https://www.hastexo.com/resources/hints-and-kinks/ubuntu-openstack-juju-4-nodes/

After running that setup, cinder-volume and neutron-gateway are broken:

$ juju stat neutron-gateway --format=oneline

- neutron-gateway/0: charlie.example.com (error)

$ juju stat cinder-volume --format=oneline

- cinder-volume/0: daisy.example.com (error)

$ juju debug-log --replay | grep NXDOMAIN
unit-neutron-gateway-0[22632]: 2016-01-28 18:50:56 INFO unit.neutron-gateway/0.install logger.go:40 raise NXDOMAIN
unit-neutron-gateway-0[22632]: 2016-01-28 18:50:56 INFO unit.neutron-gateway/0.install logger.go:40 dns.resolver.NXDOMAIN
unit-cinder-volume-0[22570]: 2016-01-28 18:52:39 INFO unit.cinder-volume/0.install logger.go:40 raise NXDOMAIN
unit-cinder-volume-0[22570]: 2016-01-28 18:52:39 INFO unit.cinder-volume/0.install logger.go:40 dns.resolver.NXDOMAIN

By the way: why exactly is this just an INFO message when it's an unhandled exception that, according to the comment, "kills the hook"?

All nodes have an /etc/hosts file that makes them mutually resolvable by gethostbyname(), but they use 8.8.8.8 and 8.8.4.4 for their DNS servers, which means they obviously can't resolve each other's host names via DNS.
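
The gap described here is that /etc/hosts is consulted by the system resolver (the path socket.gethostbyname takes) but never by a DNS-only query. A minimal sketch of how a hosts-format file maps names to addresses, assuming the usual layout of an address followed by a canonical name and optional aliases, with # starting a comment. lookup_hosts is hypothetical and the 192.0.2.x addresses are documentation placeholders; real code should call gethostbyname or getaddrinfo rather than parse the file itself.

```python
from typing import Optional


def lookup_hosts(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the first address mapped to hostname in hosts-format text."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        # Host names compare case-insensitively.
        if wanted in (n.lower() for n in names):
            return address
    return None


# Hypothetical hosts file for the test environment (placeholder addresses).
HOSTS = """\
127.0.0.1       localhost
# test cluster
192.0.2.11      bob.example.com bob
192.0.2.12      charlie.example.com charlie
192.0.2.13      daisy.example.com daisy
"""
```

With this file, lookup_hosts(HOSTS, "charlie") finds an address, while a query against 8.8.8.8 for the same name would return NXDOMAIN.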

Florian Haas (fghaas) wrote :

Forgot to add evidence that nova-compute is broken in the test environment too.

training@deploy:~$ juju stat nova-compute --format=oneline

- nova-compute/0: bob.example.com (error)

unit-nova-compute-0[1355]: 2016-01-28 19:15:01 INFO unit.nova-compute/0.config-changed logger.go:40 raise NXDOMAIN
unit-nova-compute-0[1355]: 2016-01-28 19:15:01 INFO unit.nova-compute/0.config-changed logger.go:40 raise NXDOMAIN
unit-nova-compute-0[1355]: 2016-01-28 19:15:01 INFO unit.nova-compute/0.config-changed logger.go:40 dns.resolver.NXDOMAIN
unit-nova-compute-0[1355]: 2016-01-28 19:15:01 INFO unit.nova-compute/0.config-changed logger.go:40 dns.resolver.NXDOMAIN

James Page (james-page) on 2016-01-29
Changed in charm-helpers:
status: New → Triaged
importance: Undecided → Medium
Florian Haas (fghaas) wrote :

Just as a reference point: it appears that what this charm helper *really* wants is the host's "default" IP address. Recognizing that a name lookup is a rather poor way of finding that out, Ansible runs ip route get against two well-known IP addresses (one IPv4, one IPv6) to determine the outgoing interface for the default route, and then returns that interface's address as ansible_default_ipv4.address and ansible_default_ipv6.address.

See https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/facts.py#L1918 for details.
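
A rough Python analogue of that idea, assuming it is enough to ask the kernel which source address it would use to reach a public address. Instead of shelling out to ip route get as Ansible does, this swaps in a connected UDP socket: connect() on a datagram socket sends no packets but makes the kernel pick a route and source address, which getsockname() then reports. default_source_ip is a hypothetical name, and 8.8.8.8 is just a well-known example target.

```python
import socket


def default_source_ip(target: str = "8.8.8.8", port: int = 53) -> str:
    """Return the local address the kernel would use to reach target.

    connect() on a UDP socket only selects a route; no packet is sent.
    Raises OSError if there is no route for that address family.
    """
    family = socket.AF_INET6 if ":" in target else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.connect((target, port))
        # getsockname() returns (host, port) for IPv4, a 4-tuple for IPv6.
        return sock.getsockname()[0]
```

default_source_ip() covers IPv4; an IPv6 target such as 2001:4860:4860::8888 would cover IPv6. Unlike a hostname lookup, this never depends on DNS, though it assumes the default route points at the network the charm actually cares about.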

James Page (james-page) on 2016-05-26
Changed in ceph (Juju Charms Collection):
status: New → Triaged
Changed in cinder (Juju Charms Collection):
status: New → Triaged
Changed in neutron-gateway (Juju Charms Collection):
status: New → Triaged
Changed in nova-compute (Juju Charms Collection):
status: New → Triaged
Changed in ceph (Juju Charms Collection):
importance: Undecided → Medium
Changed in cinder (Juju Charms Collection):
importance: Undecided → Medium
Changed in nova-compute (Juju Charms Collection):
importance: Undecided → Medium
Changed in neutron-gateway (Juju Charms Collection):
importance: Undecided → Medium
James Page (james-page) on 2017-02-23
Changed in charm-nova-compute:
importance: Undecided → Medium
status: New → Triaged
Changed in nova-compute (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page) on 2017-02-23
Changed in charm-ceph:
importance: Undecided → Medium
status: New → Triaged
Changed in ceph (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page) on 2017-02-23
Changed in charm-cinder:
importance: Undecided → Medium
status: New → Triaged
Changed in cinder (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page) on 2017-02-23
Changed in charm-neutron-gateway:
importance: Undecided → Medium
status: New → Triaged
Changed in neutron-gateway (Juju Charms Collection):
status: Triaged → Invalid
Chris Holcombe (xfactor973) wrote :

I just ran into this bug as well. The function is IPv4-only, which is unfortunate: if the DNS query fails, it falls back on socket.gethostbyname (https://docs.python.org/3/library/socket.html#socket.gethostbyname), which is also IPv4-only.
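
A family-agnostic alternative is socket.getaddrinfo, which uses the same system resolver as gethostbyname (so /etc/hosts entries and CNAMEs are handled) but can return IPv6 as well as IPv4 addresses. A minimal sketch, with resolve_addresses as a hypothetical name; it also uses ipaddress.ip_address to detect literals of either family, which is stricter than socket.inet_aton (the latter accepts shorthand such as 127.1).

```python
import ipaddress
import socket
from typing import List


def resolve_addresses(hostname: str, family: int = socket.AF_UNSPEC) -> List[str]:
    """Return unique addresses for hostname, in resolver order.

    family may be AF_INET, AF_INET6 or AF_UNSPEC (both).
    IP literals are returned in normalized form regardless of family.
    """
    try:
        return [str(ipaddress.ip_address(hostname))]
    except ValueError:
        pass

    addresses: List[str] = []
    # Restricting to SOCK_STREAM avoids one duplicate entry per socket type.
    for _fam, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            hostname, None, family, socket.SOCK_STREAM):
        addr = sockaddr[0]
        if addr not in addresses:
            addresses.append(addr)
    return addresses
```

Passing socket.AF_INET6 restricts results to IPv6, which would suit the prefer-ipv6 case; as with gethostbyname, a name that maps only to a loopback address would still need filtering.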

Marking the charm-ceph task Won't Fix, as the ceph charm has been out of support for a while now.

Changed in charm-ceph:
status: Triaged → Won't Fix