juju lxc instances deployed via MAAS don't have resolvable hostnames

Bug #1274947 reported by James Page
This bug affects 8 people
Affects        Status   Importance  Assigned to  Milestone
MAAS           Triaged  Wishlist    Unassigned
maas (Ubuntu)  Triaged  Wishlist    Unassigned

Bug Description

When using juju-managed lxc instances with the MAAS provider, the physical hosts end up with DNS-resolvable hostnames, but the lxc containers do not.

It would be nice if MAAS could do more generic DNS management and just add DNS entries for all DHCP clients that have presented a hostname.

Some services (such as rabbitmq when natively clustered) depend on having resolvable hostnames.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: maas 1.4+bzr1820+dfsg-0ubuntu1
ProcVersionSignature: Ubuntu 3.13.0-5.20-generic 3.13.0
Uname: Linux 3.13.0-5-generic x86_64
ApportVersion: 2.13.1-0ubuntu2
Architecture: amd64
Date: Fri Jan 31 13:58:05 2014
InstallationDate: Installed on 2014-01-23 (8 days ago)
InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Release amd64 (20131016)
PackageArchitecture: all
SourcePackage: maas
UpgradeStatus: Upgraded to trusty on 2014-01-23 (8 days ago)

James Page (james-page) wrote :
Julian Edwards (julian-edwards) wrote :

Wait - so you want MAAS to add DNS entries for hosts it's not managing?

Changed in maas:
status: New → Incomplete
Julian Edwards (julian-edwards) wrote :

What hostname does juju give the LXCs? FWIW all of the DHCP addresses that MAAS assigns do have a default DNS entry of N-N-N-N.domain.
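For illustration, a minimal sketch of how an N-N-N-N.domain name could be derived from a leased IPv4 address. The function name and the example address and domain are hypothetical, and this is not MAAS's own code.

```python
import ipaddress


def default_dhcp_name(ip: str, domain: str) -> str:
    """Build an N-N-N-N.domain style name from an IPv4 address.

    Hypothetical helper for illustration only, not taken from MAAS.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return str(addr).replace(".", "-") + "." + domain.strip(".")


if __name__ == "__main__":
    # Example values are assumptions chosen for the sketch.
    print(default_dhcp_name("10.1.57.22", "maas"))
```

Such a name resolves to the lease, but it is not the hostname the container itself reports, which is the gap this bug is about.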

James Page (james-page) wrote :

Juju uses the machine name as the hostname

so you get something like

  juju-precise-mysql-0.<domain>

James Page (james-page) wrote :

One use case for this is rabbitmq-server; to make active/active rabbitmq clusters, rabbit nodes need to be joined just using the hostname; if this does not resolve, clustering fails.

I think this is actually an Erlang issue (it behaves oddly when DNS is not configured properly).

Changed in maas:
status: Incomplete → New
Julian Edwards (julian-edwards) wrote :

OK I guess we need to add some API call for CNAME inclusion. Still, seems a bit weird that MAAS itself should be doing it. Hmmm, maybe we need a brainstorming session.

Changed in maas:
status: New → Triaged
importance: Undecided → Wishlist
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in maas (Ubuntu):
status: New → Confirmed
James Troup (elmo)
tags: added: canonical-is
Julian Edwards (julian-edwards) wrote :

This is a related problem https://bugs.launchpad.net/bugs/1250435

Changed in maas (Ubuntu):
importance: Undecided → Wishlist
status: Confirmed → Triaged
Alexander List (alexlist) wrote :

We also see this behaviour on LXC containers deployed to MAASified machines where MAAS does *not* manage DHCP/DNS.

We are trying to smoosh OpenStack infra into LXC units on MAASified machines for HA.

It is most visible with rabbitmq-server, which bails out on NXDOMAIN for the unit.

It's also reproducible by just trying to sudo:

ubuntu@juju-machine-5-lxc-25:~$ sudo -i
sudo: unable to resolve host juju-machine-5-lxc-25
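The same condition can be checked from a script. This is a rough sketch using only the Python standard library; the helper name is hypothetical, and the resolver is injectable purely so the check can be exercised without a live DNS setup.

```python
import socket
from typing import Callable


def hostname_resolves(
    name: str, resolver: Callable[[str], str] = socket.gethostbyname
) -> bool:
    """Return True if `name` resolves to an address, False otherwise."""
    try:
        resolver(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # Check the machine's own hostname, which is what sudo complains about.
    name = socket.gethostname()
    state = "resolves" if hostname_resolves(name) else "does NOT resolve"
    print(name, state)
```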

What *could* help is to pre-allocate MACs for each LXC container (whether inside or outside MAAS doesn't really matter) and inject them into the host machine via lxc.network.hwaddr; cf. http://manpages.ubuntu.com/manpages/trusty/man5/lxc.container.conf.5.html
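A rough sketch of what pre-allocation could look like: derive a stable MAC from the container name and emit the matching lxc.network.hwaddr line. The helper names are hypothetical, the 00:16:3e prefix is an assumption (it is the prefix commonly seen in LXC's default configs), and with only 24 hashed bits collisions are possible, so a real allocator would need to track assigned addresses.

```python
import hashlib


def container_mac(name: str, prefix: str = "00:16:3e") -> str:
    """Derive a deterministic MAC address from a container name.

    The prefix is an assumption; the last three octets come from a hash of
    the name, so the same name always yields the same MAC.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    suffix = ":".join(f"{byte:02x}" for byte in digest[:3])
    return f"{prefix}:{suffix}"


def lxc_hwaddr_line(name: str) -> str:
    """Render the container config line that pins the MAC address."""
    return f"lxc.network.hwaddr = {container_mac(name)}"


if __name__ == "__main__":
    print(lxc_hwaddr_line("juju-machine-5-lxc-25"))
```

With the MAC known ahead of time, a DHCP/DNS entry for the container could be created before the container first boots.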

This would require that cloud-init/juju respect what they get from DHCP/DNS in terms of hostname...

tags: added: dns
JuanJo Ciarlante (jjo) wrote :

This issue causes several HA services to fail (or split-brain):
* rabbitmq-server:
Following up from above, this MP[0] forces the rabbit nodename to be the resolvable hostname for private-address[0], which fixes its clustering.
* mongodb:
No way: mongod uses gethostname() at rs.initiate() to initialize the cluster (--replSet ...), then fails because it can't connect to itself via $HOSTNAME. I'm using this (brute-force) script[1] to work around it.

[0] https://code.launchpad.net/~jjo/charms/trusty/rabbitmq-server/fix-nodename-to-host-dns-PTR
[1] https://gist.github.com/jjo/198e27c8e44f68724fcd

JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack
JuanJo Ciarlante (jjo) wrote :

FYI nova services output also refers to hostnames (which are then
unresolvable).
Below, nova-compute services are deployed to the bare-metal hosts, the
others to LXCs on them:

$ nova service-list
+----------------+----------------------+----------+---------+-------+-...
| Binary         | Host                 | Zone     | Status  | State | ...
+----------------+----------------------+----------+---------+-------+-...
| nova-conductor | juju-machine-1-lxc-5 | internal | enabled | up    | ...
| nova-cert      | juju-machine-1-lxc-5 | internal | enabled | up    | ...
| nova-compute   | homer                | nova     | enabled | up    | ...
| nova-scheduler | juju-machine-1-lxc-5 | internal | enabled | up    | ...
| nova-compute   | bart                 | nova     | enabled | up    | ...
| nova-compute   | lisa                 | nova     | enabled | up    | ...
| nova-compute   | marge                | nova     | enabled | up    | ...
| nova-conductor | juju-machine-0-lxc-9 | internal | enabled | up    | ...
| nova-cert      | juju-machine-0-lxc-9 | internal | enabled | up    | ...
| nova-scheduler | juju-machine-0-lxc-9 | internal | enabled | up    | ...
| nova-conductor | juju-machine-2-lxc-6 | internal | enabled | up    | ...
| nova-cert      | juju-machine-2-lxc-6 | internal | enabled | up    | ...
| nova-scheduler | juju-machine-2-lxc-6 | internal | enabled | up    | ...
+----------------+----------------------+----------+---------+-------+-...

JuanJo Ciarlante (jjo) wrote :

FYI this prevents the current trusty/rabbitmq charm from deploying on MAAS
1.7beta + LXC. 1.5 had at least PTR resolution for every DHCP'd IP, e.g.:
IN PTR 10-1-57-22.maas. , while 1.7beta has none AFAICT.
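For reference, a small sketch of what a complete PTR record for such a lease looks like, including the reverse-zone owner name. The helper name is hypothetical; the address is inferred from the target name quoted above.

```python
import ipaddress


def ptr_record(ip: str, target: str) -> str:
    """Format a PTR record mapping an IP's reverse name to `target`."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. "22.57.1.10.in-addr.arpa" for 10.1.57.22
    return f"{addr.reverse_pointer}. IN PTR {target.rstrip('.')}."


if __name__ == "__main__":
    print(ptr_record("10.1.57.22", "10-1-57-22.maas"))
```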

JuanJo Ciarlante (jjo) wrote :

To clarify: comment #12 was for clustered rabbitmq (as already reported
above). FYI, trying to force the charm to use plain IPs instead of hostnames
(recall that units need to be able to refer to each other as
e.g. rabbit@$OTHER_HOST) fails with e.g.:

2014-10-16 18:51:13 INFO config-changed nodes in question: ['rabbit@10.1.57.62']
2014-10-16 18:51:13 INFO config-changed =ERROR REPORT==== 16-Oct-2014::18:51:13 ===
2014-10-16 18:51:13 INFO config-changed ** System NOT running to use fully qualified hostnames **
2014-10-16 18:51:13 INFO config-changed ** Hostname 10.1.57.62 is illegal **

Christian Reis (kiko)
Changed in maas:
milestone: none → next
Manuel Stein (manuel.stein) wrote :

IMHO, the current MAAS DNS management is bad. It uses static zones, and when you try modifying the templates, the whole zone-writing/reloading mechanism blows up; the celery tasks for freeze/thaw won't do. Dynamic zones would at least allow DHCP to update DNS via rndc. MAAS DNS config should use the same mechanism, so you get to choose your DNS service independently of MAAS. Moreover, if DHCP leases were also managed using omshell, that would give ultimate freedom over where network services are managed.
(I'll try unmanaged next to set up good old dynamic DHCP and DNS - would that work with both MAAS and the juju lxcs?)
