LXC containers under MAAS get no "search <domain>" entry in resolv.conf when deployed with juju2

Bug #1575940 reported by Andreas Hasenack
This bug affects 3 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Dimiter Naydenov

Bug Description

Used juju 2 beta6, and MAAS 1.9.1

When using juju2 to deploy a unit to an LXC (didn't test LXD) container on a MAAS 1.9.1 node, that container gets an incomplete /etc/resolv.conf file. It has no "search <domain>" line:

root@juju-machine-0-lxc-1:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.96.0.10
root@juju-machine-0-lxc-1:~#

A PTR query works:
root@juju-machine-0-lxc-1:~# ip -4 -o addr show eth0
12: eth0 inet 10.96.13.36/17 brd 10.96.127.255 scope global eth0\ valid_lft forever preferred_lft forever

root@juju-machine-0-lxc-1:~# host 10.96.13.36
36.13.96.10.in-addr.arpa domain name pointer murky-arithmetic.scapestack.
root@juju-machine-0-lxc-1:~#

The corresponding fqdn query also works:
root@juju-machine-0-lxc-1:~# host murky-arithmetic.scapestack
murky-arithmetic.scapestack has address 10.96.13.36

A non-fqdn query does not work, though:
root@juju-machine-0-lxc-1:~# host murky-arithmetic
Host murky-arithmetic not found: 3(NXDOMAIN)

This is currently breaking the deployment of the rabbitmq-server charm in a container using juju2 (https://bugs.launchpad.net/charms/+source/rabbitmq-server/+bug/1575349)

With juju1, resolv.conf is complete:
# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.96.0.10
search scapestack

And this works:

a) non-fqdn query:
root@juju-machine-0-lxc-7:~# host 10-96-9-136
10-96-9-136.scapestack has address 10.96.9.136

b) PTR query:
root@juju-machine-0-lxc-7:~# host 10.96.9.136
136.9.96.10.in-addr.arpa domain name pointer 10-96-9-136.scapestack.
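
For context, the missing line matters because a stub resolver typically expands a short name using the search list. A minimal Python sketch of that expansion, assuming the glibc default of ndots:1 and ignoring every other resolver option; parse_resolv_conf and candidate_names are illustrative names, not part of Juju or MAAS:

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers, search = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            # A later "search" line replaces an earlier one.
            search = fields[1:]
    return nameservers, search


def candidate_names(name, search, ndots=1):
    """Names a resolver would try for `name`, in order (simplified)."""
    if name.endswith("."):
        return [name]  # already absolute, no expansion
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as given first
    return expanded + [name]  # short name: try the search list first


juju1 = "nameserver 10.96.0.10\nsearch scapestack\n"
juju2 = "nameserver 10.96.0.10\n"

for label, text in (("juju1", juju1), ("juju2", juju2)):
    _, search = parse_resolv_conf(text)
    print(label, candidate_names("murky-arithmetic", search))
```

With the juju2 file the search list is empty, so only the bare name is tried, which is consistent with the NXDOMAIN result above.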

summary: LXC containers under MAAS get no "search <domain>" entry in resolv.conf
+ when deployed with juju2
description: updated
tags: added: kanban-cross-team landscape
tags: removed: kanban-cross-team
James Tunnicliffe (dooferlad) wrote :

Could you post /var/log/cloud-init* please? The Juju logs may also be useful so you might as well upload those too.

Cheryl Jennings (cherylj) wrote :

I can't recreate on beta6 or master tip. The /var/log/cloud-init-output.log would be helpful from your environment.

Changed in juju-core:
status: New → Incomplete
Andreas Hasenack (ahasenack) wrote :

Interesting. It happens in one of our MAAS servers, but not the other. Both are at the same version: 1.9.1+bzr4543-0ubuntu2 (trusty1)

This is cloud-init-output from the broken case, where /etc/resolv.conf lacks "search <domain>". It's a xenial ubuntu container on the controller node of the admin model.

Andreas Hasenack (ahasenack) wrote :

And now /var/log/juju

Changed in juju-core:
status: Incomplete → New
Andreas Hasenack (ahasenack) wrote :

The template container has the line:

root@juju-xenial-lxc-template:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.96.0.10
search scapestack
root@juju-xenial-lxc-template:~#

The unit-container doesn't:

root@amco:~# lxc-attach -n juju-machine-0-lxc-0
root@juju-machine-0-lxc-0:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.96.0.10
root@juju-machine-0-lxc-0:~#

most strange :/

Andreas Hasenack (ahasenack) wrote :

This is from machine0, where the failing container was deployed to.

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-beta7
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta7 → 2.0-beta8
Changed in juju-core:
milestone: 2.0-beta8 → 2.0-beta9
Christian Reis (kiko)
tags: added: cdo-qa-blocker
Cheryl Jennings (cherylj) wrote :

@cgregan, @ahasenack - can you confirm that the problem also exists with lxd containers?

Changed in juju-core:
assignee: nobody → Dimiter Naydenov (dimitern)
Dimiter Naydenov (dimitern) wrote :

I cannot reproduce this using MAAS 1.9.3+bzr4577-0ubuntu1 (trusty1) and Juju 2.0-beta9-xenial-amd64 and LXD containers. I've deployed 5 nodes, each with 3 LXD containers. I double checked /etc/resolv.conf inside the containers, and it was as expected - containing both nameserver and search, like on the host. Pinging any node's FQDN or hostname worked the same inside containers as on their hosts.

Changed in juju-core:
status: Triaged → Incomplete
assignee: Dimiter Naydenov (dimitern) → nobody
Andreas Hasenack (ahasenack) wrote :

Did you look at the attached logs?

Dimiter Naydenov (dimitern) wrote :

Yes I did look at the logs, there's nothing useful in there.

I've also tried using Juju 2.0-beta6-xenial-amd64 on the same MAAS 1.9.3, deploying ubuntu to a couple of LXC and LXD containers. All of them came up as expected, with the correct /etc/resolv.conf, like on the host. Pinging another running MAAS node by hostname or FQDN works exactly the same.

Cheryl Jennings (cherylj) wrote :

Is there some extra debugging we could add in to a binary to have Andreas or Chris test with? Andreas has a maas cluster that reproduces this problem reliably.

Dimiter Naydenov (dimitern) wrote :

If it's easy to reproduce, I'd suggest bootstrapping with --config logging-config='<root>=TRACE' and beta9 built from source (if possible; otherwise the released beta8), switching to the controller model, adding a couple of LXD and LXC containers (add-machine lxc:0 and lxd:0), and reproducing the issue. Then attach the following from machine 0: /var/log/juju/machine-0.log, /etc/resolv.conf, and /var/log/cloud-init-output.log.

Thiago (thisab) wrote :

I think I may have the same problem. After I opened https://bugs.launchpad.net/landscape/+bug/1591375, I looked at the syslog in the landscape container and found this:

Jun 10 21:11:35 juju-machine-0-lxc-1 postfix/smtp[11872]: 915161E176B: to=<email address hidden>, relay=none, delay=0.03, delays=0.02/0/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=juju-machine-0-lxc-1.maas type=AAAA: Host not found)
Jun 10 21:11:35 juju-machine-0-lxc-1 postfix/bounce[11876]: 846BD1E1768: sender non-delivery notification: 970391E1766
Jun 10 21:11:35 juju-machine-0-lxc-1 postfix/qmgr[8918]: 970391E1766: from=<>, size=2639, nrcpt=1 (queue active)
Jun 10 21:11:35 juju-machine-0-lxc-1 postfix/qmgr[8918]: 846BD1E1768: removed
Jun 10 21:11:35 juju-machine-0-lxc-1 postfix/qmgr[8918]: 915161E176B: removed
Jun 10 21:11:35 juju-machine-0-lxc-1 postfix/smtp[11868]: 970391E1766: to=<email address hidden>, relay=none, delay=0.04, delays=0.04/0/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=juju-machine-0-lxc-1.maas type=AAAA: Host not found)
Jun 10 21:11:35 juju-machine-0-lxc-1 postfix/qmgr[8918]: 970391E1766: removed
Jun 10 21:13:19 juju-machine-0-lxc-1 postfix/pickup[8917]: EFDE91E1766: uid=1000 from=<ubuntu>
Jun 10 21:13:19 juju-machine-0-lxc-1 postfix/cleanup[12027]: EFDE91E1766: message-id=<email address hidden>
Jun 10 21:13:20 juju-machine-0-lxc-1 postfix/qmgr[8918]: EFDE91E1766: from=<email address hidden>, size=508, nrcpt=1 (queue active)
Jun 10 21:13:20 juju-machine-0-lxc-1 postfix/smtp[12028]: EFDE91E1766: to=<email address hidden>, orig_to=<root>, relay=none, delay=0.05, delays=0.04/0.01/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=juju-machine-0-lxc-1.maas type=AAAA: Host not found)

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta9 → 2.0-beta10
Changed in juju-core:
status: Incomplete → Triaged
Andreas Hasenack (ahasenack) wrote :

Ok, I still see this with beta9 and using lxd now:
$ juju-2.0 status
MODEL    CONTROLLER       CLOUD       VERSION
default  scapecontroller  scapestack  2.0-beta9

APP               STATUS   EXPOSED  ORIGIN      CHARM   REV  OS
ubuntu-container  unknown  false    jujucharms  ubuntu  0    ubuntu
ubuntu-host       unknown  false    jujucharms  ubuntu  0    ubuntu

UNIT                WORKLOAD  AGENT  MACHINE  PORTS  PUBLIC-ADDRESS  MESSAGE
ubuntu-container/0  unknown   idle   0/lxd/0         10.96.13.54
ubuntu-host/0       unknown   idle   0               10.96.13.53

MACHINE    STATE    DNS          INS-ID                                                          SERIES  AZ
0          started  10.96.13.53  /MAAS/api/1.0/nodes/node-2fdcca2c-39eb-11e5-ab72-2c59e54ace74/  xenial  dawn
  0/lxd/0  started  10.96.13.54  juju-5fdd60-0-lxd-0                                             xenial

andreas@nsn7:~$ juju-2.0 ssh 0/lxd/0
...
ubuntu@juju-5fdd60-0-lxd-0:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.96.0.10
ubuntu@juju-5fdd60-0-lxd-0:~$

Will work on getting the info Dimiter requested.

Chris Gregan (cgregan) wrote :

Unable to reproduce with MAAS 1.9.3, Juju 2.0-beta9 and LXD.

Andreas Hasenack (ahasenack) wrote :

Ok, I bootstrapped with the requested --config setting, and then did:

juju deploy ubuntu ubuntu-lxd --to lxd:0 <-- gave me 0/lxd/0
juju add-machine lxd:0 <-- gave me 0/lxd/1

MODEL       CONTROLLER       CLOUD       VERSION
controller  scapecontroller  scapestack  2.0-beta9

APP         STATUS   EXPOSED  ORIGIN      CHARM   REV  OS
ubuntu-lxd  unknown  false    jujucharms  ubuntu  0    ubuntu

UNIT          WORKLOAD  AGENT  MACHINE  PORTS  PUBLIC-ADDRESS  MESSAGE
ubuntu-lxd/0  unknown   idle   0/lxd/0         10.96.13.53

MACHINE    STATE    DNS          INS-ID                                                          SERIES  AZ
0          started  10.96.13.49  /MAAS/api/1.0/nodes/node-32fdcb12-546c-11e4-b3f2-2c59e54ace74/  xenial  dawn
  0/lxd/0  started  10.96.13.53  juju-6a81a0-0-lxd-0                                             xenial
  0/lxd/1  started  10.96.13.54  juju-6a81a0-0-lxd-1                                             xenial

Now /etc/resolv.conf:
$ juju-2.0 run --all 'sudo cat /etc/resolv.conf'
- MachineId: "0"
  Stdout: |
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 10.96.0.10
    search scapestack
- MachineId: 0/lxd/0
  Stderr: |
    sudo: unable to resolve host juju-6a81a0-0-lxd-0
  Stdout: |
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 10.96.0.10
- MachineId: 0/lxd/1
  Stderr: |
    sudo: unable to resolve host juju-6a81a0-0-lxd-1
  Stdout: |
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 10.96.0.10

Only machine 0 has the "search <domain>" line.
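
The same comparison can be automated once the per-machine file contents have been collected. A small Python sketch, assuming the output above has already been gathered into a dictionary keyed by machine ID; machines_missing_search is a hypothetical helper, not a Juju command or API:

```python
def machines_missing_search(resolv_by_machine):
    """Return machine IDs whose resolv.conf text has no usable "search" line."""
    missing = []
    for machine_id, text in resolv_by_machine.items():
        has_search = any(
            len(line.split()) > 1 and line.split()[0] == "search"
            for line in text.splitlines()
        )
        if not has_search:
            missing.append(machine_id)
    return missing


# Contents taken from the "juju-2.0 run --all" output above (comments omitted).
outputs = {
    "0": "nameserver 10.96.0.10\nsearch scapestack\n",
    "0/lxd/0": "nameserver 10.96.0.10\n",
    "0/lxd/1": "nameserver 10.96.0.10\n",
}
print(machines_missing_search(outputs))
```

For the data shown here, the two LXD containers are reported and the host is not.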

Didn't try with lxc because beta9 now only supports lxd.

Andreas Hasenack (ahasenack) wrote :
Andreas Hasenack (ahasenack) wrote :

from 0/lxd/0 itself

Andreas Hasenack (ahasenack) wrote :

from 0/lxd/1 itself

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Dimiter Naydenov (dimitern)
Dimiter Naydenov (dimitern) wrote :

I've reproduced the issue and I'm working on a fix now.

It happens when deploying a node on a subnet which has DNS servers defined. Because the DNS servers are already known (there's no way to get the search domains from the MAAS 1.9 API), we don't try to parse /etc/resolv.conf on the host to fill in the blanks while generating the container's /etc/network/interfaces, so the search domain is never set.
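
The behaviour described in this comment can be sketched as follows. This is an illustrative Python sketch only, not the Go code from Juju or from the proposed fix; all function names are hypothetical, and the "fill in whichever value is missing" rule is an assumption about one way to close the gap. The dns-nameservers and dns-search option names are the ones resolvconf reads from /etc/network/interfaces.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers, search = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            search = fields[1:]
    return nameservers, search


def container_dns_config(subnet_servers, subnet_search, host_resolv_conf):
    """Combine subnet DNS data with the host's resolv.conf (assumed rule).

    The failing case is subnet_servers set and subnet_search empty: falling
    back to the host file only when the servers are missing leaves the
    search list empty. Here each missing value is filled in independently.
    """
    servers, search = list(subnet_servers), list(subnet_search)
    if not servers or not search:
        host_servers, host_search = parse_resolv_conf(host_resolv_conf)
        servers = servers or host_servers
        search = search or host_search
    return servers, search


def interfaces_dns_lines(servers, search):
    """Render the DNS options for an iface stanza in /etc/network/interfaces."""
    lines = []
    if servers:
        lines.append("dns-nameservers " + " ".join(servers))
    if search:
        lines.append("dns-search " + " ".join(search))
    return lines


host_file = "nameserver 10.96.0.10\nsearch scapestack\n"
servers, search = container_dns_config(["10.96.0.10"], [], host_file)
print("\n".join(interfaces_dns_lines(servers, search)))
```

With the values from this report, the sketch yields both a dns-nameservers and a dns-search line for the container.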

Dimiter Naydenov (dimitern) wrote :

Fix for 2.0 proposed: https://github.com/juju/juju/pull/5713

Working on the backport of the above for 1.25.

Changed in juju-core:
status: In Progress → Won't Fix
status: Won't Fix → In Progress
Dimiter Naydenov (dimitern) wrote :

Actually it seems 1.25 is not affected the same way, as we always parse the host's /etc/resolv.conf in order to complete the in-container /etc/network/interfaces generation. I couldn't reproduce the same steps.

no longer affects: juju-core/1.25
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta10 → 2.0-beta11
Dimiter Naydenov (dimitern) wrote :

To clarify, this fix landed ~20m too late to be in 2.0-beta10, released today, but it will be in the next release, scheduled for next week.

Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta11 → none
milestone: none → 2.0-beta11