bionic LXD containers on bionic hosts get incorrect /etc/resolve.conf files

Bug #1764317 reported by John A Meinel on 2018-04-16
This bug affects 5 people
Affects     Importance   Assigned to
juju        High         Eric Claude Jones
juju 2.3    High         Eric Claude Jones

Bug Description

I just tried:
  juju bootstrap --bootstrap-series=bionic maas
  juju deploy cs:~jameinel/ubuntu-lite --series=bionic --to lxd:0 -m controller

After doing so, the container fails to start up because it "cannot resolve archive.ubuntu.com" (note I think we've seen this in CI runs as well).

However, the reason is that the host machine has this /etc/resolv.conf:

 nameserver 127.0.0.53
 search maas

Presumably this means we're running a local DNS stub that is itself configured to forward to MAAS for anything it cannot answer.

However, when launching a container, we read the host machine's DNS information if we don't have anything better. But 127.0.0.53 is clearly not going to be reachable from inside the container, since it only listens on the host's loopback interface.

It appears that 127.0.0.53 is the local stub listener created by systemd-resolved (judging from other bugs such as: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1624320).
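That distinction can be checked mechanically; a minimal sketch (the helper name is mine, the addresses and paths are the bionic defaults seen in this report):

```shell
# Sketch (the function name is mine; 127.0.0.53 is the bionic stub
# address): decide whether a resolv.conf is usable from inside a
# container. The stub only answers on the host's own loopback interface,
# so copying it into a container leaves the container with no resolver.
is_stub_resolver() {
    grep -q '^nameserver[[:space:]]*127\.0\.0\.53' "$1"
}

# Usage on a host:
#   if is_stub_resolver /etc/resolv.conf; then
#       echo "hand containers /run/systemd/resolve/resolv.conf instead"
#   fi
```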

I checked the other "resolved.conf" configuration locations and found nothing useful:
/etc/systemd/resolved.conf (exists but doesn't contain anything interesting)
/etc/systemd/resolved.conf.d (doesn't exist)
/run/systemd/resolved.conf.d (doesn't exist)
/usr/lib/systemd/resolved.conf.d (doesn't exist)

I did find on the host system:
/run/systemd/resolve/resolv.conf which contains only:
 nameserver 10.0.0.1
 search maas

I injected that inside the container, and then ran
 systemctl restart systemd-resolved

And then I was able to get:
# systemctl status systemd-resolved
...
Apr 16 07:53:20 juju-930c9c-0-lxd-0 systemd-resolved[687]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-add
Apr 16 07:53:20 juju-930c9c-0-lxd-0 systemd-resolved[687]: Using system hostname 'juju-930c9c-0-lxd-0'.
Apr 16 07:53:20 juju-930c9c-0-lxd-0 systemd[1]: Started Network Name Resolution.

However,
root@juju-930c9c-0-lxd-0:~# host archive.ubuntu.com
;; connection timed out; no servers could be reached

While this does work:
# host archive.ubuntu.com 10.0.0.1
Using domain server:
Name: 10.0.0.1
Address: 10.0.0.1#53
Aliases:

archive.ubuntu.com has address 91.189.88.152
archive.ubuntu.com has address 91.189.88.149
archive.ubuntu.com has address 91.189.88.161
archive.ubuntu.com has address 91.189.88.162
archive.ubuntu.com has IPv6 address 2001:67c:1560:8001::14
archive.ubuntu.com has IPv6 address 2001:67c:1360:8001::17
archive.ubuntu.com has IPv6 address 2001:67c:1360:8001::21
archive.ubuntu.com has IPv6 address 2001:67c:1560:8001::11

From what I can tell, systemd-resolved may read /etc/resolv.conf on startup if it is not already a symlink to /run/systemd/resolve/stub-resolv.conf, configure itself from it, and then replace it with the symlink.
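A quick way to see which of those two states /etc/resolv.conf is in, sketched as a small helper (the helper name is an assumption; the stub path is the bionic default):

```shell
# Sketch of a diagnostic helper (the name is mine): report whether a
# resolv.conf is the systemd-resolved managed symlink, which typically
# points at /run/systemd/resolve/stub-resolv.conf, or a plain file that
# resolved may import on startup and then replace.
check_resolv_conf() {
    f=$1
    if [ -L "$f" ]; then
        echo "symlink -> $(readlink "$f")"
    else
        echo "plain file (resolved may read it on startup)"
    fi
}

# Usage on a host:
#   check_resolv_conf /etc/resolv.conf
```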

Running on the host machine I see:
# systemd-resolve --status
Global
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 6 (vethXG5VHE)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 4 (br-enp0s25)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 10.0.0.1
          DNS Domain: maas

Link 3 (lxdbr0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 2 (enp0s25)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Those seem to be set up by juju in the 'netplan' configuration at:
/etc/netplan/99-juju.yaml:
network:
  version: 2
  ethernets:
    enp0s25:
      match:
        macaddress: b8:ae:ed:79:c7:92
      set-name: enp0s25
      mtu: 1500
  bridges:
    br-enp0s25:
      interfaces: [enp0s25]
      addresses:
      - 10.0.0.156/24
      gateway4: 10.0.0.1
      nameservers:
        search: [maas]
        addresses: [10.0.0.1]
      mtu: 1500

However, inside the container we end up with:
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: 00:16:3e:4a:39:18
      addresses:
      - 10.0.0.26/24
      gateway4: 10.0.0.1
      nameservers:
        search: [maas]
        addresses: [127.0.0.53]

Editing /etc/netplan/99-juju.yaml inside the container and then running
 netplan generate
 netplan apply
does get the container pointing at the right DNS server.
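That manual edit can be sketched as a helper that swaps the first real upstream nameserver in for the stub address (the helper name is mine; the paths are the ones from this report, and it assumes a copy of the host's /run/systemd/resolve/resolv.conf is available in the container):

```shell
# Sketch of the substitution (function name is an assumption): take the
# first real upstream nameserver from a resolv.conf-style file and
# substitute it for the 127.0.0.53 stub address in a netplan config.
replace_stub_nameserver() {
    resolv=$1    # e.g. /run/systemd/resolve/resolv.conf
    netplan=$2   # e.g. /etc/netplan/99-juju.yaml
    real_ns=$(awk '/^nameserver/ { print $2; exit }' "$resolv")
    [ -n "$real_ns" ] && sed -i "s/127\.0\.0\.53/$real_ns/" "$netplan"
}

# Inside an affected container, one would then run:
#   replace_stub_nameserver /run/systemd/resolve/resolv.conf /etc/netplan/99-juju.yaml
#   netplan generate && netplan apply
```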

So we need to figure out how to obtain the correct DNS server for the host machine, so that we can put it into the container's 99-juju.yaml; at the moment we override that value with the contents of the host's /etc/resolv.conf, which is no longer the right value.

I'm guessing we'll also be doing the wrong thing for a KVM container.

Ryan Beisner (1chb1n) on 2018-04-19
tags: added: uosci
Ryan Beisner (1chb1n) wrote :

To confirm: this is impacting Bionic with OpenStack Queens. Some workloads are sensitive to DNS failures, as they expect basic A/PTR record resolution to work (nova-cloud-controller, rabbitmq, and I think even non-OpenStack workloads such as Hadoop and Kubernetes).

Frode Nordahl (fnordahl) wrote :

Here is a way to "work around" the issue in case you need to test Juju deployed containers on bionic: https://pastebin.ubuntu.com/p/c86jGhtwGp/

You would of course need to replace the search domain and resolver address according to your environment.

Changed in juju:
assignee: nobody → Eric Claude Jones (ecjones)
Jason Hobbs (jason-hobbs) wrote :

This impacts basically anything that deploys inside of bionic containers on bionic, because we can't resolve package archive hostnames, which almost everything uses.

tags: added: cdo-qa cdo-qa-blocker foundations-engine
Chris Gregan (cgregan) wrote :

As this is not isolated to ARM, I have escalated to Field High

Eric Claude Jones (ecjones) wrote :
Changed in juju:
status: Triaged → Fix Committed
KingJ (kj-kingj) wrote :

Although a fix has now been merged into the development branch, is there any way to apply the workaround on the current stable release other than SSHing into every container and modifying/applying the netplan configuration? I'm currently using Juju with MAAS to roll out an OpenStack cluster, and with the number of services that use LXD there are quite a few things to change.

Right now the least resource-intensive way I've found to apply the change is to run this for each app:

juju run "sed -i 's/127.0.0.53/10.1.10.1/' /etc/netplan/99-juju.yaml; netplan apply" --application=openstack-dashboard
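For a model with many applications, that per-app command can be looped; a sketch, assuming juju status JSON output with an "applications" map (the fix_all_apps name is mine, and 10.1.10.1 is the resolver address from KingJ's environment, so substitute your own):

```shell
# Hedged sketch: apply the same sed/netplan workaround to every
# application in the model in one pass. The function name is an
# assumption; replace 10.1.10.1 with your environment's resolver.
fix_all_apps() {
    apps=$(juju status --format=json |
        python3 -c 'import json, sys; print("\n".join(json.load(sys.stdin)["applications"]))')
    for app in $apps; do
        juju run --application="$app" \
            "sed -i 's/127.0.0.53/10.1.10.1/' /etc/netplan/99-juju.yaml; netplan apply"
    done
}
```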

There are several bionic-related issues that should be fixed and released in 2.3.8.

John
=:->


John A Meinel (jameinel) on 2018-05-08
Changed in juju:
milestone: none → 2.4-beta2
Changed in juju:
status: Fix Committed → Fix Released