1.25.2 doesn't set up DNS information with MAAS

Bug #1528217 reported by Cheryl Jennings
Affects         Status        Importance  Assigned to       Milestone
Canonical Juju  Fix Released  High        Andrew McDermott
juju-core       Invalid       Critical    Unassigned
1.25            Invalid       Critical    Unassigned

Bug Description

The maas-1_9-OS-deployer test fails for 1.25.2 while waiting for units to complete setup. The unit logs show this error:

2015-12-18 20:45:27 INFO juju.api apiclient.go:270 error dialing "wss://juju-qa-maas-node-30.maas:17070/environment/cf1dc32e-c10d-43fa-8b86-c483c7937dc4/api": websocket.Dial wss://juju-qa-maas-node-30.maas:17070/environment/cf1dc32e-c10d-43fa-8b86-c483c7937dc4/api: dial tcp: lookup juju-qa-maas-node-30.maas: no such host

Containers that are started are unable to resolve any hostnames.

See later comments in bug #1525280 for more information.

See also MAAS bug #1528532 for the fact that MAAS controlled subnets return [] for dns_servers.

Changed in juju-core:
status: New → Confirmed
importance: Undecided → Critical
milestone: none → 1.25.2
tags: added: blocker maas network
tags: added: ci
Martin Packman (gz)
Changed in juju-core:
milestone: 1.25.2 → 2.0-alpha1
status: Confirmed → Triaged
tags: added: regression
Changed in juju-core:
status: Triaged → Incomplete
Revision history for this message
Michael Foord (mfoord) wrote :

When address-allocation is off, 1.25.1 uses dhcp for containers. The code in apiserver/provisioner/provisioner.go has changed a great deal in 1.25.2 (specifically prepareOrGetContainerInterfaces) and now renders a manual interface.

Example of /e/n/i for a container with 1.25.1:

# loopback interface
auto lo
iface lo inet loopback
# interface "eth0"
auto eth0
iface eth0 inet dhcp

With 1.25.2:

# loopback interface
auto lo
iface lo inet loopback

# interface "eth0"
auto eth0
iface eth0 inet manual
    pre-up ip address add 172.16.0.5/32 dev eth0 &> /dev/null || true
    up ip route replace 172.16.0.1 dev eth0
    up ip route replace default via 172.16.0.1
    down ip route del default via 172.16.0.1 &> /dev/null || true
    down ip route del 172.16.0.1 dev eth0 &> /dev/null || true
    post-down ip address del 172.16.0.5/32 dev eth0 &> /dev/null || true

Our code for getting nameservers and DNS search from resolv.conf is only triggered when address allocation is on. This change probably needs reverting (why was it made?) or we need to use resolv.conf unconditionally for containers, not just when address allocation is on.
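As a rough illustration of what "use resolv.conf unconditionally" involves, here is a minimal sketch of pulling the nameservers and search list out of resolv.conf-style text. The function name parseResolvConf is hypothetical and this is not Juju's actual code; it only assumes the standard resolv.conf(5) keywords.

```go
package main

import (
	"fmt"
	"strings"
)

// parseResolvConf is a hypothetical helper (not Juju's implementation).
// It returns the nameserver addresses and the search list found in
// resolv.conf-formatted text.
func parseResolvConf(contents string) (nameservers, search []string) {
	for _, line := range strings.Split(contents, "\n") {
		line = strings.TrimSpace(line)
		// Skip blank lines and comments ("#" or ";").
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		fields := strings.Fields(line)
		switch fields[0] {
		case "nameserver":
			if len(fields) > 1 {
				nameservers = append(nameservers, fields[1])
			}
		case "search":
			// Per resolv.conf(5), only the last search line is used.
			search = fields[1:]
		}
	}
	return nameservers, search
}

func main() {
	const sample = "# generated by resolvconf(8)\nnameserver 10.17.20.200\nsearch maas19\n"
	ns, search := parseResolvConf(sample)
	fmt.Println(ns, search)
}
```

The values returned here are what would then need to be copied into each container interface, whether or not address allocation is enabled.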

Revision history for this message
Andrew McDermott (frobware) wrote :

If we choose to keep the change, we might be able to restore functionality by ensuring localDNSServers() is called in lxc-broker.go.

Revision history for this message
Andrew McDermott (frobware) wrote :

I was looking at augmenting DNS info à la:

$ more x
diff --git a/worker/provisioner/lxc-broker.go b/worker/provisioner/lxc-broker.go
index 3279f23..4a01714 100644
--- a/worker/provisioner/lxc-broker.go
+++ b/worker/provisioner/lxc-broker.go
@@ -684,6 +684,17 @@ func prepareOrGetContainerInterfaceInfo(
                return nil, errors.Trace(err)
        }

+       dnsServers, searchDomain, dnsErr := localDNSServers()
+
+       if dnsErr != nil {
+               return nil, errors.Trace(dnsErr)
+       }
+
+       for i, _ := range preparedInfo {
+               preparedInfo[i].DNSServers = dnsServers
+               preparedInfo[i].DNSSearch = searchDomain
+       }
+
        log.Tracef("PrepareContainerInterfaceInfo returned %#v", preparedInfo)
        // Most likely there will be only one item in the list, but check
        // all of them for forward compatibility.

Revision history for this message
Andrew McDermott (frobware) wrote :

Which seems to work, given the one time I have tried it:

$ juju ssh 0/lxc/0
Warning: Permanently added 'dental-table.maas19,10.17.20.4' (ECDSA) to the list of known hosts.
Warning: Permanently added '10.17.20.212' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-74-generic x86_64)

 * Documentation: https://help.ubuntu.com/

  System information as of Mon Dec 21 18:00:35 UTC 2015

  System load: 0.99 Processes: 12
  Usage of /: 19.3% of 39.24GB Users logged in: 0
  Memory usage: 27% IP address for eth0: 10.17.20.212
  Swap usage: 0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

ubuntu@juju-machine-0-lxc-0:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.17.20.200
search maas19

Revision history for this message
Michael Foord (mfoord) wrote :

I can confirm that the same change fixes the problem for me.

Revision history for this message
Andrew McDermott (frobware) wrote :

WIP branch: https://github.com/frobware/juju/tree/1.25-lp1528217

Waiting for unit tests to run to see what the breakage is.

Revision history for this message
Andrew McDermott (frobware) wrote :

http://reviews.vapour.ws/r/3436/

Need to decide whether this should be applied to 1.25.

Revision history for this message
John George (jog) wrote :

Observed one unit test failure here:
http://reports.vapour.ws/releases/3457/job/run-unit-tests-trusty-amd64/attempt/3433#highlight

This unit test passed on a retry, although it's interesting that it appears to be lease-related.

Overall CI test run results can be seen here:
http://reports.vapour.ws/releases/3457

The tests that cursed build revision 3457 are not due to the issue for which this bug was reported:
1. OpenStack deployment on MAAS 1.8 (maas-1_8-OS-deployer). This is the 'Unable to allocate static IP due to address exhaustion.' error that CI has been experiencing. Its cause is not yet understood.

2. The maas-1_8-upgrade-win2012hvr2-amd64 test failure is not a Juju issue. It's related to an empty list coming back from the MAAS CLI. The log output from this Windows upgrade test shows the test completed as expected.

John A Meinel (jameinel)
description: updated
Revision history for this message
John A Meinel (jameinel) wrote :

I'm unable to reproduce that test failure running the test locally. It looks to be an infrastructure issue as something is closing a connection prematurely.

We do have a test that checks something similar, namely TestStartInstancePopulatesNetworkInfo. If you remove the line "s.SetFeatureFlags(feature.AddressAllocation)", the only thing that fails is that NetworkName isn't being populated by "private", which I'm guessing is being done in provisioner_task.go:674 when the feature flag is enabled.

My only other concern is that we are unilaterally setting the DNSServers rather than merging the local ones if any were also returned. However, that is also what we would do with AddressAllocation set, which is intended to become the new way of doing things.

So it seems ok to go with Andy's patch.

I did put up a CI test infrastructure fix for the empty list bug. I'm guessing there is a bad node in MAAS that is confusing the test suite, but hopefully we can at least get a better error if something is wrong.
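For comparison with the concern above about setting DNSServers unilaterally, a merging variant might look like the sketch below. The interfaceInfo type and fillMissingDNS function are hypothetical stand-ins; only the DNSServers and DNSSearch field names come from the patch, and their types here are assumptions.

```go
package main

import "fmt"

// interfaceInfo is a hypothetical, trimmed-down stand-in for the
// per-interface settings touched by the patch. Field types are assumed.
type interfaceInfo struct {
	Name       string
	DNSServers []string
	DNSSearch  string
}

// fillMissingDNS applies the host's DNS settings only to interfaces that
// have none of their own, instead of overwriting every entry.
func fillMissingDNS(infos []interfaceInfo, hostServers []string, hostSearch string) {
	for i := range infos {
		if len(infos[i].DNSServers) == 0 {
			// Copy so that interfaces do not share one backing array.
			infos[i].DNSServers = append([]string(nil), hostServers...)
		}
		if infos[i].DNSSearch == "" {
			infos[i].DNSSearch = hostSearch
		}
	}
}

func main() {
	infos := []interfaceInfo{
		{Name: "eth0"},
		{Name: "eth1", DNSServers: []string{"192.0.2.53"}, DNSSearch: "example"},
	}
	fillMissingDNS(infos, []string{"10.17.20.200"}, "maas19")
	fmt.Printf("%+v\n", infos)
}
```

This keeps anything the provider returned and only falls back to the host's values, at the cost of diverging from what the AddressAllocation path does.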

Revision history for this message
John George (jog) wrote :

Bug 1528975 was opened for the unit test failure mentioned in comment #8. It's indeed intermittent and has been seen on several other branches, so not unique to the proposed fix for this bug.

tags: added: kanban-cross-team
tags: removed: kanban-cross-team
Revision history for this message
Andrew McDermott (frobware) wrote :

Updated PR with unit test: http://reviews.vapour.ws/r/3451/

Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
Revision history for this message
Andrew McDermott (frobware) wrote :

On MAAS 1.8.2 I get:

# loopback interface
auto lo
iface lo inet loopback

# interface "eth0"
auto eth0
iface eth0 inet manual
    dns-nameservers 10.17.17.200
    dns-search maas
    pre-up ip address add 10.17.17.205/32 dev eth0 &> /dev/null || true
    up ip route replace 10.17.17.1 dev eth0
    up ip route replace default via 10.17.17.1
    down ip route del default via 10.17.17.1 &> /dev/null || true
    down ip route del 10.17.17.1 dev eth0 &> /dev/null || true
    post-down ip address del 10.17.17.205/32 dev eth0 &> /dev/null || true

--------------------------------

On MAAS 1.8.3 I get:

# loopback interface
auto lo
iface lo inet loopback

# interface "eth0"
auto eth0
iface eth0 inet manual
    dns-nameservers 10.17.18.200
    dns-search maas183
    pre-up ip address add 10.17.18.211/32 dev eth0 &> /dev/null || true
    up ip route replace dev eth0
    up ip route replace default via
    down ip route del default via &> /dev/null || true
    down ip route del dev eth0 &> /dev/null || true
    post-down ip address del 10.17.18.211/32 dev eth0 &> /dev/null || true

Note the lack of IP address in the route commands.
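One way to avoid emitting route commands like these is to refuse to render the stanza when the gateway is empty. The sketch below is illustrative only: staticConfig and renderStanza are hypothetical names, and the output format is simply copied from the /e/n/i examples in this report rather than from Juju's template.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// staticConfig is a hypothetical holder for the values that appear in the
// rendered stanza.
type staticConfig struct {
	Device     string
	Address    string
	Gateway    string
	DNSServers []string
	DNSSearch  string
}

// renderStanza writes an "inet manual" stanza in the style shown in this
// report, and returns an error instead of producing route commands with a
// missing address or gateway.
func renderStanza(c staticConfig) (string, error) {
	if c.Device == "" || c.Address == "" {
		return "", errors.New("device and address are required")
	}
	if c.Gateway == "" {
		return "", errors.New("no gateway: route commands would be malformed")
	}
	var b strings.Builder
	fmt.Fprintf(&b, "auto %s\n", c.Device)
	fmt.Fprintf(&b, "iface %s inet manual\n", c.Device)
	if len(c.DNSServers) > 0 {
		fmt.Fprintf(&b, "    dns-nameservers %s\n", strings.Join(c.DNSServers, " "))
	}
	if c.DNSSearch != "" {
		fmt.Fprintf(&b, "    dns-search %s\n", c.DNSSearch)
	}
	fmt.Fprintf(&b, "    pre-up ip address add %s/32 dev %s &> /dev/null || true\n", c.Address, c.Device)
	fmt.Fprintf(&b, "    up ip route replace %s dev %s\n", c.Gateway, c.Device)
	fmt.Fprintf(&b, "    up ip route replace default via %s\n", c.Gateway)
	fmt.Fprintf(&b, "    down ip route del default via %s &> /dev/null || true\n", c.Gateway)
	fmt.Fprintf(&b, "    down ip route del %s dev %s &> /dev/null || true\n", c.Gateway, c.Device)
	fmt.Fprintf(&b, "    post-down ip address del %s/32 dev %s &> /dev/null || true\n", c.Address, c.Device)
	return b.String(), nil
}

func main() {
	// Values taken from the MAAS 1.8.3 output above, with no gateway.
	_, err := renderStanza(staticConfig{
		Device:     "eth0",
		Address:    "10.17.18.211",
		DNSServers: []string{"10.17.18.200"},
		DNSSearch:  "maas183",
	})
	fmt.Println("error:", err)
}
```

Failing early like this would surface the missing gateway at provisioning time rather than leaving a container with a broken network configuration.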

Revision history for this message
Andrew McDermott (frobware) wrote :

On my 1.8.3 installation (from scratch) the discovered interface 'maas-eth0' has no default gateway. Using the MAAS UI you can visually see this by looking at "Networks". I added a default gateway for the "maas-eth0" interface and I now get the same behaviour as 1.8.2 when creating containers. The lack of a default gateway (as returned by the MAAS API call) is the reason all the IP addresses in the route command are missing in comment #12.

What's not clear is why there is no default gateway. My 1.8.2 install has been around for a while and I may have added this manually. I will do a clean 1.8.2 install and look to see if the discovered interface gets a default route.

Revision history for this message
Andrew McDermott (frobware) wrote :

I did clean installs of 1.8.2 and 1.8.3 - in both cases the discovered interface "maas-eth0" does NOT have a default gateway. As we discovered this on 1.8.3 (and with a clean install) I'm guessing somewhere along the line I've added a default gateway for my interface. I did a clean install of 1.9.0 and I DO get a default gateway.

In the same way that we now add DNS info (comment #11) we could also do the same for the default gateway if it is empty. This would make the change transparent to existing 1.8 juju/maas users.
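A minimal sketch of that fallback, assuming the host's default route is taken from the text output of `ip route show` (the bug does not say where the value would come from, so that source is an assumption, and both function names are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// defaultGateway returns the address from the first "default via <addr>"
// line in `ip route show` output, or "" if there is no such line.
func defaultGateway(ipRouteOutput string) string {
	for _, line := range strings.Split(ipRouteOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[0] == "default" && fields[1] == "via" {
			return fields[2]
		}
	}
	return ""
}

// gatewayOrFallback keeps a gateway reported by the provider and only
// falls back to the host's default route when the reported one is empty.
func gatewayOrFallback(reported, hostRoutes string) string {
	if reported != "" {
		return reported
	}
	return defaultGateway(hostRoutes)
}

func main() {
	const routes = "default via 10.17.18.1 dev eth0\n10.17.18.0/24 dev eth0  proto kernel  scope link  src 10.17.18.2\n"
	fmt.Println(gatewayOrFallback("", routes))
}
```

As with the DNS change, this only fills a gap; a gateway returned by MAAS is left untouched.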

Revision history for this message
Cheryl Jennings (cherylj) wrote :

This is no longer a bug once the fix for bug 1483879 is backed out. However, it should be fixed once we re-introduce those changes.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

This will show up in master once the changes for bug 1525280 land there.

Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote : Fix Released in juju-core 1.25

Juju-CI verified that this issue is Fix Released in juju-core 1.25:
    http://reports.vapour.ws/releases/3498

Revision history for this message
Andrew McDermott (frobware) wrote :

For 1.25.2 I wonder if this should be marked as fixed-released unless we think of the fix as "we reverted the changes".

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Yeah, we want to track this for when we re-introduce the fix for bug #1483879. I'm going to move this to invalid and target it to 1.25.3

Changed in juju-core:
assignee: nobody → Andrew McDermott (frobware)
Changed in juju-core:
status: Incomplete → In Progress
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-alpha1 → none
milestone: none → 2.0-alpha1
Changed in juju-core:
importance: Undecided → Critical
status: New → Invalid