1.25.1 with maas 1.8: devices dns allocation uses non-unique hostname

Bug #1525280 reported by Andreas Hasenack
This bug affects 2 people
Affects         Status        Importance  Assigned to       Milestone
Canonical Juju  Fix Released  High        Andrew McDermott
juju-core       Fix Released  Critical    Dimiter Naydenov
juju-core 1.25  Fix Released  Critical    Dimiter Naydenov

Bug Description

Starting with juju 1.25.1, juju will by default use MAAS devices for container IPs. When making the device IP reservation, juju passes along the container hostname. Unfortunately that hostname is far from unique: it follows the form juju-machine-N-lxc-M, so any later container from another environment on the same MAAS server that gets the same juju-machine-N-lxc-M hostname will resolve to an IP that is not its own.

Example:
a) User 1:
bootstrap, status:
environment: beretstack
machines:
  "0":
    agent-state: started
    agent-version: 1.25.1
    dns-name: tesla.beretstack
    instance-id: /MAAS/api/1.0/nodes/node-1742ff26-4b4a-11e4-ad24-a0b3cce4ecca/
    series: trusty
    hardware: arch=amd64 cpu-cores=4 mem=16384M
    state-server-member-status: has-vote
services: {}

juju deploy ubuntu --to lxc:0
$ juju deploy ubuntu --to lxc:0
Added charm "cs:trusty/ubuntu-5" to the environment.

maas log:
Dec 11 08:41:29 virtue maas.api: [INFO] juju-machine-0-lxc-0: Added new device
Dec 11 08:41:30 virtue maas.api: [INFO] juju-machine-0-lxc-0: Sticky IP address(es) allocated: 10.1.102.145
Dec 11 08:41:30 virtue maas.dns: [INFO] Generating new DNS zone file for beretstack
Dec 11 08:41:33 virtue maas.dns: [INFO] Generating new DNS zone file for 1.10.in-addr.arpa

maas DNS:
# grep -r juju-machine /etc/bind
/etc/bind/maas/zone.1.10.in-addr.arpa:145.102.1.10.in-addr.arpa. IN PTR juju-machine-0-lxc-0.beretstack.
/etc/bind/maas/zone.beretstack:juju-machine-0-lxc-0 IN A 10.1.102.145

On ubuntu/0 at this point in time, we can see that $(hostname -f) resolves to the container's own IP:

ubuntu@juju-machine-0-lxc-0:~$ ip addr show dev eth0
9: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:16:3e:c2:82:7a brd ff:ff:ff:ff:ff:ff
    inet 10.1.102.145/16 brd 10.1.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fec2:827a/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever

hostname -f:
$ hostname -f
juju-machine-0-lxc-0.beretstack

Which resolves back to itself as expected:
ubuntu@juju-machine-0-lxc-0:~$ host juju-machine-0-lxc-0.beretstack
juju-machine-0-lxc-0.beretstack has address 10.1.102.145

b) Now User 2 comes along and does the same thing:
after juju deploy ubuntu --to lxc:0

You can see that $(hostname -f) resolves to the IP of the ubuntu/0 unit that belongs to User 1:
ubuntu@juju-machine-0-lxc-0:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
9: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:16:3e:c2:70:f2 brd ff:ff:ff:ff:ff:ff
    inet 10.1.80.164/16 brd 10.1.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fec2:70f2/64 scope link
       valid_lft forever preferred_lft forever

ubuntu@juju-machine-0-lxc-0:~$ hostname -f
juju-machine-0-lxc-0.beretstack

ubuntu@juju-machine-0-lxc-0:~$ host juju-machine-0-lxc-0.beretstack
juju-machine-0-lxc-0.beretstack has address 10.1.102.145
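
For reference, a quick way to check whether a given container is affected is to compare what its hostname resolves to against its own interface addresses. The following is a small diagnostic sketch in Go (not part of juju, and the lookup behaviour depends on the container's resolver and search domain); on User 2's container above it would report that the name resolves to an address that is not its own:

package main

import (
	"fmt"
	"net"
	"os"
)

// Diagnostic sketch: check whether this machine's hostname resolves back to
// one of its own interface addresses. With the collision described above,
// User 2's container resolves juju-machine-0-lxc-0.beretstack to User 1's
// IP, so the check fails.
func main() {
	host, err := os.Hostname() // short name; the resolver's search domain supplies ".beretstack"
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	resolved, err := net.LookupHost(host)
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		os.Exit(1)
	}

	// Collect the addresses actually configured on this machine.
	own := map[string]bool{}
	addrs, _ := net.InterfaceAddrs()
	for _, a := range addrs {
		if ipnet, ok := a.(*net.IPNet); ok {
			own[ipnet.IP.String()] = true
		}
	}

	for _, r := range resolved {
		if own[r] {
			fmt.Printf("%s resolves to %s, which belongs to this machine\n", host, r)
			return
		}
	}
	fmt.Printf("%s resolves to %v, none of which belong to this machine\n", host, resolved)
	os.Exit(1)
}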

tags: added: kanban-cross-team
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

When using juju 1.25.2 (a build I got from juju's CI), the error is shown in juju debug-log:

machine-0: 2015-12-11 15:19:07 WARNING juju.apiserver.client status.go:677 error fetching public address: public no address
machine-0: 2015-12-11 15:19:13 WARNING juju.provisioner lxc-broker.go:113 failed to prepare container "0/lxc/0" network config: failed to allocate an address for "0/lxc/0": failed creating MAAS device for container "juju-machine-0-lxc-0" with MAC address "00:16:3e:ee:e3:aa": gomaasapi: got error back from server: 400 BAD REQUEST ({"hostname": ["Node with this Hostname already exists."]})
machine-0: 2015-12-11 15:19:14 WARNING juju.apiserver.client status.go:677 error fetching public address: public no address

Which corresponds to the api call as seen in this apache log:
10.1.102.146 - - [11/Dec/2015:09:19:13 -0600] "POST /MAAS/api/1.0/devices/?op=new HTTP/1.1" 400 297 "-" "Go 1.1 package http"

Maybe juju should have failed the action in this case instead of happily continuing with the deployment?

tags: removed: kanban-cross-team
Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Andreas, thanks for filing this. I discovered it while live testing on MAAS using concurrently running environments, but didn't get around to filing it as a bug.

The trouble is we might end up needing to use hostnames for containers similar to what MAAS gives nodes by default (<word>-<word>.<domain>), which would hide the purpose of the device (being a container). I'll post updates once we have a proposed fix.

Changed in juju-core:
assignee: nobody → Dimiter Naydenov (dimitern)
milestone: none → 2.0-alpha2
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Re: [Bug 1525280] Re: 1.25.1 with maas 1.8: devices dns allocation uses non-unique hostname

Andres suggested maybe using the MAAS host where the container is located as a prefix.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Unfortunately that won't work: the MAAS host is the same regardless of the environment name juju uses or the MAAS user that allocated the machine, so using e.g. "node-3-juju-machine-1-lxc-1.maas" won't change much. Unlike other providers, where we tag resources with the environment UUID in similar cases (to disambiguate and "namespace" them), that approach is not very useful for container hostnames (who would want to see e.g. "juju-my-maas-env-deadbeef-0bad-f00d-12345678798-machine-1-lxc-1.maas" as a hostname?).
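
For illustration only: one middle ground between embedding the full environment UUID and no namespacing at all would be to prefix the container name with a short fragment of the environment UUID. The helper below is a hypothetical sketch, not juju's actual code or the fix that eventually landed:

package main

import (
	"fmt"
	"strings"
)

// containerDeviceHostname is a hypothetical helper: it namespaces the
// per-machine container name with the first group of the environment UUID,
// so two environments on the same MAAS get different device hostnames while
// the name stays reasonably readable.
func containerDeviceHostname(envUUID, containerName string) string {
	prefix := strings.SplitN(envUUID, "-", 2)[0] // e.g. "deadbeef"
	return fmt.Sprintf("%s-%s", prefix, containerName)
}

func main() {
	// Same container name, two environments: the device hostnames differ.
	fmt.Println(containerDeviceHostname("deadbeef-0bad-f00d-1234-567898765432", "juju-machine-0-lxc-0"))
	fmt.Println(containerDeviceHostname("cafed00d-0bad-f00d-1234-567898765432", "juju-machine-0-lxc-0"))
}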

Curtis Hovey (sinzui)
tags: added: maas-provider network regression
description: updated
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I demoted this to High for 2.0, as there is more critical work around MAAS spaces support that we need to finish first.
The fix will be forward-ported soon after that.

Changed in juju-core:
importance: Critical → High
assignee: Dimiter Naydenov (dimitern) → nobody
Changed in juju-core:
milestone: 2.0-alpha2 → 2.0-alpha1
Revision history for this message
Andrew McDermott (frobware) wrote :

Containers don't bootstrap correctly with this change on 1.25.2

Attaching to the container and looking at /etc/network/interfaces, the device name is missing from the 'auto' and 'iface' stanzas:

root@dental-table:/var/log/lxc# lxc-attach -n juju-machine-0-lxc-0
root@juju-machine-0-lxc-0:/# cat /etc/network/interfaces

# loopback interface
auto lo
iface lo inet loopback

# interface ""
auto
iface inet manual
    pre-up ip address add 10.17.20.213/32 dev &> /dev/null || true
    up ip route replace dev
    up ip route replace default via
    down ip route del default via &> /dev/null || true
    down ip route del dev &> /dev/null || true
    post-down ip address del 10.17.20.213/32 dev &> /dev/null || true

And if you start the container manually you see some errors from parsing /etc/network/interfaces:

cloud-init-nonet[1329.71]: waiting 120 seconds for network device
/etc/network/interfaces:8: too few parameters for iface line
/sbin/ifdown: couldn't read interfaces file "/etc/network/interfaces"

Revision history for this message
Andrew McDermott (frobware) wrote :

Since my comment in #7 I haven't been able to reproduce this - not once in 4 hours.

I have been building from source:

$ juju version
1.25.2-trusty-amd64

$ git status
On branch 1.25
Your branch is up-to-date with 'upstream/1.25'.

$ git log --format=oneline| head -n 4
57a9a82258551da581059fc0595496ea23c25425 Merge pull request #3284 from bogdanteleaga/gce-win
8785f2eb6c63fbff8c825ba86ca10635acbbd4ba Updated dependencies.tsv
eecc19674947ce60326f3570f11ea438dbdfb43b Enable GCE provider on windows
dd281cdb57220d4be50f258abc5e63943c3ecf21 Merge pull request #3966 from dimitern/lp-1525280-maas-devices-hostnames-1.25

The one time I saw this (comment #7), my steps were:

$ juju bootstrap -e $(juju switch) --upload-tools
$ juju deploy ubuntu --to lxc:0

The container failed to start and I poked around by attaching to it which is where I noticed that there was no "eth0" in /etc/network/interfaces.

I have been testing this against MAAS 1.9rc4.

Revision history for this message
John George (jog) wrote :

This script assumes the following repositories are checked out in $HOME:
    bzr branch lp:juju-ci-tools
    bzr branch lp:juju-ci-tools/repository
    bzr branch lp:juju-release-tools

    It also requires cloud-city; if you don't know where to get cloud-city, talk to a Juju CI team member.

Environment variables near the top of the script, such as "environment" will need to be updated to match your setup.

A reproduction deployed with this script has been left running on finfolk.internal.

jenkins@finfolk:~$ juju status -e min-lxc --format tabular
[Services]
NAME STATUS EXPOSED CHARM
python-django unknown false cs:trusty/python-django-12
ubuntu unknown false cs:trusty/ubuntu-5

[Units]
ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
python-django/0 unknown allocating 1/lxc/0 10.0.30.12 Waiting for agent initialization to finish
python-django/1 unknown allocating 1/lxc/1 10.0.30.13 Waiting for agent initialization to finish
python-django/2 unknown allocating 0/lxc/0 10.0.30.14 Waiting for agent initialization to finish
ubuntu/0 unknown idle 1.25.2 1 juju-qa-maas-node-31.maas

[Machines]
ID STATE VERSION DNS INS-ID SERIES HARDWARE
0 started 1.25.2 juju-qa-maas-node-30.maas /MAAS/api/1.0/nodes/node-ce3fd804-71e4-11e5-80fe-525400c43ce5/ trusty arch=amd64 cpu-cores=1 mem=2048M tags=virtual
1 started 1.25.2 juju-qa-maas-node-31.maas /MAAS/api/1.0/nodes/node-cee6f43c-71e8-11e5-aa2a-525400c43ce5/ trusty arch=amd64 cpu-cores=1 mem=2048M tags=virtual,centos,MAAS_NIC_1

Revision history for this message
Cheryl Jennings (cherylj) wrote :

I was able to reproduce this easily. When I first connected to a container that was failing to resolve node0.maas, I noticed that /etc/resolv.conf was empty:

root@juju-machine-1-lxc-0:/var/lib/juju# sudo cat /etc/resolv.conf
sudo: unable to resolve host juju-machine-1-lxc-0
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

I ran dhclient eth0 and then was able to resolve node0.maas, and saw information in /etc/resolv.conf:

root@juju-machine-1-lxc-0:/var/lib/juju# dhclient eth0
root@juju-machine-1-lxc-0:/var/log# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.100.200
search maas

After this, the unit agent was able to resolve and connect to the state server.

Revision history for this message
Ian Booth (wallyworld) wrote :

I've done a read-through of our code. I'm not convinced this isn't a MAAS problem. Based on seeing the code for the first time, I think that:

- the /etc/network/interfaces file is generated using data obtained about the MAAS node
- the data obtained from the MAAS node has networking info in it, including the InterfaceName
- because the InterfaceName is "", the interfaces file is broken

So, to get the interface info for a MAAS node, we make a MAAS API call.

 maasInst := inst.(*maasInstance)
 maasObj := maasInst.maasObject
 result, err := maasObj.CallGet("details", nil)

The details which come back contain the output of running lshw on the node.
The lshw output contains entries like this:

  *-network:0
       description: Ethernet interface
       physical id: 1
       logical name: virbr0-nic
       serial: 52:54:00:3f:42:d9
       size: 10Mbit/s
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=tun driverversion=1.6 duplex=full link=no multicast=yes port=twisted pair speed=10Mbit/s

The interface name used by juju comes from the logical name attribute above.

The next step is to look at debug output which hopefully contains some clues as to why Juju is getting a "" interface name. One explanation is that the logical name value sent back from MAAS is missing.
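
One way to make this failure mode visible (rather than writing out a broken interfaces file) would be to refuse to render a stanza whose interface name is empty. The snippet below is a hedged sketch using a simplified stand-in for juju's network.InterfaceInfo, not the actual rendering code:

package main

import (
	"fmt"
	"strings"
)

// interfaceInfo is a simplified stand-in for juju's network.InterfaceInfo;
// only the fields this sketch needs are included.
type interfaceInfo struct {
	InterfaceName string
	Address       string // e.g. "10.17.20.213/32"
	Gateway       string // e.g. "10.17.20.1"
}

// renderStanza renders one /etc/network/interfaces stanza, erroring out on an
// empty InterfaceName instead of producing the broken "auto / iface  inet
// manual" output shown in comment #7.
func renderStanza(info interfaceInfo) (string, error) {
	if strings.TrimSpace(info.InterfaceName) == "" {
		return "", fmt.Errorf("empty interface name for address %q", info.Address)
	}
	return fmt.Sprintf(
		"auto %[1]s\niface %[1]s inet manual\n    pre-up ip address add %[2]s dev %[1]s || true\n    up ip route replace default via %[3]s\n",
		info.InterfaceName, info.Address, info.Gateway), nil
}

func main() {
	if _, err := renderStanza(interfaceInfo{Address: "10.17.20.213/32", Gateway: "10.17.20.1"}); err != nil {
		fmt.Println("error:", err) // surfaced instead of silently written to /e/n/i
	}
}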

Revision history for this message
Ian Booth (wallyworld) wrote :

I've connected to the test environment John mentions.
The /etc/network/interfaces files on the containers are actually correct. They are not malformed like in Andy's example.

So my analysis in the previous comment was based on looking at Andy's output. The QA test environment is broken somehow, but the issue appears different from what Andy was able to reproduce.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

I don't think this is a MAAS issue. I bootstrapped my vMAAS with 1.25.0 and all agents started up normally. However, I ran into the issue where the bootstrap node couldn't be resolved on containers when bootstrapping with 1.25.2. The interesting bits I've found so far are:

1 - With 1.25.0, dhclient is started on the containers, and I can resolve node0.maas
2 - With 1.25.2, dhclient was NOT started, and I could not resolve node0.maas. Manually starting dhclient allowed the container to resolve node0.maas and finish agent initialization.
3 - With 1.25.2, I noticed that in the agent.conf file for unit agents, the apiaddresses field uses the hostname rather than the IP:
    apiaddresses:
    - node0.maas:17070
This is not the case for 1.25.0, and isn't the case for the machine agent.

Revision history for this message
Ian Booth (wallyworld) wrote :

Looking at the logs (for machine 1 and the containers), there's a tonne of rsyslog certificate errors. I think these are orthogonal to the issue at hand, but the logs are full of spam because of them.

2015-12-19 00:29:59 INFO juju.worker runner.go:275 stopped "rsyslog", err: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-12-19 00:29:59 DEBUG juju.worker runner.go:203 "rsyslog" done: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-12-19 00:29:59 ERROR juju.worker runner.go:223 exited "rsyslog": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-12-19 00:29:59 INFO juju.worker runner.go:261 restarting "rsyslog" in 3s

The fact that juju status shows the agent state as pending tends to indicate the agent hasn't started and phoned home, yet cloud-init output shows everything OK here:

Starting Juju machine agent (jujud-machine-1-lxc-1)
+ cat
+ start jujud-machine-1-lxc-1
jujud-machine-1-lxc-1 start/running, process 841
+ rm /var/lib/juju/tools/1.25.2-trusty-amd64/tools.tar.gz
+ rm /var/lib/juju/tools/1.25.2-trusty-amd64/juju1.25.2-trusty-amd64.sha256
+ ifconfig
eth1 Link encap:Ethernet HWaddr 00:16:3e:f9:04:6a
          inet addr:10.0.30.13 Bcast:255.255.255.255 Mask:255.255.255.255
          inet6 addr: fe80::216:3eff:fef9:46a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:2207 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1099 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:18937819 (18.9 MB) TX bytes:73447 (73.4 KB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:136 errors:0 dropped:0 overruns:0 frame:0
          TX packets:136 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10528 (10.5 KB) TX bytes:10528 (10.5 KB)

So perhaps there is a connectivity issue between the LXC container and the state server such that the phone-home is not received, and the rsyslog errors are a symptom of that. Not sure.

Revision history for this message
Andrew McDermott (frobware) wrote :

Even in the cases where you get a device name in /etc/network/interfaces, there's still the problem that /etc/resolv.conf is essentially empty:

ubuntu@weird-bulb:~$ sudo lxc-attach -n juju-machine-0-lxc-3
root@juju-machine-0-lxc-3:~# cat /etc/network/interfaces

# loopback interface
auto lo
iface lo inet loopback

# interface "eth0"
auto eth0
iface eth0 inet manual
    pre-up ip address add 10.17.19.210/32 dev eth0 &> /dev/null || true
    up ip route replace 10.17.19.1 dev eth0
    up ip route replace default via 10.17.19.1
    down ip route del default via 10.17.19.1 &> /dev/null || true
    down ip route del 10.17.19.1 dev eth0 &> /dev/null || true
    post-down ip address del 10.17.19.210/32 dev eth0 &> /dev/null || true

root@juju-machine-0-lxc-3:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

Revision history for this message
Andrew McDermott (frobware) wrote :

There's clearly an intermittent nature to this: for the first time in many iterations of 'add-machine lxc:0' I've just had a case where there is no device name in /etc/network/interfaces. I haven't kept count this morning, but I would estimate that I have added a container ~10 times without running into the issue.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Here is the cloud-config for a container in a 1.25.0 environment that works. The container can resolve maas hostnames.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Here is the cloud-config for a container in a 1.25.2 environment. The container has connectivity, but cannot resolve maas hostnames.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

I think the root of the problem being seen in CI (where containers can't resolve node0.maas) is that we are manually specifying the network config (rather than using DHCP), but we don't manually specify the DNS information.

I've attached the two cloud-config.txt files, one from a 1.25.0 env and one from a 1.25.2 env. We specify a manual config for eth0 in 1.25.2, but don't include DNS information.

For the 1.25.2 case, I added dns-nameserver and dns-domain into /etc/network/interfaces and did an ifdown eth0 && ifup eth0, and was then able to resolve node0.maas. /etc/resolv.conf also had the correct information. Here's the modified /e/n/i:

root@juju-machine-0-lxc-0:~# cat /etc/network/interfaces

# loopback interface
auto lo
iface lo inet loopback

# interface "eth0"
auto eth0
iface eth0 inet manual
    pre-up ip address add 192.168.100.152/32 dev eth0 &> /dev/null || true
    up ip route replace 192.168.100.1 dev eth0
    up ip route replace default via 192.168.100.1
    down ip route del default via 192.168.100.1 &> /dev/null || true
    down ip route del 192.168.100.1 dev eth0 &> /dev/null || true
    post-down ip address del 192.168.100.152/32 dev eth0 &> /dev/null || true
    dns-nameserver 192.168.100.200
    dns-domain maas
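
In other words, the static stanza juju writes would need dns-nameservers/dns-search lines whenever the DNS servers are known, so that resolvconf can populate /etc/resolv.conf without DHCP. A minimal sketch of that idea (hypothetical helper, not juju's actual template):

package main

import (
	"fmt"
	"strings"
)

// staticStanzaWithDNS builds a statically configured interface stanza and
// appends dns-nameservers/dns-search lines when DNS information is available.
// Hypothetical helper for illustration; not juju's code.
func staticStanzaWithDNS(iface, cidr, gateway string, nameservers []string, search string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "auto %s\n", iface)
	fmt.Fprintf(&b, "iface %s inet manual\n", iface)
	fmt.Fprintf(&b, "    pre-up ip address add %s dev %s || true\n", cidr, iface)
	fmt.Fprintf(&b, "    up ip route replace default via %s\n", gateway)
	if len(nameservers) > 0 {
		fmt.Fprintf(&b, "    dns-nameservers %s\n", strings.Join(nameservers, " "))
	}
	if search != "" {
		fmt.Fprintf(&b, "    dns-search %s\n", search)
	}
	return b.String()
}

func main() {
	fmt.Print(staticStanzaWithDNS("eth0", "192.168.100.152/32", "192.168.100.1",
		[]string{"192.168.100.200"}, "maas"))
}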

Revision history for this message
Andrew McDermott (frobware) wrote :

Right - this (I think) was my observation in comment #16.

So, even if the device is in /etc/network/interfaces, /etc/resolv.conf is still borked.

Revision history for this message
John A Meinel (jameinel) wrote :

So can we compare how MAAS is setting up /e/n/i on the physical machine with how we are setting up the container?

I'm a bit surprised we need DNS rather than just using IP addresses, but we can go with that.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

/e/n/i on the physical node contains dns information:
ubuntu@node1:/var/lib/cloud/instance$ cat /etc/network/interfaces
iface eth0 inet manual

auto juju-br0
iface juju-br0 inet static
    bridge_ports eth0
    gateway 192.168.100.1
    address 192.168.100.151/24
    mtu 1500

dns-nameservers 192.168.100.200
dns-search maas

And we need DNS on the container as it cannot resolve *anything*:
root@juju-machine-1-lxc-0:/# ping ubuntu.com
ping: unknown host ubuntu.com

Revision history for this message
Andrew McDermott (frobware) wrote :

Given a recent conversation with Cheryl, I've been chasing the wrong issue; the lack of a device name in /e/n/i (very intermittent in my case) isn't the immediate issue to fix. It's the lack of DNS nameserver (and dns-search) entries in /etc/resolv.conf that is the breakage that needs fixing immediately. Looking through the patch, I see a switch to static IP address allocation, which should explain why we no longer get DNS entries.

Revision history for this message
Andrew McDermott (frobware) wrote :

Note: with the address allocation feature enabled you do get nameservers listed in /etc/resolv.conf. Remember to bootstrap like so to enable that feature:

  $ JUJU_DEV_FEATURE_FLAGS=address-allocation juju bootstrap ...

Revision history for this message
Andrew McDermott (frobware) wrote :

Not sure of the significance, but prior to this change, calls to PrepareContainerInterfaceInfo returned:

all-machines.log
1467:machine-0: 2015-12-20 23:25:08 TRACE juju.provisioner.lxc lxc-broker.go:705 PrepareContainerInterfaceInfo returned []network.InterfaceInfo{}

whereas it now returns:

1751:2015-12-20 22:25:22 TRACE juju.provisioner.lxc lxc-broker.go:705 PrepareContainerInterfaceInfo returned []network.InterfaceInfo{network.InterfaceInfo{DeviceIndex:0, MACAddress:"00:16:3e:be:d9:c1", CIDR:"", NetworkName:"", ProviderId:"", ProviderSubnetId:"", AvailabilityZones:[]string(nil), VLANTag:0, InterfaceName:"eth0", Disabled:false, NoAutoStart:false, ConfigType:"static", Address:local-cloud:10.17.20.212, DNSServers:[]network.Address(nil), DNSSearch:"", GatewayAddress:local-cloud:10.17.20.1, ExtraConfig:map[string]string(nil)}}

So in the latter case it's clear that there is no DNS info, but there isn't any in the first case either. Perhaps in light of no results (empty slice, first case) it takes a different path...
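
Given that DNSServers and DNSSearch come back empty in both cases, one possible fallback (purely a sketch, not juju's code) is to read the host's /etc/resolv.conf and copy its nameserver/search entries into the container's network config before rendering it:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hostResolvConf parses nameserver and search entries from a resolv.conf
// file. Hypothetical fallback sketch: if the provider returns no DNS info,
// the host's resolver settings could be reused for the container.
func hostResolvConf(path string) (nameservers []string, search string, err error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		switch fields[0] {
		case "nameserver":
			nameservers = append(nameservers, fields[1])
		case "search":
			search = strings.Join(fields[1:], " ")
		}
	}
	return nameservers, search, scanner.Err()
}

func main() {
	ns, search, err := hostResolvConf("/etc/resolv.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("nameservers:", ns, "search:", search)
}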

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Opened bug #1528217 for the new DNS issue found in CI.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Is this bug targeted to be merged into master for 2.0-alpha1? (The cutoff is next week.)

Revision history for this message
Andrew McDermott (frobware) wrote :

OK will do this tomorrow.

Revision history for this message
Andrew McDermott (frobware) wrote :

Will look into this today/next week, but demo fixes are currently taking priority.

Changed in juju-core:
assignee: nobody → Andrew McDermott (frobware)
Revision history for this message
Andrew McDermott (frobware) wrote :

WIP branch for master: https://github.com/frobware/juju/tree/master-lp1525280-devices-dns-allocation-uses-non-unique-hostname

2 unit tests currently fail - investigating.

Changed in juju-core:
milestone: 2.0-alpha1 → 2.0-alpha2
Revision history for this message
Andrew McDermott (frobware) wrote :
Changed in juju-core:
status: Triaged → In Progress
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-alpha2 → none
milestone: none → 2.0-alpha2
Changed in juju-core:
assignee: nobody → Dimiter Naydenov (dimitern)
importance: Undecided → Critical
status: New → Fix Released