Containers fail to get ip when non-maas dhcp/dns is used

Bug #1466629 reported by james beedy on 2015-06-18
This bug affects 2 people
Affects: juju-core
Status: Expired
Importance: High
Assigned to: Unassigned

Bug Description

When non-maas dhcp/dns is used, services deployed to containers fail to get ip addresses and fail deployment. This completely blocks deploying any openstack services on containers.

I am running 14.04.2 with juju version 1.24.0-trusty-amd64, and maas version 1.7.5.

My juju status shows: http://paste.ubuntu.com/11737084/

Issue created with openstack-installer: https://github.com/Ubuntu-Solutions-Engineering/openstack-installer/issues/627

james beedy (jamesbeedy) on 2015-06-18
description: updated
Tim Penhey (thumper) on 2015-06-18
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
Tim Penhey (thumper) wrote :

James, there should be some log files for the containers on the MAAS provisioned machines here:

  /var/lib/juju/containers/<container-name>/...

Can we get those added to this bug?

james beedy (jamesbeedy) wrote :

Tim,

The only log files I have on my container host are:

/var/log/juju/machine-1.log: http://paste.ubuntu.com/11737445/

/var/lib/juju/containers/juju-machine-1-lxc-0/cloud-init: http://paste.ubuntu.com/11737446/
/var/lib/juju/containers/juju-machine-1-lxc-1/cloud-init: http://paste.ubuntu.com/11737447/

james beedy (jamesbeedy) wrote :

There are another 7 log files for the other 7 containers; if needed I can post those too, although they show no difference from the two I posted other than the machine and service name.

james beedy (jamesbeedy) wrote :

I guess there aren't even log files in /var/lib/juju/containers ... only cloud-inits ...my bad

Wes (tapp-wes) on 2015-06-19
Changed in juju-core:
milestone: none → 1.25.0
Dimiter Naydenov (dimitern) wrote :

If MAAS is not managing DHCP for the nodes, how do they get an address?
If the nodes come up OK (with DHCP allocated addresses), but containers do not, then it's most likely an issue on the juju side, not maas.

Can you provide and attach here the logs from /var/log/juju/*?
What's your environments.yaml contents (scrubbed of secrets, if any please)?

james beedy (jamesbeedy) wrote :

Dimiter,

Thanks for asking. DHCP is being served on 10.16.100.0/24 alongside other networks by a set of pfsense servers. DNS is served internally by a bind9 cluster with an authoritative master and two slaves. When my dhcp server hands out an ip address to a client, it also updates my authoritative DNS server with the appropriate forward and reverse records for the zone, after which the authoritative master transfers the zone to each of the slaves. I obtain successful 'host' queries by hostname, fqdn, and ip address immediately after a node is issued its ip address by the dhcp server in the enlistment, commissioning, and deploy phases.
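For anyone wanting to repeat the forward/reverse check described above from a script rather than with 'host', here is a minimal sketch. The function name check_dns_round_trip and the injectable resolver arguments are hypothetical and exist only for this illustration; by default it uses the system resolver through Python's standard socket module.

```python
import socket


def _norm(name):
    # Compare names case-insensitively and ignore a trailing root dot.
    return name.rstrip(".").lower()


def check_dns_round_trip(name, forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Resolve name to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name, consistent). "consistent" is True when the
    reverse record equals the queried name, or is an FQDN that begins with
    the queried short hostname.
    """
    ip = forward(name)
    reverse_name, _aliases, _addresses = reverse(ip)
    want, got = _norm(name), _norm(reverse_name)
    consistent = got == want or got.startswith(want + ".")
    return ip, reverse_name, consistent
```

Calling check_dns_round_trip with a node's hostname right after the DHCP lease is issued would show whether the dynamic DNS update has reached the resolver being queried.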

I can deploy a test momentarily to reproduce and grab some more insight.... My environments.yaml is as follows:

default: maas

environments:
  maas:
    type: maas
    maas-server: 'http://10.16.100.10/MAAS/'
    maas-oauth: ***
    admin-secret: "***"
    default-series: trusty
    authorized-keys-path: ~/.ssh/id_rsa.pub
    apt-http-proxy: 'http://10.16.100.10:8000/'
    lxc-clone: true
    bootstrap-timeout: 3600
    no-proxy: localhost,10.16.100.10

james beedy (jamesbeedy) wrote :

As a test, using the above environments.yaml from an un-bootstrapped env, I ran a "juju bootstrap -e maas", then "juju add-machine ceph-mon-1.tfawint.com".

At this point "juju status" shows: http://paste.ubuntu.com/11765699/

Next I run "juju deploy mysql --to lxc:1", after which my juju status indicates: http://paste.ubuntu.com/11765711/

bootstrap node:
    all-machines.log : http://paste.ubuntu.com/11765715/
    machine-0.log: http://paste.ubuntu.com/11765718/

machine 1:
    machine-0.log: http://paste.ubuntu.com/11765722/

maas-server:
    maas.log: http://paste.ubuntu.com/11765736/

james beedy (jamesbeedy) wrote :

I feel like this is at the heart of the issue:

ERROR: certificate common name ''*'' doesn''t match requested host name ''10.16.100.58''.; To connect to 10.16.100.58 insecurely, use `--no-check-certificate''.;

james beedy (jamesbeedy) wrote :

/var/log/maas/pserv.log shows: http://paste.ubuntu.com/11768276/

james beedy (jamesbeedy) wrote :

per ^ it looks like tftp is only being served on the default virsh net of 192.168.122.0/24...... possibly a different issue for sure though.

Dimiter Naydenov (dimitern) wrote :

That indeed seems to be the core of the problem - not being able to download the cloud image for the lxc container. Additionally, looking at the logs, it seems odd that the API hostports contain the same address 4 times - ceph-osd-1.tfawint.com.

The error about the certificate leads me to think the certupdater worker hasn't done its job to recreate the certificate to include the bootstrap node's addresses (rather than the default '*').
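To illustrate why a certificate whose only name is '*' cannot satisfy a request made to an IP address, here is a simplified sketch of the usual name-matching rules (in the spirit of RFC 2818 and RFC 6125). It is not wget's or juju's actual code; the function cert_covers_host and its parameters are hypothetical names for this example.

```python
import ipaddress


def _is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def _dns_match(pattern, host):
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        # A wildcard stands in for exactly one leftmost DNS label.
        head, _, rest = host.partition(".")
        return bool(head) and rest == pattern[2:]
    return pattern == host


def cert_covers_host(host, common_name, san_dns=(), san_ips=()):
    """Simplified check of whether a certificate's names cover host."""
    if _is_ip(host):
        # An IP address must be listed explicitly as an IP subjectAltName;
        # wildcards are never applied to IP addresses.
        target = ipaddress.ip_address(host)
        return any(ipaddress.ip_address(ip) == target for ip in san_ips)
    # DNS names: prefer subjectAltName entries, fall back to the common name.
    names = list(san_dns) or [common_name]
    return any(_dns_match(p, host) for p in names)
```

Under this model, cert_covers_host("10.16.100.58", "*") is False, while the same call with san_ips=["10.16.100.58"] is True, which is the effect a regenerated certificate that includes the bootstrap node's addresses would be expected to have.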

Why do you need to specify the MAAS cluster controller as APT proxy? MAAS already does that transparently (when it works - I had to change the squid proxy config on my MAAS to make it work on port 8000). You might try dropping the apt-http-proxy and no-proxy lines from your environments.yaml and retry.

To simplify the lxc deployment process, also try using lxc-clone: false to see if that makes a difference.

Logs from both machines show errors trying to set instance status, which is disturbing, but I doubt that's the last thing in the log.

In any case, try the suggestions with logging-config: <root>=TRACE and pass --debug to bootstrap to get more context.
Please note with debug and trace logging there *will be* secrets/keys in the logs you'll want to scrub before attaching.
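As a starting point for scrubbing, here is a minimal sketch that masks the values of known secret keys in text before it is attached. The key list is taken from the environments.yaml in this report (maas-oauth, admin-secret); the function name scrub and the assumption that secrets appear as "key: value" or "key=value" are illustrative only, and real TRACE logs may carry secrets in other forms, so the output should still be reviewed by hand.

```python
import re

# Keys whose values should be masked; extend this for your environment.
SECRET_KEYS = ("maas-oauth", "admin-secret")

_PATTERN = re.compile(
    r"(?P<key>" + "|".join(re.escape(k) for k in SECRET_KEYS) + r")"
    r"(?P<sep>[ \t]*[:=][ \t]*)"
    r"(?P<value>\S+)"
)


def scrub(text, placeholder="***"):
    """Replace the value that follows each secret key with a placeholder."""
    return _PATTERN.sub(
        lambda m: m.group("key") + m.group("sep") + placeholder, text
    )
```

For example, passing the contents of a log file through scrub() and writing the result to a new file keeps the original untouched while producing a copy that is safer to paste.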

Curtis Hovey (sinzui) on 2015-07-23
tags: added: network
Curtis Hovey (sinzui) on 2015-08-13
tags: added: bug-squad
Curtis Hovey (sinzui) on 2015-08-27
Changed in juju-core:
milestone: 1.25-alpha1 → 1.25-beta1

Please provide the logs requested in comment #11

Changed in juju-core:
status: Triaged → Incomplete
Curtis Hovey (sinzui) on 2015-09-29
Changed in juju-core:
milestone: 1.25-beta1 → 1.25-beta2
Changed in juju-core:
milestone: 1.25-beta2 → 1.25.1
Changed in juju-core:
milestone: 1.25.1 → 1.26.0
james beedy (jamesbeedy) wrote :

My apologies for leaving this bug unattended. I haven't reproduced the environment (including the way dhcp, dns, and maas were configured together) since I abandoned the configuration detailed above. That being said, I would like to help get this functionality working to some degree. I intend to set up a virtual stack dedicated to following up on this. I'll report back soon! Thanks!

Cheryl Jennings (cherylj) wrote :

I wonder if the lxc wget cert issues are related to bug #1512782

Changed in juju-core:
milestone: 1.26.0 → 2.0-beta5
Changed in juju-core:
milestone: 2.0-beta5 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → none
Launchpad Janitor (janitor) wrote :

[Expired for juju-core because there has been no activity for 60 days.]

Changed in juju-core:
status: Incomplete → Expired