Landscape Server

LDS 15.04 - OpenStack - lxc fails to retrieve tmpl to clone

Bug #1451385 reported by Alvaro Uria on 2015-05-04

This bug report is a duplicate of: Bug #1442801: aws containers are broken in 1.23.

This bug affects 2 people

Affects           Status      Importance  Assigned to  Milestone
Landscape Server  Incomplete  Undecided   Unassigned
juju-core         Triaged     High        Unassigned

Bug Description

Hello,

I'm testing an OpenStack deployment with LDS 15.04 (on trusty). After selecting software to be used on nodes (compute, network, object storage...), and nodes themselves, installation stalls as "in-progress" at 86% due to:

agent-state-info: 'failed to retrieve the template to clone: template container
"juju-trusty-lxc-template" did not stop'

juju-sync log only shows "error" on failed lxcs: http://pastebin.ubuntu.com/10983637/

If I juju ssh 0 and list LXC containers, juju-trusty-lxc-template seems stopped: http://pastebin.ubuntu.com/10983646/
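To confirm the template's state without reading the whole listing, the state can be queried directly on the host. This is only a minimal sketch: container_state is a hypothetical helper name, and it assumes lxc-info prints a "State:" line (the label's capitalisation has differed between LXC releases, so it is matched case-insensitively).

```shell
# Extract the state word from `lxc-info -s` output read on stdin.
# The "State:" label is matched case-insensitively.
container_state() {
    awk 'tolower($1) == "state:" { print $2; exit }'
}

# Usage on the host (template name taken from the error message above):
#   sudo lxc-info -n juju-trusty-lxc-template -s | container_state
```

If this reports STOPPED while juju still claims the template "did not stop", the container probably stopped only after juju had given up waiting.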

Please let me know if I can provide more details.

Best,
-Alvaro.

Tags: lxc

Alvaro Uria (aluria) wrote on 2015-05-04:

(juju status) output (25.9 KiB, text/plain)

Alvaro Uria (aluria) wrote on 2015-05-05:

This happened on Juju 1.22.1-0ubuntu1~14.04.1~juju1 (also detailed in #1348386).

Juju 1.20 can't be used because "juju api-info" is not supported.

Juju 1.23.2 fails to set up connectivity on the bootstrap node with either 1) the same interfaces file as used with the other Juju version, or 2) a modified interfaces file with just "auto br0; iface br0 inet dhcp" configured (no veth or juju-br0 interfaces, as they should be managed automatically by Juju).

Andreas Hasenack (ahasenack) wrote on 2015-05-06:

I have seen the '"juju-trusty-lxc-template" did not stop' error before. I managed to log in to the host before the error was signaled and saw that the template container had a stuck apt-get process. It stalled on the network. I don't remember the exact reason, but I think it was related to MTU.
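If MTU is the suspect, one way to probe it is to ping with the "don't fragment" bit set and a payload sized for the expected MTU. The sketch below only computes that payload size; icmp_payload_for_mtu is a hypothetical helper, and the target host in the usage line is an assumption, not something taken from this report.

```shell
# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header (no options) and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage, from inside the template container or on the host
# (archive.ubuntu.com is just an example target):
#   ping -c 3 -M do -s "$(icmp_payload_for_mtu 1500)" archive.ubuntu.com
```

If the full-size ping fails while a smaller payload succeeds, a lower MTU somewhere on the path is a plausible cause of the stalled apt-get.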

Can you deploy to the MAAS nodes normally using just juju, and then also deploy to containers inside a node? Can you give it a quick try, without involving the autopilot? Maybe just:

juju bootstrap (in the MAAS env)
juju deploy ubuntu
juju deploy ubuntu --to lxc:1

See if you get an ubuntu node, and an ubuntu lxc inside it.

Also, if you still have that failed deployment up, has it timed out by now and given you a link/button to file a bug report? That would give us a lot of logs about this.

Changed in landscape:
status:	New → Incomplete

Andreas Hasenack (ahasenack) wrote on 2015-05-06:

How is your MAAS configured regarding network ranges? Can you tell us what you have for the dynamic range, static range, and the network?

Alberto Donato (ack) wrote on 2015-05-06:

all-machines.log (31.2 KiB, text/plain)

I had the same error when adding a vivid LXC to a vivid host with juju 1.23.2, by running "juju add-machine lxc:0" after bootstrap:

environment: cstack
machines:
  "0":
    agent-state: started
    agent-version: 1.23.2
    dns-name: 10.55.32.244
    instance-id: c238418e-e2d4-49a4-a9c1-ddd0747c99e9
    instance-state: ACTIVE
    series: vivid
    containers:
      0/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-vivid-lxc-template" did not stop'
        instance-id: pending
        series: vivid
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=10240M availability-zone=nova
    state-server-member-status: has-vote
services: {}

After the error, I ssh'd to the bootstrap node. The template container was still running and I could attach to it.
Stopping it with lxc-stop worked, but took a long time.
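When a status dump is large, the affected containers can be picked out mechanically. The following is a rough sketch that assumes the YAML layout shown in the status output above; failed_lxc_containers is a hypothetical helper name, not part of juju.

```shell
# Read `juju status` YAML on stdin and print the ID of every LXC container
# whose agent-state-info reports the template-clone failure.
failed_lxc_containers() {
    awk '
        # Remember the most recent container key, e.g. "0/lxc/0:".
        /^[ \t]*[0-9]+\/lxc\/[0-9]+:[ \t]*$/ {
            id = $1
            sub(/:$/, "", id)
        }
        # Print that key when the failure message appears under it.
        /failed to retrieve the template to clone/ && id != "" { print id }
    '
}

# Usage:
#   juju status | failed_lxc_containers
```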

Alberto Donato (ack) wrote on 2015-05-06:

juju container.log for the template container (1018.8 KiB, text/plain)

information type:	Proprietary → Public

Alvaro Uria (aluria) wrote on 2015-05-07:

Re #3: running "juju bootstrap", "juju deploy ubuntu" and "juju deploy ubuntu --to lxc:1" directly worked fine. I first had to add "disable-network-management: true" to environments.yaml.
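A quick way to check that the setting is really present before bootstrapping is sketched below. It assumes the default Juju 1.x location ~/.juju/environments.yaml, and has_network_management_disabled is a hypothetical helper. It only checks that the key is set to true somewhere in the file, not which environment it belongs to.

```shell
# Succeed if the given environments.yaml contains a line setting
# disable-network-management to true (in any environment stanza).
has_network_management_disabled() {
    grep -Eq '^[[:space:]]*disable-network-management:[[:space:]]*true[[:space:]]*$' "$1"
}

# Usage:
#   has_network_management_disabled ~/.juju/environments.yaml \
#       && echo "network management disabled" \
#       || echo "setting not found"
```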

I applied the same setting for the Landscape bootstrap node (see http://pastebin.ubuntu.com/11010187/).

As a result, all nodes reached agent-state: started, but no services were actually deployed to any of them.

Please see all-machines.log on bootstrap node (in which lxc packages haven't been installed): http://pastebin.ubuntu.com/11010163/

Installation is now stuck at 0%, just after the "bootstrap" step finished (with "deploy <servicename>" shown as "in-progress").

Re #4: MAAS is configured as follows:
1) All nodes have a MAC address on the private network
2) The 2 openstack-ha tagged nodes have a second MAC address on the same private network
3) MAAS is the gateway node, doing SNAT (with ip_forward enabled).

/etc/network/interfaces is passed via curtin (preseed), using:
- br0 -> bond0 -> eth0, eth2 (private lan)

Cheers,
-Alvaro.

Alvaro Uria (aluria) wrote on 2015-05-07:

Current "juju status" with stalled 0% progress (all services deployment stalled): http://pastebin.ubuntu.com/11010677/

Curtis Hovey (sinzui) on 2015-05-08

tags:	added: lxc
Changed in juju-core:
status:	New → Triaged
importance:	Undecided → High
milestone:	none → 1.25.0
milestone:	1.25.0 → none
