LDS 15.04 - OpenStack - lxc fails to retrieve tmpl to clone

Bug #1451385 reported by Alvaro Uria
This bug report is a duplicate of: Bug #1442801: aws containers are broken in 1.23.
This bug affects 2 people
Affects           Status      Importance  Assigned to  Milestone
Landscape Server  Incomplete  Undecided   Unassigned
juju-core         Triaged     High        Unassigned

Bug Description

Hello,

I'm testing an OpenStack deployment with LDS 15.04 (on trusty). After I select the software to be used on the nodes (compute, network, object storage...) and the nodes themselves, the installation stalls as "in-progress" at 86% due to:

        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-trusty-lxc-template" did not stop'

The juju-sync log only shows "error" for the failed LXC containers: http://pastebin.ubuntu.com/10983637/

If I run "juju ssh 0" and list the LXC containers, juju-trusty-lxc-template appears to be stopped: http://pastebin.ubuntu.com/10983646/

Please let me know if I can provide more details.

Best,
-Alvaro.

Alvaro Uria (aluria) wrote :

This happened on Juju 1.22.1-0ubuntu1~14.04.1~juju1 (also detailed in #1348386).

Juju 1.20 can't be used because "juju api-info" is not supported.

Juju 1.23.2 fails to set up connectivity on the bootstrap node, both with 1) the same interfaces file as with the other juju version and 2) a modified interfaces file with just "auto br0; iface br0 inet dhcp" configured (no veth or juju-br0 interfaces, as they should be managed automatically by juju).

Andreas Hasenack (ahasenack) wrote :

I have seen '"juju-trusty-lxc-template" did not stop' before. I managed to log in on the host before the error was signaled and saw that the template container had a stuck apt-get process. It had stalled on the network. I don't remember the exact reason, but I think it was related to MTU.

Can you deploy to the MAAS nodes normally using just juju, and then also deploy to containers inside a node? Can you give it a quick try, without involving the autopilot? Maybe just:

juju bootstrap (in the MAAS env)
juju deploy ubuntu
juju deploy ubuntu --to lxc:1

See if you get an ubuntu node, and an ubuntu lxc inside it.

Also, if you still have that failed deployment up, has it timed out by now and given you a link/button to file a bug report? That would give us a lot of logs about this.

Changed in landscape:
status: New → Incomplete
Andreas Hasenack (ahasenack) wrote :

How is your MAAS configured regarding network ranges? Can you tell us what you have for the dynamic range, static range, and the network?

Alberto Donato (ack) wrote :

I had the same error when adding a vivid LXC to a vivid host with juju 1.23.2, by running "juju add-machine lxc:0" after bootstrap:

environment: cstack
machines:
  "0":
    agent-state: started
    agent-version: 1.23.2
    dns-name: 10.55.32.244
    instance-id: c238418e-e2d4-49a4-a9c1-ddd0747c99e9
    instance-state: ACTIVE
    series: vivid
    containers:
      0/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-vivid-lxc-template" did not stop'
        instance-id: pending
        series: vivid
    hardware: arch=amd64 cpu-cores=1 mem=1024M root-disk=10240M availability-zone=nova
    state-server-member-status: has-vote
services: {}
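
For anyone triaging a larger status dump, a minimal Python sketch for listing the containers that carry an agent-state-info message, as 0/lxc/0 does above. It assumes the status has already been loaded into a dict (for example from "juju status --format=json") and that the JSON keys mirror the YAML keys shown here; the function name failed_containers is hypothetical.

```python
def failed_containers(status):
    """Return (container_id, agent_state_info) pairs for every container
    that reports an agent-state-info message."""
    failures = []
    for machine in status.get("machines", {}).values():
        for name, container in machine.get("containers", {}).items():
            info = container.get("agent-state-info")
            if info:
                failures.append((name, info))
    return failures


# Sample trimmed from the status output above.
sample = {
    "environment": "cstack",
    "machines": {
        "0": {
            "agent-state": "started",
            "series": "vivid",
            "containers": {
                "0/lxc/0": {
                    "agent-state-info": (
                        "failed to retrieve the template to clone: template "
                        'container "juju-vivid-lxc-template" did not stop'
                    ),
                    "instance-id": "pending",
                    "series": "vivid",
                },
            },
        },
    },
    "services": {},
}

for name, info in failed_containers(sample):
    print(name, "->", info)
```

Note that agent-state-info is not necessarily an error in every case, so the result is only a list of candidates to look at.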

After the error, I ssh'd to the bootstrap node. The template container was still running and I could attach to it.
Stopping it with lxc-stop worked, but took a long time.
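
To measure how long the template container takes to stop, a rough Python sketch that polls "lxc-info -n NAME -s" on the host until it reports STOPPED. This is not what juju does internally; the helper names, the timeout and the polling interval are assumptions for illustration, and lxc-info may need to be run as root.

```python
import subprocess
import time


def parse_state(lxc_info_output):
    """Extract the state (e.g. 'RUNNING', 'STOPPED') from `lxc-info -s` output."""
    for line in lxc_info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "state":
            return value.strip()
    return None


def wait_until_stopped(name, timeout=300.0, interval=5.0, run=None):
    """Poll the container state until it is STOPPED.

    Returns True once the container is stopped, False if the timeout expires.
    `run` is a hypothetical hook returning lxc-info output; it defaults to
    calling the real lxc-info binary.
    """
    if run is None:
        def run():
            result = subprocess.run(
                ["lxc-info", "-n", name, "-s"],
                capture_output=True,
                text=True,
            )
            return result.stdout

    deadline = time.monotonic() + timeout
    while True:
        if parse_state(run()) == "STOPPED":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling wait_until_stopped("juju-vivid-lxc-template", timeout=600) right after lxc-stop returns False if the container is still running after ten minutes.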

Alberto Donato (ack) wrote :
information type: Proprietary → Public
Alvaro Uria (aluria) wrote :

#3, running "juju bootstrap", "juju deploy ubuntu" and "juju deploy ubuntu --to lxc:1" directly worked fine. I first had to add "disable-network-management: true" to environments.yaml.

I did the same for Landscape to bootstrap node (see http://pastebin.ubuntu.com/11010187/).

As a result, all nodes reached agent-state: started, but none of them actually began to have services deployed onto them.

Please see all-machines.log on bootstrap node (in which lxc packages haven't been installed): http://pastebin.ubuntu.com/11010163/

Installation is now stuck at 0%, just after the "bootstrap" step finished (and "deploy <servicename>" is "in-progress").

#4, MAAS is configured as follows:
1) All nodes have a MAC address on the private network
2) 2 openstack-ha tagged nodes have a second MAC address on the same private network
3) MAAS is the gateway node, doing SNAT (with ip_forward enabled).

/etc/network/interfaces are passed via curtin (preseed), using:
- br0 -> bond0 -> eth0, eth2 (private lan)

Cheers,
-Alvaro.

Alvaro Uria (aluria) wrote :

Current "juju status" with stalled 0% progress (all services deployment stalled): http://pastebin.ubuntu.com/11010677/

Curtis Hovey (sinzui)
tags: added: lxc
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.25.0
milestone: 1.25.0 → none