Recurring lxc issue: failed to retrieve the template to clone

Bug #1488245 reported by Peter Sabaini
This bug affects 7 people
Affects: juju-core
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

When deploying OpenStack with juju-core 1.22.6, I get this from time to time:

   containers:
      0/lxc/0:
        agent-state-info: |-
          failed to retrieve the template to clone: cannot determine cached image URL: cannot determine LXC image URL: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
          : exit status 1: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
          : exit status 1
        instance-id: pending
        series: trusty
      0/lxc/1:
        agent-state: started
        agent-version: 1.22.6.1
        dns-name: 10.102.5.57
        instance-id: juju-machine-0-lxc-1
        series: trusty
        hardware: arch=amd64

That is, 0/lxc/0 can't be created while other containers can. This is not deterministic: on the same hardware, an hour earlier, all containers could be created. However, when it does fail, it always seems to fail on 0/lxc/0.

I've put some logs up at chinstrap:/home/sabaini/juju-failed-to-retrieve-template-logs/

Cheryl Jennings (cherylj) wrote :

It would make sense that this would always fail for the first container, as what's failing is the download of the image to use in creating the template container.

Other containers which are started will see that the template doesn't exist and try to create it. The error, specifically, comes from Juju using the ubuntu-cloudimg-query command to get the URL of the correct image to download. I wonder if there's some flakiness with cloud-images.ubuntu.com. I'll ask the CPC team about that.

It could be helpful to retry the ubuntu-cloudimg-query command if it fails, with some modest wait time.
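
For illustration, a retry wrapper around that lookup could look something like the sketch below (not Juju's actual code; it assumes a trusty/released/amd64 lookup and arbitrary retry parameters):

# Sketch only: retry the image URL lookup a few times with a modest pause.
retry_cloudimg_query() {
    local attempts=5 delay=10 url
    for i in $(seq 1 "$attempts"); do
        # ubuntu-cloudimg-query exits non-zero when the lookup fails,
        # e.g. when released-dl.current.txt cannot be fetched.
        if url=$(ubuntu-cloudimg-query trusty released amd64); then
            echo "$url"
            return 0
        fi
        echo "image URL lookup failed (attempt $i/$attempts); retrying in ${delay}s" >&2
        sleep "$delay"
    done
    return 1
}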

Curtis Hovey (sinzui)
tags: added: lxc
Changed in juju-core:
status: New → Incomplete
Peter Sabaini (peter-sabaini) wrote :

I've hit this bug repeatedly today. What can I do to get some traction on this?

Launchpad Janitor (janitor) wrote :

[Expired for juju-core because there has been no activity for 60 days.]

Changed in juju-core:
status: Incomplete → Expired
Andreas Hasenack (ahasenack) wrote :

We got such a case reported today. It consistently fails with this error on 32-core nodes, but works on 8-core ones.

What logs would you like to have?

Changed in juju-core:
status: Expired → Confirmed
tags: added: kanban-cross-team
Changed in juju-core:
status: Confirmed → New
tags: removed: kanban-cross-team
tags: added: landscape
Andreas Hasenack (ahasenack) wrote :

Here is a machine-0.log file from a juju environment with debug enabled where this failed.

This was the bootstrap + deploy command:
juju bootstrap --to <hostname> --debug 2>&1 | tee bootstrap.log; juju deploy cs:trusty/ubuntu --to lxc:0; for n in $(seq 1 10); do juju add-unit ubuntu --to lxc:0; done

The first two LXC containers failed to come up (ubuntu/0 and ubuntu/1). You can see in the log that Juju tried to create the LXC template twice and failed; it finally worked on the third attempt:

First 0/lxc/0:
2016-05-17 21:19:45 INFO juju.provisioner.lxc lxc-broker.go:88 starting lxc container for machineId: 0/lxc/0
(...)
2016-05-17 21:19:45 INFO juju.container.lxc clonetemplate.go:32 wait for flock on juju-trusty-lxc-template
2016-05-17 21:19:45 INFO juju.container lock.go:50 acquire lock "juju-trusty-lxc-template", ensure clone exists
2016-05-17 21:19:45 INFO juju.container.lxc clonetemplate.go:77 template does not exist, creating
(...)

Now 0/lxc/1:
2016-05-17 21:20:06 INFO juju.provisioner.lxc lxc-broker.go:88 starting lxc container for machineId: 0/lxc/1
(...)
2016-05-17 21:20:06 INFO juju.container lock.go:66 release lock "juju-trusty-lxc-template"
2016-05-17 21:20:06 ERROR juju.provisioner.lxc lxc-broker.go:168 failed to start container: failed to retrieve the template to clone: cannot determine cached image URL: cannot determine LXC image URL: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
: exit status 1: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
: exit status 1
(...)
2016-05-17 21:20:06 INFO juju.container.lxc clonetemplate.go:32 wait for flock on juju-trusty-lxc-template
2016-05-17 21:20:06 INFO juju.container lock.go:50 acquire lock "juju-trusty-lxc-template", ensure clone exists
2016-05-17 21:20:06 INFO juju.container.lxc clonetemplate.go:77 template does not exist, creating
2016-05-17 21:20:06 DEBUG juju.service discovery.go:65 discovered init system "upstart" from series "trusty"
2016-05-17 21:20:16 INFO juju.container lock.go:66 release lock "juju-trusty-lxc-template"
2016-05-17 21:20:16 ERROR juju.provisioner.lxc lxc-broker.go:168 failed to start container: failed to retrieve the template to clone: cannot determine cached image URL: cannot determine LXC image URL: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
: exit status 1: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
: exit status 1
(...)

For 0/lxc/2 onwards it worked:
2016-05-17 21:20:17 INFO juju.provisioner.lxc lxc-broker.go:88 starting lxc container for machineId: 0/lxc/2
(...)
2016-05-17 21:20:17 INFO juju.container.lxc clonetemplate.go:32 wait for flock on juju-trusty-lxc-template
2016-05-17 21:20:17 INFO juju.container lock.go:50 acquire lock "juju-trusty-lxc-template", ensure clone exists
2016-05-17 21:20:17 INFO juju.container.lxc clonetemplate.go:77 template does not exist, creating
2016-05-17 21:20:17 DEBUG juju.service discovery.go:65 discovered init system "upstart" f...


Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0.0
milestone: 2.0.0 → 2.0-beta8
Changed in juju-core:
milestone: 2.0-beta8 → 2.0-beta9
Cheryl Jennings (cherylj) wrote :

I haven't been able to reproduce this locally. I do wonder whether adding a sleep between the bootstrap and the deploy would affect this behavior?
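
For illustration, such a test could look like the following (a sketch based on the command Andreas posted above; the 60-second pause is arbitrary):

# Sketch: same command as above, with an arbitrary 60-second pause before the first deploy.
juju bootstrap --to <hostname> --debug 2>&1 | tee bootstrap.log; sleep 60; juju deploy cs:trusty/ubuntu --to lxc:0; for n in $(seq 1 10); do juju add-unit ubuntu --to lxc:0; done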

Chuck Wu (chwu) wrote :

Hi Cheryl,

I don't think adding a sleep between bootstrap and deploy would help. We were consistently getting this issue regardless of which installation method we used: autopilot, multi, a Juju bundle, or even juju deploy by hand.

The first two containers created always hit this condition. These machines/nodes had 32 cores and 128 GB of RAM.

Cheryl Jennings (cherylj) wrote :

Could you run a test like:

juju bootstrap --to <hostname> --debug 2>&1 | tee bootstrap.log; juju ssh 0 "wget https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt"; juju deploy cs:trusty/ubuntu --to lxc:0; for n in $(seq 1 10); do juju add-unit ubuntu --to lxc:0; done

to verify that the released-dl.current.txt file can be read before trying to create containers?

Chuck Wu (chwu) wrote :

Hi Cheryl,

We ran this exact same test last week, many times, and each time it replicated the issue consistently. However, today I haven't been able to replicate it; I tried on 3 different 32-core nodes. The only difference is that we have since resolved the network issue where IPv6 was used and no reverse DNS lookup was available (Bug #1582770). Could the same fix have resolved this issue as well?

Cheryl Jennings (cherylj) wrote :

I added a wget before creating the containers in the above test case. Did you previously try that in your test runs? Was the host able to access that URL before creating the containers?

Chuck Wu (chwu) wrote :

For yesterday's test, the wget calls were all successful.

For the prior tests from last week, I don't think we had the wget step in the test. Here is the bug ticket:

https://bugs.launchpad.net/landscape/+bug/1581707

Andreas Hasenack (ahasenack) wrote :

Hi Chuck,

I enabled IPv6 here in my network and registered some nodes with a local MAAS server.

I was able to reproduce the hostname-without-domain issue we had before (bug #1581707) when IPv6 was enabled in your network:

root@15-89:~# python -c "import socket; print socket.gethostbyaddr(socket.gethostname())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
socket.herror: [Errno 1] Unknown host

root@15-89:~# python -c "import socket; print socket.getfqdn()"
15-89
root@15-89:~# hostname
15-89
root@15-89:~# hostname -f
15-89.maaspxe5

IP addresses on that 15-89 node (eth2 has a global IPv6 address):
$ for n in 0 1 2; do juju ssh 0 ip a show dev eth$n; done 2>/dev/null
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master juju-br0 state UP group default qlen 1000
    link/ether 52:54:00:ab:15:89 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.150/24 brd 10.0.5.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:fa:49:0f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fefa:490f/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:23:f7:09 brd ff:ff:ff:ff:ff:ff
    inet6 fdde:a571:33af:0:5054:ff:fe23:f709/64 scope global dynamic
       valid_lft forever preferred_lft forever
    inet6 2804:7f4:zzzz:xxxx:yyyy:ff:fe23:f709/64 scope global dynamic
       valid_lft 40992sec preferred_lft 40992sec
    inet6 fe80::5054:ff:fe23:f709/64 scope link
       valid_lft forever preferred_lft forever

I also got the incomplete DNS record in the MAAS server DNS:
# grep -E 15.89 -r /etc/bind
/etc/bind/maas/zone.5.0.10.in-addr.arpa:150.5.0.10.in-addr.arpa. IN PTR 15-89.maaspxe5.
/etc/bind/maas/zone.maaspxe5:15-89 IN A 10.0.5.150
/etc/bind/maas/zone.maaspxe5:15-89 IN AAAA 2804:7f4:zzzz:xxxx:yyyy:ff:fe23:f709

But I could deploy ubuntu in containers without hitting the problem from this bug:
$ juju status --format=tabular
[Services]
NAME STATUS EXPOSED CHARM
ubuntu unknown false cs:trusty/ubuntu-0

[Units]
ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
ubuntu/0 unknown executing 1.25.5 0/lxc/0 10.0.5.11 (update-status)
ubuntu/1 unknown idle 1.25.5 0/lxc/1 10.0.5.14
ubuntu/2 unknown idle 1.25.5 0/lxc/2 10.0.5.17

[Machines]
ID STATE VERSION DNS INS-ID SERIES HARDWARE
0 started 1.25.5 15-89.maaspxe5 /MAAS/api/1.0/nodes/node-7665d2aa-2111-11e6-bce2-00163ed66ea1/ trusty arch=amd64 cpu-cores=4 mem=1024M availability-zone=default

However, this data point of yours, that what changed between then and now is the IPv6 network, is interesting.

How is or was the connectivity of that IPv6 subnet you had? Would the wget command Cheryl pasted ...


Changed in juju-core:
status: Triaged → Incomplete
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta9 → 2.0-beta10
Larry Michel (lmic) wrote :

We have been seeing this quite a bit today. It's been causing a lot of build failures for the past few hours:

  '5':
    agent-state: started
    agent-version: 1.25.5
    containers:
      5/lxc/0:
        agent-state: error
        agent-state-info: 'failed to retrieve the template to clone: cannot determine
          cached image URL: cannot determine LXC image URL: cannot determine LXC image
          URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt

          : exit status 1: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt

          : exit status 1'
        instance-id: pending
        series: trusty
      5/lxc/1:
        agent-state: error
        agent-state-info: 'failed to retrieve the template to clone: cannot determine
          cached image URL: cannot determine LXC image URL: cannot determine LXC image
          URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt

          : exit status 1: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt

          : exit status 1'
        instance-id: pending
        series: trusty
    dns-name: hayward-16.oil
    hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hardware-sm15k,hw-glance-sm15k,hw-production
      availability-zone=Production
    instance-id: /MAAS/api/1.0/nodes/node-a336d312-c4cd-11e3-824b-00163efc5068/
    series: trusty

tags: added: oil
Cheryl Jennings (cherylj) wrote :

Larry - in your case, did all the containers on machine 5 fail with that error? Or just some?

Larry Michel (lmic) wrote :

Cheryl, yes, all of the containers. Here's another example with the entire status:

environment: ci-oil-slave9
machines:
  '0':
    agent-state: started
    agent-version: 1.25.5
    dns-name: huffman-vm-06.oil
    hardware: arch=ppc64el cpu-cores=4 mem=8229M tags=hardware-ibm-power8-S822L,hw-production,blacklist-juno,blacklist-icehouse,blacklist-precise,blacklist-compute,vm-ppc64el
      availability-zone=Production
    instance-id: /MAAS/api/1.0/nodes/node-617fecf2-a0e5-11e4-9cac-00163eca07b6/
    series: trusty
    state-server-member-status: has-vote
  '1':
    agent-state: started
    agent-version: 1.25.5
    containers:
      1/lxc/0:
        agent-state: started
        agent-version: 1.25.5
        dns-name: 10.245.46.22
        hardware: arch=amd64
        instance-id: juju-machine-1-lxc-0
        series: trusty
      1/lxc/1:
        agent-state: started
        agent-version: 1.25.5
        dns-name: 10.245.47.165
        hardware: arch=amd64
        instance-id: juju-machine-1-lxc-1
        series: trusty
    dns-name: ekans.oil
    hardware: arch=amd64 cpu-cores=32 mem=65536M tags=hw-production,hardware-dell-poweredge-R720XD
      availability-zone=Production
    instance-id: /MAAS/api/1.0/nodes/node-1490c87a-c615-11e5-b7b2-00163eca07b6/
    series: trusty
  '2':
    agent-state: started
    agent-version: 1.25.5
    containers:
      2/lxc/0:
        agent-state: error
        agent-state-info: 'failed to retrieve the template to clone: cannot determine
          cached image URL: cannot determine LXC image URL: cannot determine LXC image
          URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt

          : exit status 1: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt

          : exit status 1'
        instance-id: pending
        series: trusty
    dns-name: hayward-13.oil
    hardware: arch=amd64 cpu-cores=8 mem=16384M tags=hardware-sm15k,hw-glance-sm15k,hw-production
      availability-zone=Production
    instance-id: /MAAS/api/1.0/nodes/node-9e308d0e-c4cd-11e3-8102-00163efc5068/
    series: trusty
  '3':
    agent-state: started
    agent-version: 1.25.5
    containers:
      3/lxc/0:
        agent-state: started
        agent-version: 1.25.5
        dns-name: 10.245.43.102
        hardware: arch=ppc64el
        instance-id: juju-machine-3-lxc-0
        series: trusty
    dns-name: huffman-vm-01.oil
    hardware: arch=ppc64el cpu-cores=4 mem=8229M tags=hardware-ibm-power8-S822L,hw-production,blacklist-juno,blacklist-icehouse,blacklist-precise,blacklist-compute,vm-ppc64el
      availability-zone=Production
    instance-id: /MAAS/api/1.0/nodes/node-5c8f2d56-9c45-11e4-bf57-00163eca07b6/
    series: trusty
  '4':
    agent-state: started
    agent-version: 1.25.5
    containers:
      4/lxc/0:
        agent-state: started
        agent-version: 1.25.5
        dns-name: 10.245.33.145
        hardware: arch=ppc64el
        instance-id: juju-machine-4-lxc-0
        series: trusty
    dns-name: huffman-vm-03.oil
    hardware: arch=ppc64el cpu-cores=4 mem=8229M tags=hardware-ibm-power8-S822L,hw-production,blacklist-ju...

Larry Michel (lmic) wrote :

The previous example had only a single container. I was looking for one like machine 5, but only saw systems with a single container. Here's the entire status file for that earlier example: https://pastebin.canonical.com/159353/

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta10 → 2.0-beta11
Larry Michel (lmic) wrote :

In our case, the issues completely went away after we reloaded an SM15k chassis. We also found a duplicate IP that was due to a MAAS bug, but I don't think it was the primary issue.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta11 → 2.0-beta12
Changed in juju-core:
milestone: 2.0-beta12 → none
Changed in juju-core:
status: Incomplete → Invalid
Florian Haas (fghaas) wrote :

This bug is definitely not invalid; it still exists in 1.25.8-trusty-amd64 and is an extraordinary pain.

As far as I can tell, what's happening here is that the target machine attempts to download the image, but the image cache on the bootstrap node never gets populated:

2016-12-01 09:25:46 DEBUG juju.container image.go:89 lxc image for trusty (amd64) is https://cloud-images.ubuntu.com/server/releases/trusty/release-20161109/ubuntu-14.04-server-cloudimg-amd64.tar.gz
2016-12-01 09:25:46 DEBUG juju.container.lxc lxc.go:403 creating lxc container "juju-trusty-lxc-template"
2016-12-01 09:25:46 DEBUG juju.container.lxc lxc.go:404 lxc-create template params: [--debug --userdata /var/lib/juju/containers/juju-trusty-lxc-template/cloud-init --hostid juju-trusty-lxc-template -r trusty -T https://192.168.122.100:17070/environment/23487fc9-6e44-49a0-8ddf-8472ede8a1ee/images/lxc/trusty/amd64/ubuntu-14.04-server-cloudimg-amd64-root.tar.gz]
2016-12-01 09:25:46 ERROR juju.container.lxc clonetemplate.go:129 lxc template container creation failed: lxc container creation failed: juju-trusty-lxc-template

Note that there is no gap in there at all; the lxc-create call fails immediately. This is a new machine; if the whole template tarball were actually being downloaded, it would take at least several seconds.

Trying to force container template re-creation on the target node ("rm /var/lib/juju/containers/juju-trusty-lxc-template -rf") doesn't do a thing either. After doing so and running "juju deploy <service> --to lxc:1", this is the entirety of the contents of that directory:

$ ls /var/lib/juju/containers/juju-trusty-lxc-template -la
total 16
drwxr-xr-x 2 root root 4096 Dec 1 09:46 .
drwxr-xr-x 4 root root 4096 Dec 1 09:46 ..
-rw-r--r-- 1 root root 1653 Dec 1 09:47 cloud-init
-rw-r--r-- 1 root root 111 Dec 1 09:47 lxc.conf

In other words, that template just never gets downloaded.
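
A quick way to narrow this down by hand (an illustrative check, not something Juju does itself; the URL is the template URL from the lxc-create parameters above, and /var/cache/lxc/cloud-trusty is the cache directory the ubuntu-cloud template uses, as the trace below shows):

# Is the controller actually serving the cached image? The controller uses a
# self-signed certificate, so skip verification for this reachability test only.
wget --no-check-certificate -O /dev/null https://192.168.122.100:17070/environment/23487fc9-6e44-49a0-8ddf-8472ede8a1ee/images/lxc/trusty/amd64/ubuntu-14.04-server-cloudimg-amd64-root.tar.gz

# Has anything landed in the local LXC image cache on the target machine?
ls -l /var/cache/lxc/cloud-trusty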

Now, digging in a bit deeper and running the lxc-create command manually:

sudo lxc-create -P /var/lib/juju/containers -n test -t ubuntu-cloud -- --debug --userdata /var/lib/juju/containers/juju-trusty-lxc-template/cloud-init --hostid juju-trusty-lxc-template -r trusty -T https://192.168.122.100:17070/environment/23487fc9-6e44-49a0-8ddf-8472ede8a1ee/images/lxc/trusty/amd64/ubuntu-14.04-server-cloudimg-amd64-root.tar.gz
+ '[' amd64 = i686 ']'
+ '[' 0 = 0 ']'
+ case "$hostarch:$arch" in
+ :
+ '[' tryreleased '!=' daily -a tryreleased '!=' released -a tryreleased '!=' tryreleased ']'
+ '[' -z /var/lib/juju/containers/test ']'
++ id -u
+ '[' 0 '!=' 0 ']'
+ config=/var/lib/juju/containers/test/config
+ '[' -z /var/lib/juju/containers/test/rootfs ']'
+ type ubuntu-cloudimg-query
ubuntu-cloudimg-query is /usr/bin/ubuntu-cloudimg-query
+ type wget
wget is /usr/bin/wget
+ cache=/var/cache/lxc/cloud-trusty
+ '[' 0 -eq 1 ']'
+ mkdir -p /var/cache/lxc/cloud-trusty
+ '[' tryreleased = tryreleased ']'
+ stream=released
+ ubuntu-cloudimg-query trusty released amd64
+ '[' -n https://192.168.122.100:17070/environment/23487fc9-6e44-49a0-8ddf-8472ede8a1ee/images/lxc/trusty/amd64/ubuntu-14.04-server-cloudimg-amd64-root.tar.gz ']'
+ url2=https://192.168.122.100:17070/environment/23487fc9-6e44-49a0-8ddf-8472ede8a...


Richard Harding (rharding) wrote :

Thank you for the added information, Florian. From your information it looks like there is indeed some sort of race condition, possibly between requesting that we retrieve the image and when we attempt to use it.

We'll give this another look and see if we can chase it down with your added details.

For the SSL issue, my assumption is that when running the command manually, the self-signed cert that the controller pre-shares with the unit wasn't loaded into the shell environment, so wget couldn't use it. I'll have to see whether running that same thing in a hook context, via something like "juju run", would be able to communicate with the controller API over SSL properly or not.
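
For reference, that comparison might look something like this (a sketch only; the machine number and URL come from Florian's lxc-create parameters above, and the unit name is just a placeholder):

# Plain shell over SSH on the affected host:
juju ssh 1 "wget -O /dev/null https://192.168.122.100:17070/environment/23487fc9-6e44-49a0-8ddf-8472ede8a1ee/images/lxc/trusty/amd64/ubuntu-14.04-server-cloudimg-amd64-root.tar.gz"

# The same fetch run in a hook context via juju run (placeholder unit name):
juju run --unit ubuntu/0 "wget -O /dev/null https://192.168.122.100:17070/environment/23487fc9-6e44-49a0-8ddf-8472ede8a1ee/images/lxc/trusty/amd64/ubuntu-14.04-server-cloudimg-amd64-root.tar.gz"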

Florian Haas (fghaas) wrote :

@rharding, could you consider undoing the "Invalid" status for this issue? It persists as of today.

Changed in juju-core:
status: Invalid → Triaged
assignee: nobody → Richard Harding (rharding)
milestone: none → 1.25.10
Changed in juju-core:
importance: High → Critical
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.10 → none
Changed in juju-core:
milestone: none → 1.25.11
Changed in juju-core:
assignee: Richard Harding (rharding) → nobody
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.11 → none
Florian Haas (fghaas) wrote :

So this bug is Critical, has no milestone, and no one is assigned to fix it? Considering that Ubuntu OpenStack users who deployed on Trusty are still stuck on Juju 1.x, with no upgrade path beyond Mitaka, that is just terrific.

Tim Penhey (thumper) wrote :

I'm just being honest here: we aren't going to fix this in the 1.25 series.

Changed in juju-core:
importance: Critical → Undecided
status: Triaged → Won't Fix