Juju2: 'Creating container: failed to ensure LXD image: image not imported!'

Bug #1650304 reported by Larry Michel
This bug affects 2 people
Affects          Status   Importance  Assigned to  Milestone
Canonical Juju   Expired  Critical    Unassigned
  2.1            Expired  Low         Unassigned

Bug Description

We're seeing the following on our staging environment:

  "1":
    juju-status:
      current: started
      since: 15 Dec 2016 11:35:42Z
      version: 2.1-beta2
    dns-name: 10.245.16.191
    ip-addresses:
    - 10.245.16.191
    instance-id: 4y3khr
    machine-status:
      current: running
      message: Deployed
      since: 15 Dec 2016 11:31:46Z
    series: xenial
    containers:
      1/lxd/0:
        juju-status:
          current: down
          message: agent is not communicating with the server
          since: 15 Dec 2016 11:37:12Z
        instance-id: pending
        machine-status:
          current: provisioning error
          message: 'Creating container: failed to ensure LXD image: image not imported!'
          since: 15 Dec 2016 11:37:12Z
        series: xenial
      1/lxd/1:
        juju-status:
          current: down
          message: agent is not communicating with the server
          since: 15 Dec 2016 11:37:59Z
        instance-id: pending
        machine-status:
          current: provisioning error
          message: 'Creating container: failed to ensure LXD image: image not imported!'
          since: 15 Dec 2016 11:37:58Z
        series: xenial
    hardware: arch=amd64 cores=8 mem=32768M tags=hardware-hp-proliant-DL320E,anahuac,hw-alai-staging,hw-staging-xenial
      availability-zone=default

It always happens on the same server, though. What's particular about this server is that it is switched to PXE boot from eth1 rather than eth0, which has the primary IP and was the PXE NIC during commissioning. The server deployed OK in MAAS, and it seems to be reachable until LXD networking is configured.

Also seen in CI:
http://reports.vapour.ws/releases/issue/58929258749a5607ec1b7aa4

Larry Michel (lmic) wrote :
Larry Michel (lmic) wrote :

This is what the NIC config looks like in MAAS (see the attached screenshot).

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.1.0
John A Meinel (jameinel) wrote :

Maybe I did something wrong, but that .tar file is only 10 kB in size and doesn't appear to contain any data (the only notable content is the odd string 'Select a file with cursor and press ENTER').

More details, such as DEBUG-level logs for the host machine and possibly for the controller machine, would be helpful. There may be some reason we are failing to contact cloud-images.ubuntu.com to get an image for the container.

Changed in juju:
status: Triaged → Incomplete
milestone: 2.1.0 → none
Aaron Bentley (abentley)
description: updated
Changed in juju:
status: Incomplete → Triaged
Aaron Bentley (abentley)
Changed in juju:
importance: High → Critical
tags: added: regression
Changed in juju:
milestone: none → 2.1-rc1
Changed in juju:
milestone: 2.1-rc1 → 2.1.0
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.1.0 → 2.2.0-alpha1
Andrew Wilkins (axwalk) wrote :

This is happening on Rackspace in CI. There still appears to be a problem with bridging.

The machines each have two NICs: one should have a public (floating) IP address, and the other should have a private 10.x.y.z address. On the controller machine (which is fine, since it has no containers on it), this is the case.

I created a machine with an LXD container, and the host ends up with both NICs bridged: br-eth0 gets only IPv6 addresses, and br-eth1 gets the 10-dot address. So the machine agent can still talk to the controller, but the host cannot route to the Internet. That's why the container fails to start: the cloud-images repository cannot be reached.
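To make the symptom concrete, the following minimal Python sketch takes a mapping from bridge name to the addresses assigned to it (assumed to be transcribed by hand from `ip addr` output on the host) and reports which bridges have no IPv4 address and whether any bridge holds a globally routable IPv4 address. The function name `diagnose_bridges` and the sample addresses are hypothetical, modelled on the Rackspace symptoms described above rather than captured from the affected machines.

```python
import ipaddress


def diagnose_bridges(bridge_addrs):
    """Return (bridges with no IPv4 address, whether any bridge has a global IPv4).

    bridge_addrs maps a bridge name to a list of "address/prefix" strings.
    This is a hypothetical helper for illustration, not part of Juju.
    """
    ipv4_less = []
    has_global_ipv4 = False
    for bridge, cidrs in sorted(bridge_addrs.items()):
        # ip_interface parses "address/prefix" strings for both IPv4 and IPv6.
        ifaces = [ipaddress.ip_interface(c) for c in cidrs]
        v4 = [i for i in ifaces if i.version == 4]
        if not v4:
            ipv4_less.append(bridge)
        # Private ranges such as 10.0.0.0/8 are not global, so they do not count.
        if any(i.ip.is_global for i in v4):
            has_global_ipv4 = True
    return ipv4_less, has_global_ipv4


# Hypothetical addresses mirroring the symptom: IPv6-only br-eth0, private br-eth1.
observed = {
    "br-eth0": ["2001:db8::10/64", "fe80::1/64"],
    "br-eth1": ["10.176.1.5/19"],
}
print(diagnose_bridges(observed))  # prints (['br-eth0'], False)
```

A result like this (one IPv6-only bridge and only private IPv4 elsewhere) would be consistent with an agent that can reach the controller while the host cannot reach cloud-images.ubuntu.com over IPv4.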

Andrew Wilkins (axwalk) wrote :
Andrew Wilkins (axwalk) wrote :

It appears to be related to the fact that both IPv4 and IPv6 are available on the machine: /etc/network/interfaces contains both inet and inet6 stanzas for br-eth0. If I comment out all of the inet6 stanzas, the bridge gets an IPv4 address.
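The manual workaround described above can be sketched in Python as a text transformation that prefixes every line of each `iface ... inet6 ...` stanza with `# `. The function names are hypothetical, and the boundary rule (a stanza runs until the next `iface`, `auto`, `allow-*`, `mapping`, `source` or `source-directory` line) is a simplifying assumption based on the interfaces(5) format. It is a diagnostic aid only, not the fix that went into Juju.

```python
TOP_LEVEL = {"iface", "auto", "mapping", "source", "source-directory"}


def is_top_level(word):
    """True if `word` starts a new top-level declaration in an interfaces file."""
    return word in TOP_LEVEL or word.startswith("allow-")


def comment_out_inet6(text):
    """Comment out every `iface <name> inet6 <method>` stanza (hypothetical helper)."""
    out = []
    in_inet6 = False
    for line in text.splitlines():
        words = line.split()
        if words and is_top_level(words[0]):
            # Only an "iface <name> inet6 <method>" header starts a stanza to disable.
            in_inet6 = words[0] == "iface" and len(words) >= 3 and words[2] == "inet6"
        # Blank lines are passed through unchanged and do not end a stanza.
        out.append("# " + line if in_inet6 and words else line)
    return "\n".join(out) + "\n"


# Hypothetical bridge configuration with both inet and inet6 stanzas.
sample = """\
auto br-eth0
iface br-eth0 inet dhcp
    bridge_ports eth0

iface br-eth0 inet6 static
    address 2001:db8::10/64
    bridge_ports eth0
"""
print(comment_out_inet6(sample), end="")
```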

Andrew Wilkins (axwalk) wrote :

I'm no expert on Debian bridging, but empirically it appears that we should not be specifying bridge_ports in both the inet and inet6 br- stanzas. Just specify it in one (inet, say) and not in the other. At least, that worked for me.
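That empirical finding can be sketched as a small Python generator of ifupdown stanzas in which `bridge_ports` appears only under the bridge's first stanza (here, inet), while stanzas for other families carry only their own header line. The function name `bridge_stanzas`, the `dhcp`/`auto` methods in the example, and the overall layout are assumptions for illustration, not the output of Juju's add-juju-bridge script.

```python
def bridge_stanzas(bridge, port, families):
    """Render ifupdown stanzas for `bridge` enslaving `port` (hypothetical helper).

    `families` is a list of (family, method) pairs such as ("inet", "dhcp").
    bridge_ports is written only under the first pair's stanza; stanzas for
    the remaining families carry just their header line.
    """
    # The physical port itself is brought up without an address.
    lines = [f"iface {port} inet manual", "", f"auto {bridge}"]
    for index, (family, method) in enumerate(families):
        lines.append(f"iface {bridge} {family} {method}")
        if index == 0:
            lines.append(f"    bridge_ports {port}")
        lines.append("")  # blank line between stanzas for readability
    return "\n".join(lines).rstrip("\n") + "\n"


print(bridge_stanzas("br-eth0", "eth0", [("inet", "dhcp"), ("inet6", "auto")]), end="")
```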

Andrew Wilkins (axwalk) wrote :

The add-juju-bridge script had code for updating existing bridges, but did not cater for adding multiple iface stanzas for a bridge at once. I'm testing the fix on Rackspace now.

I guess this is not the same issue affecting the initially reported MAAS deployment, since there's no IPv6 there AFAICT.
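The diagnosis above is that add-juju-bridge did not cater for an interface, and hence its bridge, being declared by several `iface` stanzas. The Python sketch below groups `iface` stanzas by interface name and flags interfaces declared more than once, for example once for inet and once for inet6. The helper names and sample file are hypothetical; this is not the logic of add-juju-bridge, and it ignores `source`/`source-directory` includes.

```python
from collections import defaultdict


def iface_stanzas(text):
    """Map each interface name to the (family, method) pairs declared for it."""
    stanzas = defaultdict(list)
    for raw in text.splitlines():
        words = raw.split()
        # An ifupdown stanza header has the form: iface <name> <family> <method>
        if len(words) >= 4 and words[0] == "iface":
            stanzas[words[1]].append((words[2], words[3]))
    return dict(stanzas)


def multi_stanza_ifaces(text):
    """Return, sorted by name, the interfaces declared by more than one iface stanza."""
    return sorted(name for name, decls in iface_stanzas(text).items() if len(decls) > 1)


# Hypothetical interfaces file with one stanza per address family for eth0.
sample = """\
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 10.176.1.5/19
iface eth0 inet6 static
    address 2001:db8::10/64
"""
print(multi_stanza_ifaces(sample))  # prints ['eth0']
```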

Andrew Wilkins (axwalk) wrote :

Larry, can you please provide the original /etc/network/interfaces file from the MAAS machine, i.e. before Juju touches it? The simplest way to do that would (I think) be to start Ubuntu on the machine without Juju. Alternatively, have Juju deploy to it but don't put any containers on it; then the bridging script won't run and modify the file.

Changed in juju:
status: Triaged → Incomplete
Andrew Wilkins (axwalk) wrote :

This PR makes Rackspace happy: https://github.com/juju/juju/pull/6962.

I'll need to see the /etc/network/interfaces file from MAAS to be sure, but at a guess the eth0 interface may have stanzas for both inet and bootp. If that's the case, the same PR may fix it.

Larry Michel (lmic) wrote :

Andrew, that machine is no longer in that state. But I think it's reproducible by having the system PXE boot from eth1, which is not the PXE NIC designated in MAAS, as shown in . In that case, eth0 (the NIC that's set to auto-assign) should still get a static IP, but perhaps something else changes. I'll see what I can do to get another system to PXE boot from eth1 and collect the interfaces files for both cases.

Changed in juju:
milestone: 2.2.0-alpha1 → none
Larry Michel (lmic) wrote :

I am trying to recreate this by doing a simple rename through MAAS. The scenario would be to have eth1 as the boot device, set to auto-assign, and leave eth0 unconfigured. Then I could deploy ubuntu to 0 and mysql to lxd:0. I am working on this test and will update with the result.

Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for juju 2.1 because there has been no activity for 60 days.]
