lxd containers fail to find agent binaries for arch. mixed architecture controller & machines

Bug #1753955 reported by Sean Feole
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

Juju Version: 2.3.3-xenial-amd64

Problem: We are testing ppc64el bare metal hosts in our lab. We do not have enough available hosts to bootstrap the controller on ppc64el, so we are using an amd64 machine for the controller.

I am deploying via a bundle in which I added constraints to the machines so that they would use our ppc64el hosts in MAAS.

machines:
  '0':
    series: xenial
    constraints: arch=ppc64el
  '1':
    series: xenial
    constraints: arch=ppc64el
  '2':
    series: xenial
    constraints: arch=ppc64el
  '3':
    series: xenial
    constraints: arch=ppc64el

Once deployed, the bare metal ppc64el hosts booted up, but the LXD containers failed to start, apparently because they could not find the agent binaries.

Machine  State    DNS            Inst id  Series  AZ       Message
0        started  10.245.168.40  gxy47x   xenial  default  Deployed
0/lxd/0  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
0/lxd/1  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
0/lxd/2  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
1        started  10.245.168.45  xpwspa   xenial  default  Deployed
1/lxd/0  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
1/lxd/1  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
1/lxd/2  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
1/lxd/3  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
1/lxd/4  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
2        started  10.245.168.56  877gbf   xenial  default  Deployed
2/lxd/0  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
2/lxd/1  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
2/lxd/2  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
3        started  10.245.168.47  fa6tef   xenial  default  Deployed
3/lxd/0  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
3/lxd/1  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
3/lxd/2  down                    pending  xenial           need agent binaries for arch ppc64el, only found [amd64]
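
For anyone triaging a similar deployment, the affected containers can be picked out of saved status text with a small script. This is only a sketch: the function name missing_agent_arches is hypothetical, and the regular expression assumes the message wording is exactly as shown in the status output above.

```python
import re

# Assumption: the message format matches the status output in this report.
AGENT_ERROR = re.compile(
    r"need agent binaries for arch (\w+), only found \[([^\]]*)\]"
)


def missing_agent_arches(status_text):
    """Return (machine_id, wanted_arch, found_arches) for each failing line."""
    results = []
    for line in status_text.splitlines():
        match = AGENT_ERROR.search(line)
        if match is None:
            continue
        # The machine or container id is the first column of the status line.
        machine_id = line.split()[0]
        found = [a.strip() for a in match.group(2).split(",") if a.strip()]
        results.append((machine_id, match.group(1), found))
    return results
```

Fed the status text above, it would report each container id together with the requested arch and the arches for which agent binaries were found.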

After some triage, we have discovered that if the constraints are specified explicitly, then the container does start.

juju add-machine lxd:0 --constraints arch=ppc64el

0/lxd/4 started 10.245.168.25 juju-c89fe4-0-lxd-4 xenial default Container started

According to the Juju documentation, specifying the constraints on each application in the bundle should work:
https://jujucharms.com/docs/2.0/charms-bundles#setting-constraints-in-a-bundle

The bundle was modified to reflect these directions, for example:

  openstack-dashboard:
    annotations:
      gui-x: '500'
      gui-y: '-250'
    charm: cs:~openstack-charmers-next/xenial/openstack-dashboard
    constraints: "arch=ppc64el"
    num_units: 1
    options:
      openstack-origin: cloud:xenial-queens/proposed
    to:
    - lxd:3

After redeploying, this appeared to fix the issue, as the status output below shows.

Machine  State    DNS            Inst id              Series  AZ       Message
0        started  10.245.168.40  gxy47x               xenial  default  Deployed
0/lxd/7  started  10.245.168.41  juju-c89fe4-0-lxd-7  xenial  default  Container started
0/lxd/8  started  10.245.168.44  juju-c89fe4-0-lxd-8  xenial  default  Container started
0/lxd/9  started  10.245.168.52  juju-c89fe4-0-lxd-9  xenial  default  Container started
1        started  10.245.168.45  xpwspa               xenial  default  Deployed
1/lxd/5  started  10.245.168.46  juju-c89fe4-1-lxd-5  xenial  default  Container started
1/lxd/6  started  10.245.168.43  juju-c89fe4-1-lxd-6  xenial  default  Container started
1/lxd/7  started  10.245.168.51  juju-c89fe4-1-lxd-7  xenial  default  Container started
2        started  10.245.168.56  877gbf               xenial  default  Deployed
2/lxd/3  started  10.245.168.39  juju-c89fe4-2-lxd-3  xenial  default  Container started
2/lxd/4  started  10.245.168.48  juju-c89fe4-2-lxd-4  xenial  default  Container started
2/lxd/5  pending                 juju-c89fe4-2-lxd-5  xenial  default  Container started
3        started  10.245.168.47  fa6tef               xenial  default  Deployed
3/lxd/3  started  10.245.168.37  juju-c89fe4-3-lxd-3  xenial  default  Container started
3/lxd/4  started  10.245.168.49  juju-c89fe4-3-lxd-4  xenial  default  Container started
3/lxd/5  pending                 juju-c89fe4-3-lxd-5  xenial  default  Container started

Even though the final solution was a success, the LXD containers should have automatically inherited this constraint, because the machines in the bundle were explicitly told to use arch=ppc64el.

You should not have to add constraints to every charm in the bundle.
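
Until containers inherit the host's arch, the per-application constraints could be filled in by a script rather than by hand. The following is a minimal sketch of that workaround, not Juju's own behaviour: the function name inherit_machine_arch is hypothetical, the bundle is assumed to be already parsed into a dict, and the top-level key is assumed to be either "applications" or "services".

```python
import copy
import re


def inherit_machine_arch(bundle):
    """Return a copy of the bundle in which each application placed in an
    LXD container gets the arch constraint of its host machine."""
    result = copy.deepcopy(bundle)
    machines = result.get("machines", {})
    # Assumption: applications live under "applications" or "services".
    apps = result.get("applications") or result.get("services") or {}
    for app in apps.values():
        constraints = app.get("constraints", "")
        if "arch=" in constraints:
            continue  # an explicit arch on the application wins
        arches = set()
        for placement in app.get("to", []):
            placed = re.fullmatch(r"lxd:(\d+)", str(placement))
            if placed is None:
                continue
            host = machines.get(placed.group(1), {})
            arch = re.search(r"arch=(\S+)", host.get("constraints", ""))
            if arch is not None:
                arches.add(arch.group(1))
        # Only act when every host agrees on a single arch.
        if len(arches) == 1:
            app["constraints"] = (constraints + " arch=" + arches.pop()).strip()
    return result
```

Applications that already set an arch, or whose hosts disagree, are left untouched so that the script never overrides an explicit choice.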

John A Meinel (jameinel) wrote : Re: [Bug 1753955] [NEW] lxd containers fail to find agent binaries for arch. mixed architecture controller & machines

I have the feeling we might be displaying an error that is confusing the
issue. If I were to guess, I would say we're actually trying to create amd64
instances on a ppc machine. So it isn't that 'we can't find tools' (a red
herring) but that we can't launch the instance.

I could be completely wrong, but the fact that specifying the ARCH you
want makes us "find the tools" makes me think the issue is something else.

On Wed, Mar 7, 2018 at 10:12 AM, Sean Feole <email address hidden>
wrote:

> Public bug reported:
> ...

Sean Feole (sfeole) wrote :

Here is a juju crashdump if that helps any

John A Meinel (jameinel) wrote :

Just to clarify, if you specify the Arch for the containers, then things are launched and provisioned as expected. But you'd like to have Juju assume the arch for containers from the arch for the host machines.

Changed in juju:
importance: Undecided → Medium
status: New → Triaged

Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot