juju add-machine lxd:N --constraints INVALID does not show provisioning error

Bug #1650252 reported by John A Meinel
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: John A Meinel

Bug Description

If we fail to provision an LXD container (for any reason), the failure message should show up as part of the LXD container status. Right now we just show 'pending'.

In the log I see:
2016-12-15 11:28:20 ERROR juju.provisioner provisioner_task.go:687 fetching provisioning info for machine "0/lxd/1": cannot match subnets to zones: space "space-1" not found

But in 'juju status --format=yaml' we just see:
    containers:
      0/lxd/1:
        juju-status:
          current: down
          message: agent is not communicating with the server
          since: 15 Dec 2016 15:28:20+04:00
        instance-id: pending
        machine-status:
          current: pending
          since: 15 Dec 2016 15:28:17+04:00
        series: xenial

1) 'agent is not communicating with the server' is bogus: the machine doesn't exist yet, and that is what should be reported.
2) machine-status says 'pending', but the machine is really in a failed state. We need to report that back up the stack.

John A Meinel (jameinel) wrote :

I did eventually see:
      0/lxd/1:
        juju-status:
          current: error
          message: 'cannot match subnets to zones: space "space-1" not found'
          since: 15 Dec 2016 15:28:20+04:00
          life: dead
        instance-id: pending
        machine-status:
          current: pending
          since: 15 Dec 2016 15:28:17+04:00
        series: xenial

So maybe we do report it, but only after trying a few times?
Bug #1650253 covers the related problem: we should validate the constraints much earlier, not after actually trying to start the instance.

John A Meinel (jameinel) wrote :

I was able to sort it out a bit.

1) We are setting the *agent* status rather than the *machine* status, which is why these messages show up as they do. We should probably be calling Machine.SetInstanceStatus() when provisioning finally fails, instead of just Machine.SetStatus().
That also means that to find the history you can't do "juju show-status-log --type machine 1/lxd/1"; you have to do "juju show-status-log --type juju-machine 1/lxd/1".

2) By default, the Provisioner code sets the status with a message about the error, and then immediately follows it up with "will retry in 10s", which hides the actual error.

3) Also by default, when presenting "juju status" to the user, if the machine is not in Pending or Stopped, it checks whether the agent is alive (according to the Presence logic), and then overrides any message with "agent is not communicating...", which hides the failure message.

I think all of this gets more understandable if we call machine.SetInstanceStatus(). At the very least when we finally give up on provisioning. Possibly earlier.
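
To make points 1) and 3) concrete, here is a minimal, self-contained sketch of the override as described above. It is not Juju's code: the types Status, StatusInfo and Machine and the function displayAgentStatus are hypothetical stand-ins, and using "error" as the instance status value is an assumption made only for illustration.

```go
package main

import "fmt"

// Status and StatusInfo are simplified stand-ins for Juju's status types.
type Status string

const (
	Pending Status = "pending"
	Stopped Status = "stopped"
	Error   Status = "error"
	Down    Status = "down"
)

type StatusInfo struct {
	Current Status
	Message string
}

// Machine is a hypothetical model: Agent is what "juju status" prints as
// juju-status, Instance is what it prints as machine-status.
type Machine struct {
	Agent      StatusInfo
	Instance   StatusInfo
	AgentAlive bool
}

// displayAgentStatus models point 3: unless the agent status is Pending or
// Stopped, a non-responding agent replaces whatever message was stored.
func displayAgentStatus(m Machine) StatusInfo {
	if m.Agent.Current == Pending || m.Agent.Current == Stopped {
		return m.Agent
	}
	if !m.AgentAlive {
		return StatusInfo{Current: Down, Message: "agent is not communicating with the server"}
	}
	return m.Agent
}

func main() {
	const provErr = `cannot match subnets to zones: space "space-1" not found`

	// Reported behaviour: the failure is stored on the agent status, so the
	// presence check overrides it (the container never started an agent).
	agentOnly := Machine{
		Agent:    StatusInfo{Current: Error, Message: provErr},
		Instance: StatusInfo{Current: Pending},
	}

	// Suggested behaviour: store the failure on the instance status and
	// leave the agent status at Pending, so nothing overrides the message.
	viaInstance := Machine{
		Agent:    StatusInfo{Current: Pending},
		Instance: StatusInfo{Current: Error, Message: provErr},
	}

	for _, m := range []Machine{agentOnly, viaInstance} {
		fmt.Printf("juju-status:    %+v\n", displayAgentStatus(m))
		fmt.Printf("machine-status: %+v\n\n", m.Instance)
	}
}
```

In the first case the provisioning error is lost behind the "down" status; in the second it survives in machine-status, which is where a user looking at a container that never started would expect to find it.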

Anastasia (anastasia-macmood) wrote :

PR against feature branch: https://github.com/juju/juju/pull/6828

Changed in juju:
status: Triaged → In Progress
assignee: nobody → John A Meinel (jameinel)
John A Meinel (jameinel) wrote :

Fix is in the 2.1-dynamic-bridges branch.

Changed in juju:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
milestone: none → 2.1.0
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released