Juju not detecting Azure provisioning failure

Bug #1628246 reported by Kevin W Monroe
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Andrew Wilkins
Milestone: 2.0.0

Bug Description

I bootstrapped azure/eastus with juju version:

2.0-rc1-xenial-amd64

I deployed the following 12-unit bundle:

https://jujucharms.com/canonical-kubernetes/

One of the machines had a provisioning error, yet juju-status stayed in 'pending':

$ juju status --format yaml 1
model:
  name: tima
  controller: azure-e
  cloud: azure
  region: eastus
  version: 2.0-rc1
machines:
  "1":
    juju-status:
      current: pending
      since: 27 Sep 2016 16:18:10Z
    instance-id: machine-1
    machine-status:
      current: provisioning error
      message: Failed
      since: 27 Sep 2016 16:19:02Z
    series: trusty
    hardware: arch=amd64 cores=1 mem=1792M root-disk=30720M
applications:
  elasticsearch:
    charm: cs:trusty/elasticsearch-18
    series: trusty
    os: ubuntu
    charm-origin: jujucharms
    charm-name: elasticsearch
    charm-rev: 18
    exposed: false
    application-status:
      current: waiting
      message: waiting for machine
      since: 27 Sep 2016 16:18:09Z
    relations:
      client:
      - filebeat
      - kibana
      - topbeat
      peer:
      - elasticsearch
    units:
      elasticsearch/0:
        workload-status:
          current: waiting
          message: waiting for machine
          since: 27 Sep 2016 16:18:09Z
        juju-status:
          current: allocating
          since: 27 Sep 2016 16:18:09Z
        machine: "1"

You can't retry provisioning on a machine in 'pending':

$ juju retry-provisioning 1
machine 1 is not in an error state
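
For context on that error message: retry-provisioning only acts on machines whose agent status is already 'error', which is why a machine stuck in 'pending' can't be retried. A minimal Go sketch of that kind of guard (illustrative only; the names and types here are hypothetical, not Juju's actual API):

package main

import "fmt"

type Status string

const (
	StatusPending Status = "pending"
	StatusError   Status = "error"
)

type Machine struct {
	ID     string
	Status Status
}

// retryProvisioning mirrors the CLI behaviour above: only machines whose
// agent status is "error" are eligible for another provisioning attempt.
func retryProvisioning(m Machine) error {
	if m.Status != StatusError {
		return fmt.Errorf("machine %s is not in an error state", m.ID)
	}
	// A real implementation would clear the failure and hand the machine
	// back to the provisioner here.
	return nil
}

func main() {
	m := Machine{ID: "1", Status: StatusPending}
	if err := retryProvisioning(m); err != nil {
		fmt.Println(err) // machine 1 is not in an error state
	}
}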

This bug is to figure out why juju did not detect the provisioning failure and move the machine to an 'error' state. See attached images for details from the Azure portal.

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Controller logs incoming:

- logsink.log
- machine-0.log

Not sure how helpful these will be; machine-1 is the machine in question, and I don't see any mention of it in the logs. Let me know if there's any other info you need.

Changed in juju:
status: New → Triaged
assignee: nobody → Andrew Wilkins (axwalk)
importance: Undecided → High
milestone: none → 2.0.0
Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

The duped bug and associated PR seem specific to MAAS. Will the same fix address Azure?

Revision history for this message
Andrew Wilkins (axwalk) wrote :

@kwmonroe: the important part of the PR is in apiserver/instancepoller and apiserver/provisioner. That's provider-independent, and will address the issue for Azure as well as MAAS.
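
To sketch what that means in practice (hypothetical Go, not the actual apiserver/instancepoller code): the poller watches the provider-reported instance status, and when the provider says provisioning failed, the machine's agent status should move out of 'pending' into 'error' with the provider's message, at which point retry-provisioning becomes possible. All names below are assumed for illustration:

package poller

// InstanceStatus is the status the cloud provider reports for an instance
// (e.g. the "provisioning error" / "Failed" seen in the report above).
type InstanceStatus struct {
	Status  string
	Message string
}

// StatusSetter is a stand-in for whatever the apiserver uses to update a
// machine's agent status.
type StatusSetter interface {
	SetStatus(status, info string) error
}

// reconcile propagates a provider-reported provisioning failure to the
// machine, so it stops showing "pending" and retry-provisioning can be used.
// Nothing here is cloud-specific, which is why a provider-independent fix
// covers Azure as well as MAAS.
func reconcile(m StatusSetter, inst InstanceStatus) error {
	if inst.Status == "provisioning error" {
		return m.SetStatus("error", inst.Message)
	}
	return nil
}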
