AllWatcher does not report agents that are down

Bug #1788638 reported by Roger Peppe
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  High        Unassigned

Bug Description

The JIMM service relies on the AllModelWatcher to observe status changes in the model. One of the things it needs to know is when machines and units are down. Unfortunately, the AllWatcher does not provide this information, even though a FullStatus call reports it.

For example, here is the reply to a FullStatus API call where the machine is down:

 {
  request-id: 2
  response: {
   applications: {
    ubuntu: {
     can-upgrade-to: ""
     charm: "cs:ubuntu-12"
     endpoint-bindings: {}
     exposed: false
     life: ""
     meter-statuses: null
     public-address: ""
     relations: {}
     series: "xenial"
     status: {
      data: {}
      info: "ready"
      kind: ""
      life: ""
      since: "2018-08-23T14:16:31.26037557Z"
      status: "active"
      version: ""
     }
     subordinate-to: null
     units: {
      "ubuntu/2": {
       agent-status: {
        data: {}
        info: ""
        kind: ""
        life: ""
        since: "2018-08-23T14:16:32.792690804Z"
        status: "idle"
        version: "2.4.1"
       }
       charm: ""
       leader: true
       machine: "2"
       opened-ports: null
       public-address: "52.23.229.253"
       subordinates: null
       workload-status: {
        data: {}
        info: "ready"
        kind: ""
        life: ""
        since: "2018-08-23T14:16:31.26037557Z"
        status: "active"
        version: ""
       }
       workload-version: "16.04"
      }
     }
     workload-version: "16.04"
    }
   }
   controller-timestamp: "2018-08-23T14:46:40.334238019Z"
   machines: {
    "2": {
     agent-status: {
      data: {}
      info: "agent is not communicating with the server"
      kind: ""
      life: ""
      since: "2018-08-23T14:15:18.246038417Z"
      status: "down"
      version: "2.4.1"
     }
     constraints: "cores=1 mem=1024M"
     containers: {}
     dns-name: "52.23.229.253"
     hardware: "arch=amd64 cores=1 cpu-power=350 mem=3840M root-disk=8192M availability-zone=us-east-1b"
     has-vote: false
     id: "2"
     instance-id: "i-0f4045bcef2e06c3d"
     instance-status: {
      data: {}
      info: "running"
      kind: ""
      life: ""
      since: "2018-08-23T14:13:24.377632834Z"
      status: "running"
      version: ""
     }
     ip-addresses: [
      "52.23.229.253"
      "172.31.39.86"
      "252.39.86.1"
     ]
     jobs: [
      "JobHostUnits"
     ]
     network-interfaces: {
      eth0: {
       gateway: "172.31.32.1"
       ip-addresses: [
        "172.31.39.86"
       ]
       is-up: true
       mac-address: "12:ae:04:27:c9:3a"
      }
      fan-252: {
       ip-addresses: [
        "252.39.86.1"
       ]
       is-up: true
       mac-address: "32:86:99:8b:b8:a8"
      }
     }
     series: "xenial"
     wants-vote: false
    }
   }
   model: {
    available-version: ""
    cloud-tag: "cloud-aws"
    meter-status: {
     color: ""
     message: ""
    }
    model-status: {
     data: {}
     info: ""
     kind: ""
     life: ""
     since: "2018-08-23T14:04:02.911112932Z"
     status: "available"
     version: ""
    }
    name: "jimmmodel"
    region: "us-east-1"
    sla: "unsupported"
    type: "iaas"
    version: "2.4.1"
   }
   offers: {}
   relations: null
   remote-applications: {}
  }
 }

Note that the machine-2 status is reported as "down".
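
For a consumer such as JIMM, the check it would make against a FullStatus reply amounts to something like the following. This is a minimal Python sketch, not Juju or JIMM code: the function name `down_agents` is hypothetical, and the input is assumed to be the "response" part of the reply decoded into plain dicts with the field names shown above.

```python
def down_agents(full_status):
    """Return sorted (kind, name) pairs whose agent-status is "down".

    `full_status` is assumed to be the "response" part of a FullStatus
    reply, decoded into plain dicts (hypothetical helper, not a Juju API).
    """
    down = []
    # Machines: agent-status.status holds the agent state.
    for machine_id, machine in (full_status.get("machines") or {}).items():
        if machine.get("agent-status", {}).get("status") == "down":
            down.append(("machine", machine_id))
    # Units are nested under their application.
    for app in (full_status.get("applications") or {}).values():
        for unit_name, unit in (app.get("units") or {}).items():
            if unit.get("agent-status", {}).get("status") == "down":
                down.append(("unit", unit_name))
    return sorted(down)


# Trimmed-down copy of the reply above: machine 2 is down, ubuntu/2 is idle.
reply = {
    "applications": {
        "ubuntu": {
            "units": {
                "ubuntu/2": {"agent-status": {"status": "idle"}, "machine": "2"},
            },
        },
    },
    "machines": {
        "2": {
            "agent-status": {
                "status": "down",
                "info": "agent is not communicating with the server",
            },
        },
    },
}

print(down_agents(reply))  # [('machine', '2')]
```

The same check run against the WatchAll data cannot succeed, because the watcher never carries the "down" value.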

By contrast, here's the machine entry as reported by WatchAll.
The agent status is reported as "started", not "down".

 machine 2 04c4615d-fcab-4b3b-8c91-1a201cf50222 {
  model-uuid: "04c4615d-fcab-4b3b-8c91-1a201cf50222"
  id: "2"
  instance-id: "i-0f4045bcef2e06c3d"
  agent-status: {
   current: "started"
   message: ""
   since: "2018-08-23T14:15:18.246038417Z"
   version: "2.4.1"
  }
  instance-status: {
   current: "running"
   message: "running"
   since: "2018-08-23T14:13:24.377632834Z"
   version: ""
  }
  life: "alive"
  series: "xenial"
  supported-containers: [
   "lxd"
  ]
  supported-containers-known: true
  hardware-characteristics: {
   arch: "amd64"
   mem: 3840
   root-disk: 8192
   cpu-cores: 1
   cpu-power: 350
   availability-zone: "us-east-1b"
  }
  jobs: [
   "JobHostUnits"
  ]
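
The mismatch between the two views can be stated mechanically. The sketch below is illustrative Python under stated assumptions: `status_mismatches` is a hypothetical helper, and the two dicts are hand-copied from the FullStatus reply and the WatchAll entry above, which keep the agent state under "status" and "current" respectively.

```python
def status_mismatches(full_status_machines, watcher_machines):
    """Compare machine agent statuses from FullStatus and from WatchAll.

    Both arguments map machine id -> machine info dict. FullStatus keeps
    the value under agent-status.status, the watcher entry under
    agent-status.current (field names as shown in the dumps above).
    Returns a list of (machine_id, full_status_value, watcher_value).
    """
    mismatches = []
    for machine_id, full in sorted(full_status_machines.items()):
        watched = watcher_machines.get(machine_id)
        if watched is None:
            continue  # the watcher has not reported this machine
        from_status = full["agent-status"]["status"]
        from_watcher = watched["agent-status"]["current"]
        if from_status != from_watcher:
            mismatches.append((machine_id, from_status, from_watcher))
    return mismatches


# Values hand-copied from the two dumps in this report.
full_status_machines = {"2": {"agent-status": {"status": "down"}}}
watcher_machines = {"2": {"agent-status": {"current": "started"}}}

for machine_id, want, got in status_mismatches(full_status_machines, watcher_machines):
    print(f"machine {machine_id}: FullStatus says {want!r}, WatchAll says {got!r}")
```

With the data from this report it flags machine 2, where FullStatus says "down" and WatchAll says "started".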

Roger Peppe (rogpeppe)
summary: - AllWatcher does not report down machines
+ AllWatcher does not report agents that are down
Changed in juju:
status: New → Triaged
milestone: none → 2.4.3
importance: Undecided → High