no visible sign that HA is degraded when lost

Bug #1602749 reported by Richard Harding on 2016-07-13
This bug affects 1 person
Affects: juju
Importance: High
Assigned to: Unassigned

Bug Description

I bootstrapped on LXD and ran enable-ha. I was then presented with three state servers, each showing controller-member-status: has-vote

I then used lxc delete to remove controller #1 (leaving #0 and #2). Nothing in show-controller, juju status, or juju status --format=yaml indicated any degradation or failure, and I could not find any way to query Juju and see that #1 was gone. Only by pinging each controller's IP address did I find that #1 was not responding.

Current output:

juju status --format=yaml
model:
  name: controller
  controller: uxtest
  cloud: lxd
  version: 2.0-beta11
machines:
  "0":
    juju-status:
      current: started
      since: 13 Jul 2016 11:38:08-04:00
      version: 2.0-beta11
    dns-name: 10.90.136.71
    instance-id: juju-8c982a-0
    machine-status:
      current: running
      message: Running
      since: 13 Jul 2016 11:22:34-04:00
    series: xenial
    hardware: arch=amd64 cpu-cores=0 mem=0M
    controller-member-status: has-vote
  "1":
    juju-status:
      current: started
      since: 13 Jul 2016 11:38:08-04:00
      version: 2.0-beta11
    dns-name: 10.90.136.92
    instance-id: juju-8c982a-1
    machine-status:
      current: running
      message: Running
      since: 13 Jul 2016 11:37:36-04:00
    series: xenial
    hardware: arch=amd64 cpu-cores=0 mem=0M
    controller-member-status: has-vote
  "2":
    juju-status:
      current: started
      since: 13 Jul 2016 11:38:08-04:00
      version: 2.0-beta11
    dns-name: 10.90.136.249
    instance-id: juju-8c982a-2
    machine-status:
      current: running
      message: Running
      since: 13 Jul 2016 11:37:20-04:00
    series: xenial
    hardware: arch=amd64 cpu-cores=0 mem=0M
    controller-member-status: has-vote
applications: {}
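
The manual ping check described in this report could be automated along the following lines. This is only a sketch of a workaround, not anything Juju itself provides: it assumes the JSON form of the status output uses the same keys as the YAML shown above (machines, dns-name, juju-status), and that the controller API listens on port 17070. The function names are hypothetical.

```python
import json
import socket
import subprocess

API_PORT = 17070  # assumption: default Juju controller API port


def tcp_probe(host, port=API_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_controllers(status, probe=tcp_probe):
    """Return sorted IDs of machines reported as started that fail the probe."""
    missing = []
    for machine_id, machine in status.get("machines", {}).items():
        address = machine.get("dns-name")
        state = machine.get("juju-status", {}).get("current")
        # A machine with no address at all is treated as unreachable.
        if state == "started" and (not address or not probe(address)):
            missing.append(machine_id)
    return sorted(missing)


def load_controller_status():
    """Run juju status against the controller model and parse the JSON."""
    result = subprocess.run(
        ["juju", "status", "-m", "controller", "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling unreachable_controllers(load_controller_status()) after deleting the container for machine 1 would be expected to list that machine, even though status still reports it as started. The probe is passed in as a parameter so the check can be exercised without a live controller.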

Changed in juju-core:
importance: Undecided → High
Curtis Hovey (sinzui) on 2016-07-27
Changed in juju-core:
status: Confirmed → Triaged
Changed in juju-core:
milestone: none → 2.0.0
Changed in juju-core:
milestone: 2.0.0 → 2.0.1
affects: juju-core → juju
Changed in juju:
milestone: 2.0.1 → none
milestone: none → 2.0.1
Curtis Hovey (sinzui) on 2016-10-28
Changed in juju:
milestone: 2.0.1 → none
John A Meinel (jameinel) wrote :

related to bug #1766576

Nobuto Murata (nobuto) wrote :

This is still reproducible with Juju 2.5.1 and the MAAS provider (MAAS 2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1).

I simulated a dead Juju controller machine by suspending the MAAS Pod VM that hosts one of the controllers, but Juju reports that all controller nodes are healthy.

$ virsh suspend juju-1 ## simulate an unresponsive controller

$ juju ssh -m controller 0 ## 0 = juju-1
ERROR cannot connect to any address: [192.168.151.22:22 192.168.151.22:22]

$ juju status -m controller; juju show-controller | grep ha-status:

Model       Controller        Cloud/Region      Version  SLA          Timestamp
controller  foundations-maas  foundations-maas  2.5.1    unsupported  18:39:21Z

Machine  State    DNS             Inst id  Series  AZ     Message
0        started  192.168.151.22  cg6qmw   bionic  zone1  Deployed
1        started  192.168.153.22  4skcpe   bionic  zone3  Deployed
2        started  192.168.152.21  wshrgx   bionic  zone2  Deployed

      ha-status: ha-enabled
      ha-status: ha-enabled
      ha-status: ha-enabled

-> status doesn't report any down / unhealthy controller node even after 5 minutes.

Nobuto Murata (nobuto) wrote :

It took roughly 15 minutes for the machine to be marked down, and ha-status still looks healthy.

$ juju status -m controller; juju show-controller | grep ha-status:
Model       Controller        Cloud/Region      Version  SLA          Timestamp
controller  foundations-maas  foundations-maas  2.5.1    unsupported  18:53:25Z

Machine  State    DNS             Inst id  Series  AZ     Message
0        down     192.168.151.22  cg6qmw   bionic  zone1  Deployed
1        started  192.168.153.22  4skcpe   bionic  zone3  Deployed
2        started  192.168.152.21  wshrgx   bionic  zone2  Deployed

      ha-status: ha-enabled
      ha-status: ha-enabled
      ha-status: ha-enabled
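
To make the gap concrete, here is a rough sketch of the kind of summary that could be derived from the machine states above. It is not how Juju computes ha-status. The level names (healthy, degraded, quorum-lost) are hypothetical, and the majority rule is an assumption based on the usual behaviour of a replicated controller set.

```python
def ha_summary(machine_states):
    """Summarise controller health from a {machine_id: state} mapping."""
    total = len(machine_states)
    # Any state other than "started" is counted as down.
    down = sorted(m for m, s in machine_states.items() if s != "started")
    up = total - len(down)
    if not down:
        level = "healthy"
    elif up > total // 2:
        # A strict majority is still up, so the set can keep operating.
        level = "degraded"
    else:
        level = "quorum-lost"
    return {"level": level, "up": up, "total": total, "down": down}
```

With the states from the second status output (machine 0 down, machines 1 and 2 started), this yields a degraded level with 2 of 3 controllers up, which is the sort of signal the report asks for.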
