Review hacluster subordinates for cluster quorum

Bug #1918196 reported by Xav Paice
This bug affects 1 person
Affects                    | Status      | Importance | Assigned to | Milestone
OpenStack HA Cluster Charm | In Progress | Undecided  | Unassigned  |
juju-verify                | Won't Fix   | Low        | Xav Paice   |

Bug Description

When a requested unit belongs to an application with an hacluster subordinate, we need to ensure that enough units of the same hacluster application remain so that removing the requested principal does not cause the pacemaker cluster to lose quorum and therefore stop the managed services and VIPs.

Requirements:

* specify a principal unit, e.g. mysql/4 or nova-cloud-controller/1, and juju-verify searches the subordinate applications related to the principal for hacluster
* if hacluster is found, ensure there are enough online units that removing the principal does not cause a loss of quorum
* if the principal is a known application that needs no other checks, add it to the supported list
* support listing more than one unit from the same application
* support listing more than one application (e.g. if we wish to shut down a machine hosting multiple units)
* Initial support for the following charms:
  * openstack-dashboard
  * OpenStack APIs: all that do not store data and are fully HA with the aid of hacluster
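
The quorum requirement above can be sketched as follows. This is a minimal illustration, not juju-verify's implementation: it assumes each hacluster unit contributes one corosync vote, that quorum means a strict majority of all configured nodes (n // 2 + 1, with no two_node or last_man_standing options), and that the online state of each hacluster unit is already known. The application names (keystone-hacluster, ncc-hacluster) are hypothetical. Grouping by hacluster application also covers the "more than one unit" and "more than one application" requirements.

```python
# Minimal sketch of a per-hacluster-application quorum check.
# ASSUMPTIONS (not taken from juju-verify or charm-hacluster):
#  * every hacluster unit contributes exactly one corosync vote;
#  * quorum is a strict majority of all configured nodes (expected votes),
#    i.e. n // 2 + 1, with no two_node/last_man_standing options;
#  * the caller already knows which hacluster units are online and which
#    of them sit under the principal units being shut down or rebooted.
from typing import Dict, List, Set


def quorum_threshold(total_nodes: int) -> int:
    """Votes needed for a strict majority of the configured nodes."""
    return total_nodes // 2 + 1


def check_quorum(clusters: Dict[str, Dict[str, bool]], removed: Set[str]) -> List[str]:
    """Return the hacluster applications that would lose quorum.

    clusters: hacluster application name -> {hacluster unit: online?}
    removed:  hacluster units whose principals are about to go down
    """
    failing = []
    for app, units in sorted(clusters.items()):
        # Offline units already contribute no vote; removed units stop voting.
        remaining = sum(1 for unit, online in units.items() if online and unit not in removed)
        if remaining < quorum_threshold(len(units)):
            failing.append(app)
    return failing


if __name__ == "__main__":
    clusters = {
        "keystone-hacluster": {"keystone-hacluster/0": True,
                               "keystone-hacluster/1": True,
                               "keystone-hacluster/2": True},
        "ncc-hacluster": {"ncc-hacluster/0": True,
                          "ncc-hacluster/1": False,  # already offline
                          "ncc-hacluster/2": True},
    }
    # Hypothetical case: keystone/0 and nova-cloud-controller/0 share the machine being rebooted.
    print(check_quorum(clusters, {"keystone-hacluster/0", "ncc-hacluster/0"}))
    # prints ['ncc-hacluster']
```

Keystone keeps 2 of 3 votes and stays quorate, while the second cluster drops to 1 of 3 because one of its units was already offline, which is exactly the case the check must catch.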

Output:

* Issue a warning if the principal application is not a supported one, but don't fail for that reason
* Fail if any hacluster application would lose quorum as a result of the requested reboot/shutdown
* If a requested unit holds the VIP, issue a warning.
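
The three output rules could be combined roughly like this. It is only a sketch: the Result type, the SUPPORTED_CHARMS set and the input shapes are hypothetical, not juju-verify's API, and the list of applications losing quorum and the set of VIP-holding units are assumed to have been computed already (for example from the hacluster status action).

```python
# Sketch of the output rules: warn on unsupported principals, warn on VIP
# holders, fail only on quorum loss.
# ASSUMPTIONS: Result, SUPPORTED_CHARMS and the input shapes are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical supported list (the bug names openstack-dashboard and the
# stateless, fully HA OpenStack API charms).
SUPPORTED_CHARMS = {"openstack-dashboard", "keystone", "nova-cloud-controller"}


@dataclass
class Result:
    success: bool = True
    messages: List[str] = field(default_factory=list)

    def warn(self, msg: str) -> None:
        # Warnings are reported but never change the overall outcome.
        self.messages.append(f"[WARN] {msg}")

    def fail(self, msg: str) -> None:
        self.success = False
        self.messages.append(f"[FAIL] {msg}")


def summarize(principal_charms: Dict[str, str],
              apps_losing_quorum: List[str],
              vip_holders: Set[str]) -> Result:
    """principal_charms: requested unit -> charm name.
    apps_losing_quorum: hacluster applications that would lose quorum.
    vip_holders: requested units currently running a VIP resource."""
    result = Result()
    for unit, charm in sorted(principal_charms.items()):
        if charm not in SUPPORTED_CHARMS:
            result.warn(f"{unit}: charm '{charm}' is not in the supported list")
        if unit in vip_holders:
            result.warn(f"{unit} currently holds a VIP")
    for app in apps_losing_quorum:
        result.fail(f"{app} would lose quorum")
    return result
```

Keeping warnings and failures in one result object lets the tool print every finding while deriving the pass/fail verdict only from the quorum rule.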

Actions required on hacluster charm:

* status (existing, has the information required)

Specific exclusions:

* PostgreSQL will not be in the list of supported charms, since it is only used on the MAAS hosts, which have many other reasons to be checked manually before rebooting
* percona-cluster, because it stores data and we additionally need to check the replication status. TODO: write a separate bug for percona-cluster.

Xav Paice (xavpaice) wrote :

Added charm-hacluster because, although the status action provides the cluster status, we also need to know the hostname of the unit it ran on. Juju doesn't provide that information, and for some providers (e.g. the OpenStack provider) there is no link between the machine hostname and the unit/machine/instance-id. The dns-name shows only an IP address.

The request for the charm is therefore: add a 'hostname' field to the status action output.
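
To illustrate why the hostname matters: pacemaker reports nodes by hostname, so without it there is no way to tell which cluster node belongs to the unit being removed. The sketch below assumes each unit's status action result is a dict containing the requested 'hostname' key and a hypothetical 'online-nodes' list; both key names are assumptions, and the real action output format may differ.

```python
# Sketch: mapping pacemaker node names back to Juju units via a per-unit
# 'hostname' field in the status action output.
# ASSUMPTION: the 'hostname' and 'online-nodes' keys are hypothetical.
from typing import Dict


def unit_online_map(action_results: Dict[str, dict], reference_unit: str) -> Dict[str, bool]:
    """Return {hacluster unit: online?} as seen by reference_unit."""
    # Each unit reports its own hostname, giving a hostname -> unit table.
    hostname_to_unit = {res["hostname"]: unit for unit, res in action_results.items()}
    # Node names that pacemaker considers online, from one unit's point of view.
    online_nodes = set(action_results[reference_unit]["online-nodes"])
    return {unit: hostname in online_nodes for hostname, unit in hostname_to_unit.items()}
```

Which unit's view to trust is a design choice; a real check might compare the views of all units to detect a split cluster.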

Robert Gildein (rgildein) wrote :

I want to let you know that there is another proposal [1], which changes the output
of the status action to provide more information about cluster health.

I believe that the hostname should be part of the output from `juju status`, and
that a bug should be reported against Juju.

Alvaro Uria (aluria) wrote :

I have filed bug 1918286 against Juju about getting a unit/machine hostname from JujuStatus.

Robert Gildein (rgildein) wrote :

I forgot to insert the link [1].

---
[1]: https://review.opendev.org/c/openstack/charm-hacluster/+/766222

Alvaro Uria (aluria) wrote :

Xav, your description looks good.

I have only one comment, about:
"""
* support listing more than one application (e.g. if we wish to shut down a machine hosting multiple units)
"""

juju-verify already discovers other principal units running within the same machine (and submachines).

If multiple units/machines are specified on the CLI, I think the approach should be to group them by units of the same charm, which is not supported yet:

Currently, "juju verify shutdown --units unit/0 otherunit/2" fails when unit and otherunit do not use the same charm. That call should be treated as if two separate calls had been made:
* juju verify shutdown --units unit/0
* juju verify shutdown --units otherunit/2
Note: the verification of other principal units within the same machine and submachines should be merged, though.

This comment deserves a separate bug, though.
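
The grouping proposed above could look roughly like this: split the requested units by charm so that each group is verified as its own call, and compute the co-located principal units once for the whole request so that their verification is merged. The charm_of and colocated_units inputs are hypothetical; a real tool would derive them from the model status.

```python
# Sketch of grouping requested units by charm and merging co-located checks.
# ASSUMPTION: charm_of and colocated_units are hypothetical inputs.
from collections import defaultdict
from typing import Dict, List, Set


def group_by_charm(units: List[str], charm_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Split the requested units into one verification group per charm."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for unit in units:
        groups[charm_of[unit]].append(unit)
    return dict(groups)


def colocated_to_check(units: List[str], colocated_units: Dict[str, List[str]]) -> Set[str]:
    """Other principal units on the same machines/submachines, collected once
    for the whole request rather than once per charm group."""
    extra: Set[str] = set()
    for unit in units:
        extra.update(colocated_units.get(unit, []))
    # Units already requested explicitly are verified in their own group.
    return extra - set(units)
```

For the example above, unit/0 and otherunit/2 would land in two groups, while a principal sharing a machine with both would be checked only once.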

Xav Paice (xavpaice)
Changed in charm-hacluster:
status: New → In Progress
Changed in juju-verify:
assignee: nobody → Xav Paice (xavpaice)
status: New → In Progress
Robert Gildein (rgildein) wrote :

[[https://bugs.launchpad.net/bugs/1918286|Bug #1918286]] was marked as a duplicate of [[https://bugs.launchpad.net/bugs/1918204|bug #1918204]].

Also, [[https://bugs.launchpad.net/bugs/1918204|bug #1918204]] is now Fix Released.

Changed in juju-verify:
status: In Progress → Triaged
importance: Undecided → Low
Eric Chen (eric-chen) wrote :

This project is no longer being actively maintained.

Changed in juju-verify:
status: Triaged → Won't Fix