feature: auto eviction of slow/laggy nodes

Bug #1815196 reported by Michał Ajduk
Affects: OpenStack Percona Cluster Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

This is a feature request.

Because percona-cluster does not allow setting auto_evict, we have observed that a single node with a temporarily slow I/O or network connection is not evicted from a cluster using synchronous replication, which degrades overall cluster performance.

A nice-to-have feature would be the ability to enable auto_evict on the cluster, so that Percona cluster evicts such a misbehaving node.

Scope:
- add auto_evict to charm settings
- add a monitor that detects when a node is evicted and puts that node in the blocked state in Juju
- add an action to rejoin a node to the cluster after eviction and a fix.
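
The first two scope items could be sketched roughly as follows. This is a minimal illustration and not the charm's actual code: the function names and the threshold parameter are hypothetical. It relies on the Galera provider options evs.version and evs.auto_evict and on the wsrep_evs_evict_list status variable, and it assumes that variable's value is a comma-separated list of node UUIDs.

```python
def build_evs_options(auto_evict_threshold):
    """Return a wsrep_provider_options fragment enabling auto eviction.

    Hypothetical helper. A threshold of 0 or less means "disabled", which
    mirrors Galera's default of evs.auto_evict=0. Galera requires
    evs.version=1 for the auto eviction protocol to be active.
    """
    if auto_evict_threshold <= 0:
        return ""
    return "evs.version=1;evs.auto_evict={}".format(auto_evict_threshold)


def is_evicted(node_uuid, evict_list):
    """Return True if node_uuid appears in a wsrep_evs_evict_list value.

    Assumption: the status value is a comma-separated list of UUIDs.
    """
    entries = [e.strip() for e in evict_list.split(",") if e.strip()]
    return node_uuid in entries
```

A monitor built on this would read wsrep_evs_evict_list from a healthy cluster member and set the evicted unit's status to blocked when is_evicted returns True.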

Tags: 4010
James Page (james-page)
Changed in charm-percona-cluster:
status: New → Triaged
importance: Undecided → Wishlist
summary: - Percona-cluster charm missing auto_evict
+ feature: auto eviction of slow/laggy nodes
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.openstack.org/643701
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=b96f7120c44b6c8bc7b804b72f9a25a7c55d005e
Submitter: Zuul
Branch: master

commit b96f7120c44b6c8bc7b804b72f9a25a7c55d005e
Author: Dimos Alevizos <email address hidden>
Date: Tue Feb 19 21:17:10 2019 +0200

    Add support for autoeviction of slow nodes.

    Partial-Bug: 1815196

    Change-Id: Ia3084c2a7eb8c4dc9b4eb7c6372369a5996f87b5

Ryan Beisner (1chb1n) wrote :
David Ames (thedac) wrote :

So my concern is similar to the problem for rabbitmq-server in LP#1818260. If the cluster acts on its own to remove units independently of the charm, then during deploy time, when resources are scarce, we are likely to hit false positives and have units removed.

Sounds like we need to discuss this change a bit further.

In the meantime, at a minimum for [0], we would need the action to re-add a node and verbiage in the config option that warns the user that the cluster acting on its own could cause problems.

The approach being taken for rabbit will be to ignore the config option during deploy time and add the configuration only after the cluster is complete. This still leaves us vulnerable to the cluster acting before we are ready, but it helps mitigate the problem.
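
Applied to this charm, that deploy-time gating might look something like the sketch below. It is only an illustration of the idea: the function and parameter names are hypothetical, and "cluster is complete" is assumed to mean that the number of joined units has reached the expected count.

```python
def effective_auto_evict(configured, expected_units, joined_units):
    """Return the auto_evict value to render into the cluster config.

    Hypothetical helper. While the cluster is still forming (fewer units
    joined than expected), the configured value is ignored and 0
    (disabled) is returned, so that slow nodes at deploy time are not
    evicted as false positives.
    """
    if joined_units < expected_units:
        return 0
    return configured
```

As noted above, this only narrows the window: once the option is rendered, the cluster can still evict a unit on its own.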

[0] https://review.opendev.org/#/c/649083/

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-percona-cluster (master)

Change abandoned by "Alex Kavanagh <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-percona-cluster/+/649083
Reason: This has been sitting open for 4 years with no further comment from the OP; it's a wishlist bug. Please feel free to re-open if this patch should be re-considered for inclusion. Thanks.
