ha-guide-draft intro-ha-key-concepts.html Recovery Automation

Bug #1677692 reported by Shannon Mitchell
This bug affects 1 person
Affects: openstack-manuals
Status: Won't Fix
Importance: Low
Assigned to: Unassigned

Bug Description

It might be worth adding a section on "automatic recovery of services", as it is only loosely coupled with HA and not always needed. Below are some general notes on it.

Automatic recovery isn't really a requirement for all HA solutions, but it can make an admin's job easier. The non-automated approach is typically a monitoring solution that alerts ops when something fails; ops are then responsible for fixing the issue. For services sitting behind a load balancer, or for queue workers, this may not be urgent as long as the remaining active nodes can handle the load. To reduce the load on ops, many of these services can be set up for automated recovery.
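The trade-off above (page ops vs. recover automatically while the survivors carry the load) can be sketched as a small decision function. This is a minimal, hypothetical sketch: it assumes a pool of backends behind a load balancer, each with a known capacity, and the names `Node`, `decide_action` and the 20% `headroom` default are illustrative assumptions, not part of any OpenStack or monitoring tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """A hypothetical backend behind a load balancer (illustrative only)."""
    name: str
    healthy: bool
    capacity: float  # requests/second this node can serve (assumed metric)


def decide_action(nodes: List[Node], current_load: float, headroom: float = 0.2) -> str:
    """Return "ok", "auto-recover" or "page-ops" for a pool of nodes.

    Assumption: if the surviving nodes can still absorb the current load plus
    some headroom, failed nodes can be restarted automatically without waking
    anyone up; otherwise a human is alerted.
    """
    failed = [n for n in nodes if not n.healthy]
    if not failed:
        return "ok"
    surviving_capacity = sum(n.capacity for n in nodes if n.healthy)
    if surviving_capacity >= current_load * (1 + headroom):
        return "auto-recover"
    return "page-ops"
```

For example, with three 100 req/s nodes and one of them down, a load of 150 req/s leaves enough spare capacity to recover quietly, while 190 req/s does not and would page ops.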

For active/passive services managed by tools like Pacemaker and HAProxy, this is built in. For many of the OpenStack services, a simple Upstart or systemd respawn entry can help. For other services, such as keystone (apache2), memcached, Galera or RabbitMQ, you may need to employ other solutions, as a respawn may not be available or desirable. For compute hypervisor hardware failures, recovery may include migrating VMs to another compute node and disabling the nova-compute service until the hardware is fixed.
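The "respawn entry" mentioned above corresponds to systemd's `Restart=on-failure` and `RestartSec=`, usually combined with `StartLimitBurst=`/`StartLimitIntervalSec=` (or Upstart's `respawn` and `respawn limit`). Purely as an illustration of that logic, and not as a replacement for the init system, the following Python sketch restarts a command that exits with a non-zero status but gives up once it has been restarted too many times within a sliding time window. The function `supervise` and its parameters are hypothetical names.

```python
import subprocess
import sys
import time
from collections import deque
from typing import Deque, List


def supervise(cmd: List[str], max_restarts: int = 5, window: float = 10.0,
              delay: float = 1.0) -> int:
    """Run cmd, respawning it on failure with a rate limit.

    Returns the number of restarts performed before the command either exited
    cleanly (status 0) or hit the restart limit.
    """
    restarts: Deque[float] = deque()  # timestamps of recent restarts
    count = 0
    while True:
        status = subprocess.run(cmd).returncode
        if status == 0:
            return count  # clean exit: nothing to recover
        now = time.monotonic()
        # Forget restarts that fell outside the sliding window.
        while restarts and now - restarts[0] > window:
            restarts.popleft()
        if len(restarts) >= max_restarts:
            # Crash loop: leave the service down so ops can investigate.
            print(f"{cmd[0]} restarted {len(restarts)} times in {window}s; giving up",
                  file=sys.stderr)
            return count
        restarts.append(now)
        count += 1
        time.sleep(delay)  # brief back-off, similar in spirit to RestartSec=
```

The rate limit is the important design choice: without it, a service that fails immediately on start (bad config, missing dependency) would be respawned forever, which is exactly the case where a human should be alerted instead.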

Lana (loquacity)
Changed in openstack-manuals:
status: New → Triaged
importance: Undecided → Low
milestone: none → pike
tags: removed: intro-ha-key-concepts.html
Frank Kloeker (f-kloeker) wrote :
Changed in openstack-manuals:
status: Triaged → Won't Fix