ha-guide-draft intro-ha-key-concepts.html Recovery Automation
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
openstack-manuals | Won't Fix | Low | Unassigned | |
Bug Description
The guide might want a section on "automatic recovery of services", since recovery is only loosely coupled with HA and not always needed. Some general notes on it follow.
Automatic recovery isn't a requirement for every HA solution, but it can make an admin's job easier. The non-automated approach is to have a monitoring solution that alerts ops when something fails; ops are then responsible for fixing the issue. For services sitting behind a load balancer, or for queue workers, a failure may not be an issue as long as the other active nodes can handle the load. To cut down the load on ops, many of these services can be set up for automated recovery.
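As a rough illustration of the trade-off above, the sketch below decides whether a failure in a load-balanced pool can be left to automated recovery or should page ops. The function name `recovery_action`, the node fields, and the capacity figures are all hypothetical and chosen only for this example; a real deployment would take these numbers from its monitoring system.

```python
def recovery_action(nodes, current_load):
    """Pick a response to node failures in a load-balanced pool.

    nodes: list of dicts with "healthy" (bool) and "capacity" (requests/s).
    current_load: observed load, in the same unit as capacity.
    """
    failed = [n for n in nodes if not n["healthy"]]
    if not failed:
        return "ok"
    remaining = sum(n["capacity"] for n in nodes if n["healthy"])
    if remaining >= current_load:
        # Surviving nodes can carry the load: recover without paging anyone.
        return "auto-recover"
    # Surviving nodes would be overloaded: a human needs to know now.
    return "alert-ops"


# Hypothetical three-node API pool with one failed member.
pool = [
    {"name": "api-1", "healthy": True, "capacity": 100},
    {"name": "api-2", "healthy": False, "capacity": 100},
    {"name": "api-3", "healthy": True, "capacity": 100},
]
print(recovery_action(pool, current_load=150))  # auto-recover
print(recovery_action(pool, current_load=250))  # alert-ops
```

With one of three nodes down, the pool can still serve 200 requests/s, so a load of 150 is safe to handle automatically while a load of 250 warrants an alert.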
For active/passive services managed by tools like Pacemaker and HAProxy, this is built in. For many of the OpenStack services, a simple Upstart or systemd respawn entry can help. For other services such as keystone (apache2), memcached, Galera, or RabbitMQ, you may need to employ other solutions, as a respawn may not be available or desired. For compute hypervisor hardware failures, this may include a solution that migrates VMs to another compute node and disables the nova-compute service until the hardware can be fixed.
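To make the respawn idea concrete, here is a minimal Python stand-in for what an init system does when it restarts a failed service. It is not how systemd or Upstart are implemented; on systemd the equivalent behaviour is configured with the `Restart=` directive in the unit file. The helper name `supervise`, its parameters, and the always-failing child process are assumptions made for this sketch.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.0):
    """Run cmd, restarting it after a non-zero exit, up to max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean exit: nothing to recover
        if attempt < max_restarts:
            time.sleep(delay)  # brief pause before respawning
    return exit_codes


# Hypothetical failing service: a child process that always exits with 1.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(supervise(failing, max_restarts=2))  # [1, 1, 1]
```

Capping the number of restarts matters: a service that fails immediately on every start would otherwise respawn in a tight loop, which is one reason a plain respawn may not be desired for stateful services such as Galera.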
Changed in openstack-manuals:
status: New → Triaged
importance: Undecided → Low
milestone: none → pike
tags: removed: intro-ha-key-concepts.html
created: https://storyboard.openstack.org/#!/story/2005744