documentation for power recovery

Bug #1507065 reported by Kevin Fox
This bug affects 5 people
Affects  Status   Importance  Assigned to  Milestone
kolla    Expired  Undecided   Unassigned

Bug Description

There is presently no documented procedure for recovering the various clusters in Kolla when a power failure takes out all cluster nodes at once.

This covers bringing the following services back up once power is restored:
 * Galera/MariaDB
 * RabbitMQ

For Galera, the data should be recovered from the node that was written to last.
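
To illustrate what "written to last" could mean in practice, here is a minimal sketch that compares the seqno recorded in each node's grastate.dat and picks the highest one. It is not Kolla's procedure. The node names and the way the file contents are collected are hypothetical. It also assumes each file holds a usable seqno; after an unclean power loss the value is often -1 and would first have to be recovered (for example with mysqld --wsrep-recover) before a comparison like this makes sense.

```python
from typing import Dict, Optional


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a grastate.dat file, skipping comments."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(grastates: Dict[str, str]) -> Optional[str]:
    """Return the node with the highest non-negative seqno, or None.

    `grastates` maps a (hypothetical) node name to the text of that
    node's grastate.dat. Nodes whose seqno is -1 or unparsable are
    ignored, because their position is unknown.
    """
    best_node: Optional[str] = None
    best_seqno = -1
    for node, text in sorted(grastates.items()):
        try:
            seqno = int(parse_grastate(text).get("seqno", "-1"))
        except ValueError:
            continue
        if seqno > best_seqno:
            best_node, best_seqno = node, seqno
    return best_node


if __name__ == "__main__":
    # Hypothetical file contents gathered from three controllers.
    sample = {
        "node1": "# GALERA saved state\nversion: 2.1\nseqno:   1041\n",
        "node2": "# GALERA saved state\nversion: 2.1\nseqno:   1057\n",
        "node3": "# GALERA saved state\nversion: 2.1\nseqno:   -1\n",
    }
    print(pick_bootstrap_node(sample))
```

If no node reports a usable seqno the function returns None, which signals that the positions have to be recovered before any node is chosen for bootstrap.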

For RabbitMQ, if the OpenStack services simply resubmit their messages, the cluster can be either freshly initialized or recovered, whichever is more reasonable.

Tags: doc
Sam Yaple (s8m)
Changed in kolla:
assignee: nobody → Sam Yaple (s8m)
status: New → Triaged
Sam Yaple (s8m) wrote :

At the summit, RabbitMQ was discussed, and to support lights-out changes we will be backporting clusterer.

For Galera it will need to be a bit different. The idea is that it should be automated, but I'm not sure the implementation will allow that. I was tasked with figuring that out and I'll work on that right now.

Changed in kolla:
importance: Undecided → High
Changed in kolla:
milestone: none → ocata-1
tags: added: doc
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla (master)

Reviewed: https://review.openstack.org/374652
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=d64fc35aeb70b8825754941f7cfd252ef564314a
Submitter: Jenkins
Branch: master

commit d64fc35aeb70b8825754941f7cfd252ef564314a
Author: Paul Bourke <email address hidden>
Date: Thu Sep 22 10:37:34 2016 +0100

    Clean up TODOs from live documentation

    TODOs showing up in ours docs look messy and are of no value to the
    reader.

    Remove these and log bugs for them to be written later.

    Change-Id: Ib9244960e3cedce28b198449898e46668435fce9
    Partial-Bug: #1626455
    Partial-Bug: #1507065
    Partial-Bug: #1626456

Changed in kolla:
milestone: ocata-1 → ocata-2
Changed in kolla:
milestone: ocata-2 → ocata-3
Changed in kolla:
milestone: ocata-3 → ocata-rc1
Changed in kolla:
milestone: ocata-rc1 → pike-1
Changed in kolla:
milestone: pike-2 → pike-3
MarginHu (margin2017) wrote :

I ran into RabbitMQ and Galera issues when the whole cluster was powered off.

I don't know how to recover RabbitMQ.

Changed in kolla:
milestone: pike-3 → pike-rc1
Changed in kolla:
milestone: pike-rc1 → queens-1
Changed in kolla:
milestone: queens-2 → queens-3
Changed in kolla:
milestone: queens-3 → queens-rc1
Changed in kolla:
milestone: queens-rc1 → queens-rc2
Changed in kolla:
milestone: queens-rc2 → rocky-1
Jeffrey Zhang (jeffrey4l) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which lead to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (OCATA, PIKE, QUEENS, ROCKY).
  Valid example: CONFIRMED FOR: OCATA

Changed in kolla:
assignee: Sam Yaple (s8m) → nobody
importance: High → Undecided
status: Triaged → Expired