Galera rebuild failed with pacemaker
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Invalid | High | Bogdan Dobrelya |
8.0.x | Invalid | High | Fuel Sustaining |
Mitaka | Invalid | High | Bogdan Dobrelya |
Bug Description
Detailed bug description:
Mirantis OpenStack 8.0 on Ubuntu.
HA deployment with 3 nodes running Galera; we simulated a network outage or power loss.
After networking came back online, Pacemaker was not able to recover all resources.
Both RabbitMQ and MySQL failed.
Steps to reproduce:
Deploy a 3-controller-node cluster, power off the switches, then power the switches back on.
Expected results:
Pacemaker should recover all resources.
Actual result:
RabbitMQ and Galera (MySQL) are down.
Failed actions:
p_mysql_start_0 on sm4.domain.tld 'unknown error' (1): call=764, status=Timed Out, last-rc-change='Tue May 3 19:04:37 2016', queued=0ms, exec=300003ms
PCSD Status:
192.168.199.3: Offline
192.168.199.5: Offline
192.168.199.6: Offline
What does "PCSD Status: Offline" mean? I found nothing about this on Google.
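In `pcs status` output, the PCSD Status section reports whether `pcs` could contact the pcsd daemon on each node. That is separate from the Pacemaker and Corosync state reported above it, so "Offline" there does not by itself mean the nodes are down. A plain TCP connect shows whether pcsd is even listening. The sketch below assumes pcsd uses its default port 2224; the function name `pcsd_reachable` is hypothetical.

```python
import socket

# Assumption: pcsd listens on its default TCP port; adjust if reconfigured.
PCSD_DEFAULT_PORT = 2224


def pcsd_reachable(host: str, port: int = PCSD_DEFAULT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects; the socket is
        # closed immediately because only reachability matters here.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `pcsd_reachable("192.168.199.3")` checks the first node listed above. If nothing is listening there, PCSD Status would likely show Offline even while Pacemaker itself is running.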
Workaround:
Steps we tried to recover:
pcs resource debug-start clone_p_
RabbitMQ started on the first node; on the other nodes we ran stop_app, join_cluster, start_app.
RabbitMQ restored OK.
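The manual RabbitMQ recovery amounts to three `rabbitmqctl` calls per rejoining node, in a fixed order, because `join_cluster` only succeeds while the RabbitMQ application is stopped. The following is a minimal sketch of replaying that sequence. The node name `rabbit@node-1` and the helper names are hypothetical placeholders; use the real Erlang node name of the node that came up first. In a Fuel cluster, Pacemaker also manages RabbitMQ, so treat this as an illustration of the order of steps, not a supported procedure.

```python
import subprocess
from typing import Callable, List


def rejoin_commands(seed_node: str) -> List[List[str]]:
    """Build the rabbitmqctl sequence that re-joins the local node to seed_node.

    join_cluster requires the RabbitMQ application to be stopped first,
    so the order stop_app -> join_cluster -> start_app matters.
    """
    return [
        ["rabbitmqctl", "stop_app"],
        ["rabbitmqctl", "join_cluster", seed_node],
        ["rabbitmqctl", "start_app"],
    ]


def run_commands(commands: List[List[str]],
                 runner: Callable[..., object] = subprocess.run) -> None:
    """Run each command in order; check=True makes subprocess.run raise on
    a non-zero exit, so later steps are skipped after a failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

On each node other than the one where RabbitMQ started, `run_commands(rejoin_commands("rabbit@node-1"))` would replay the steps described above. The `runner` parameter exists so the sequence can be printed or tested without touching a live broker.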
Galera does not start:
pcs resource debug-start clone_p_mysql
*** Error in `/usr/sbin/
resource clone_p_mysql is NOT running
resource clone_p_mysql is NOT running
resource clone_p_mysql is NOT running
error: resources_
We looked for the node with the newest WSREP state and found one; the others reported -1.
We tried to start that node with pcs cleanup and debug-start.
Galera is still down.
Is this a bug, or did we just miss something?
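Galera records each node's last committed position in `grastate.dat` in the MySQL data directory. The path is commonly `/var/lib/mysql/grastate.dat`, but that is an assumption for this deployment. After an unclean stop, such as the power loss here, `seqno` is left at `-1`. That value means the position is unknown, not that the node is behind; the real value can be recovered with `mysqld --wsrep-recover` before comparing nodes.

The sketch below makes several assumptions. The files have already been copied from each node into strings keyed by a hypothetical node name. All nodes share the same cluster `uuid`, which the sketch does not check. Under those assumptions, it picks a bootstrap candidate only when every `seqno` is known.

```python
from typing import Dict, List, Optional, Tuple


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a Galera grastate.dat file."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# GALERA saved state' header
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(grastates: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return (node with the highest seqno, nodes whose seqno is unknown).

    A seqno of -1 means the position was not saved, so no candidate is
    returned until those nodes' real positions have been recovered.
    """
    seqnos = {node: int(parse_grastate(text).get("seqno", "-1"))
              for node, text in grastates.items()}
    unknown = sorted(node for node, seqno in seqnos.items() if seqno < 0)
    if unknown or not seqnos:
        return None, unknown
    # Iterate in sorted order so ties resolve deterministically.
    best = max(sorted(seqnos), key=lambda node: seqnos[node])
    return best, []
```

With two nodes at `-1`, as reported above, the sketch returns no candidate and lists those nodes. This reflects the point that their positions have to be recovered before the "newest" node can be trusted.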
description: updated
no longer affects: fuel/newton
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Bogdan Dobrelya (bogdando)
Please provide a diagnostic snapshot.