Consequent -1 node failures lead to Corosync split-brain
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Triaged | Critical | Bogdan Dobrelya |
6.0.x | Invalid | Undecided | Bogdan Dobrelya |
Bug Description
Multinode HA environment with 3 controllers (node-10, node-7, node-8) + 2 compute+storage nodes, Juno on Ubuntu 14.04
Test case:
1. Kill one of the controllers (the RabbitMQ master or a slave controller).
2. Wait until the RabbitMQ cluster has been reassembled by the OCF script and is stable.
3. Power the controller back on.
4. Wait until the RabbitMQ cluster has been reassembled by the OCF script and is stable.
5. Go to step 1.
After roughly 5-7 cycles, two separate RabbitMQ clusters exist within the cloud:
http://
This state is _permanent_ and will not be fixed by the OCF script without admin intervention.
node-7 and node-10 both have master status, with node-8 as the slave.
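As a rough illustration of the symptom, the check below flags a cluster in which more than one node claims the master role. It is only a sketch: the `roles` mapping is a hypothetical, hand-built structure (it is not the output format of any Pacemaker or RabbitMQ tool), filled in with the node states reported above.

```python
from typing import Dict, List


def find_masters(roles: Dict[str, str]) -> List[str]:
    """Return the names of all nodes whose reported role is 'master'."""
    return sorted(node for node, role in roles.items() if role == "master")


def has_multiple_masters(roles: Dict[str, str]) -> bool:
    """A healthy master/slave resource has exactly one master."""
    return len(find_masters(roles)) > 1


# Node states as described in this report (hypothetical data structure).
observed = {"node-7": "master", "node-10": "master", "node-8": "slave"}

print(find_masters(observed))          # ['node-10', 'node-7']
print(has_multiple_masters(observed))  # True
```

How the roles are collected from each controller is left out on purpose, since the report does not show the commands used.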
[root@fuel ~]# fuel --f
DEPRECATION WARNING: file /etc/fuel/
api: '1.0'
astute_sha: 4a117a1ca6bdcc3
auth_required: true
build_id: 2015-03-23_15-29-20
build_number: '218'
feature_groups:
- mirantis
fuellib_sha: a0265ae47bb2307
fuelmain_sha: a05ab877af31924
nailgun_sha: 7c100f47450ea1a
ostf_sha: a4cf5f218c6aea9
production: docker
python-
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 4a117a1ca6bdcc3
      build_id: 2015-03-23_15-29-20
      build_number: '218'
      feature_
      - mirantis
      fuellib_sha: a0265ae47bb2307
      fuelmain_sha: a05ab877af31924
      nailgun_sha: 7c100f47450ea1a
      ostf_sha: a4cf5f218c6aea9
      production: docker
      python-
      release: '6.1'
description: | updated |
Changed in fuel: | |
importance: | Undecided → High |
description: | updated |
summary: | RabbitMQ split brain → RabbitMQ split-brain |
description: | updated |
description: | updated |
summary: | RabbitMQ split-brain → Pacemaker cluster failure leads to RabbitMQ split-brain |
Changed in fuel: | |
milestone: | none → 6.1 |
summary: | Pacemaker cluster failure leads to RabbitMQ split-brain → Pacemaker cluster failure leads to Corosync split-brain |
Changed in fuel: | |
status: | New → Confirmed |
summary: | Pacemaker cluster failure leads to Corosync split-brain → Consequent -1 node failures lead to Corosync split-brain |
tags: | added: corosync |
Changed in fuel: | |
importance: | Critical → High |
description: | updated |
Changed in fuel: | |
assignee: | Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando) |
tags: | added: scale |
The time between actions was never less than 10-20 minutes. Each time, the RabbitMQ cluster became fully assembled and stable before the next action.
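The test case from the description, together with the pause noted here, could be automated along the following lines. This is a minimal sketch under stated assumptions: every callable passed in (`kill_controller`, `power_on_controller`, `wait_until_stable`, `pause`, `split_detected`) is a hypothetical hook that the tester would have to implement for their own environment; none of them corresponds to an existing Fuel, Pacemaker, or RabbitMQ API.

```python
from typing import Callable, Optional


def run_failure_cycles(
    kill_controller: Callable[[], None],
    power_on_controller: Callable[[], None],
    wait_until_stable: Callable[[], None],
    pause: Callable[[], None],
    split_detected: Callable[[], bool],
    max_cycles: int = 10,
) -> Optional[int]:
    """Repeat the kill/recover cycle; return the cycle in which a split
    was first detected, or None if none appeared within max_cycles."""
    for cycle in range(1, max_cycles + 1):
        kill_controller()       # step 1
        wait_until_stable()     # step 2
        pause()                 # at least 10-20 minutes between actions
        power_on_controller()   # step 3
        wait_until_stable()     # step 4
        pause()
        if split_detected():
            return cycle
    return None


# Dry run with stand-in callables: the fake check reports a split on cycle 6.
checks = iter([False] * 5 + [True])
first_split = run_failure_cycles(
    kill_controller=lambda: None,
    power_on_controller=lambda: None,
    wait_until_stable=lambda: None,
    pause=lambda: None,
    split_detected=lambda: next(checks),
)
print(first_split)  # 6
```

Passing the actions in as callables keeps the loop itself free of environment details, so the same skeleton can be dry-run with stand-ins (as above) before being pointed at real controllers.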