intermittent HA failures in CI gates due to deployment race conditions
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Confirmed | Critical | Fuel Library (Deprecated) |
Bug Description
http://
Here is what could be seen from logs:
1) Failure of deployment due to
2014-11-
And the Galera cluster reported that it was ready for connections 3 minutes *later*:
2014-11-
Also there are signal=13 errors in xinetd.log for galeracheck
from 2014-11-16T09:32:38 to 2014-11-
START: galeracheck ... from=10.108.2.2, EXIT: galeracheck signal=13
(10.108.2.2 is management VIP)
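On Linux, signal 13 is SIGPIPE, which a process receives when it writes to a pipe or socket whose reading end has already been closed. A minimal sketch for counting such exits in an xinetd log follows. It assumes the log lines contain a fragment shaped like the `EXIT: galeracheck signal=13` entry quoted above; the regular expression and the function name `count_signal_exits` are illustrative and not part of Fuel or xinetd.

```python
import re
import signal
from collections import Counter

# Hypothetical pattern modeled on the "EXIT: galeracheck signal=13"
# fragment quoted in the report; real xinetd lines may carry more fields.
EXIT_RE = re.compile(r"EXIT:\s+(?P<service>\S+)\s+signal=(?P<signum>\d+)")


def count_signal_exits(lines, service="galeracheck"):
    """Count signal-terminated exits of `service`, keyed by signal name."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match is None or match.group("service") != service:
            continue
        signum = int(match.group("signum"))
        try:
            # Map the number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform: keep the raw number.
            name = f"signal {signum}"
        counts[name] += 1
    return counts
```

Feeding it the lines of xinetd.log for the affected time window gives a quick count of how many health checks were cut off this way.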
2) At the moment the logs snapshot was taken:
rabbitmqctl report (Nov 16, 09:59) shows 'rabbit@node-1': nodedown
pcs status (Nov 16 09:58:04 2014) shows all resources are stopped.
But there are no errors for this in puppet logs, and debug shows resources as started a minute before:
Sun Nov 16 09:57:55 +0000 2014 Puppet (debug):
-> Simple primitive 'vip__public' global status: start
node-1: start
-> Cloned primitive 'clone_
node-1: start
-> Cloned primitive 'clone_
node-1: start
-> Multistate primitive 'master_
node-1: master
-> Simple primitive 'vip__management' global status: start
node-1: start
-> Cloned primitive 'clone_p_haproxy' global status: start
node-1: start
-> Cloned primitive 'clone_p_mysql' global status: start
node-1: start
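To size the window between the Puppet debug output and the `pcs status` snapshot, the two timestamps quoted above can be compared directly. This is only a sketch: it assumes the `pcs status` time is in UTC (that line carries no zone), that the parsing runs under an English or C locale, and the helper name `gap_seconds` is made up for illustration.

```python
from datetime import datetime, timezone


def gap_seconds(puppet_stamp, pcs_stamp):
    """Seconds from the Puppet debug timestamp to the pcs status timestamp."""
    # Puppet debug format, e.g. "Sun Nov 16 09:57:55 +0000 2014".
    puppet_time = datetime.strptime(puppet_stamp, "%a %b %d %H:%M:%S %z %Y")
    # pcs status format, e.g. "Nov 16 09:58:04 2014"; no zone is given,
    # so UTC is ASSUMED here.
    pcs_time = datetime.strptime(pcs_stamp, "%b %d %H:%M:%S %Y")
    pcs_time = pcs_time.replace(tzinfo=timezone.utc)
    return (pcs_time - puppet_time).total_seconds()


print(gap_seconds("Sun Nov 16 09:57:55 +0000 2014", "Nov 16 09:58:04 2014"))
```

Under those assumptions the resources went from reported as started to reported as stopped within the printed number of seconds.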
Changed in fuel:
importance: Undecided → Critical
milestone: none → 6.0
status: New → Confirmed
assignee: nobody → Fuel Library Team (fuel-library)
Related, but does not look like a duplicate: https://bugs.launchpad.net/fuel/+bug/1391180