various jobs failing for overcloud deploy with missing cluster error: Could not connect to cluster (is it running?)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo | Fix Released | Critical | Sagi (Sergey) Shnaidman | | |
Bug Description
various jobs like [1][2][3][4] tripleo-
are failing during overcloud deploy because the pacemaker cluster is missing. The first error looks like
2019-03-25 21:06:50 | "error: Could not connect to cluster (is it running?)",
and then the deployment fails (e.g. no rabbit).
[1] https:/
[2] http://
[3] https:/
[4] http://
Changed in tripleo: | |
assignee: | nobody → Marios Andreou (marios-b) |
tags: | added: promotion-blocker |
Changed in tripleo: | |
assignee: | Marios Andreou (marios-b) → nobody |
importance: | Undecided → Critical |
Changed in tripleo: | |
assignee: | nobody → Sagi (Sergey) Shnaidman (sshnaidm) |
status: | Triaged → In Progress |
From https://logs.rdoproject.org/26/645626/13/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/1fef793/logs/overcloud-controller-0/var/log/cluster/corosync.log.txt.gz we see the following:

Mar 27 11:41:41 [29680] overcloud-controller-0 cib: info: cib_perform_op: + /cib: @num_updates=12
Mar 27 11:41:41 [29680] overcloud-controller-0 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair id="status-2-last-failure-rabbitmq-bundle-docker-1.start_0" name="last-failure-rabbitmq-bundle-docker-1#start_0" value="1553686901"/>
Mar 27 11:41:41 [29680] overcloud-controller-0 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=overcloud-controller-1/attrd/5, version=0.14.12)
Mar 27 11:41:41 docker(rabbitmq-bundle-docker-0)[52773]: ERROR: Newly created docker container exited after start
Mar 27 11:41:41 [29682] overcloud-controller-0 lrmd: notice: operation_finished: rabbitmq-bundle-docker-0_start_0:52773: stderr [ Error: No such object: rabbitmq-bundle-docker-0 ]
Mar 27 11:41:41 [29682] overcloud-controller-0 lrmd: notice: operation_finished: rabbitmq-bundle-docker-0_start_0:52773: stderr [ Error: No such object: rabbitmq-bundle-docker-0 ]
Mar 27 11:41:41 [29682] overcloud-controller-0 lrmd: notice: operation_finished: rabbitmq-bundle-docker-0_start_0:52773: stderr [ ocf-exit-reason: monitor cmd failed (rc=126), output: rpc error: code = 2 desc = oci runtime error: exec failed: cannot exec a container that has run and stopped ]
Mar 27 11:41:41 [29682] overcloud-controller-0 lrmd: notice: operation_finished: rabbitmq-bundle-docker-0_start_0:52773: stderr [ ]
Mar 27 11:41:41 [29682] overcloud-controller-0 lrmd: notice: operation_finished: rabbitmq-bundle-docker-0_start_0:52773: stderr [ ocf-exit-reason: waiting on monitor_cmd to pass after start ]
Mar 27 11:41:41 [29682] overcloud-controller-0 lrmd: notice: operation_finished: rabbitmq-bundle-docker-0_start_0:52773: stderr [ ocf-exit-reason: Newly created docker container exited after start ]
Mar 27 11:41:41 [29682] overcloud-controller-0 lrmd: info: log_finished: finished - rsc:rabbitmq-bundle-docker-0 action:start call_id:18 pid:52773 exit-code:1 exec-time:1705ms queue-time:0ms
Mar 27 11:41:41 [29685] overcloud-controller-0 crmd: notice: process_lrm_event: Result of start operation for rabbitmq-bundle-docker-0 on overcloud-controller-0: 1 (unknown error) | call=18 key=rabbitmq-bundle-docker-0_start_0 confirmed=true cib-update=29
Mar 27 11:41:41 [29685] overcloud-controller-0 crmd: notice: process_lrm_event: overcloud-controller-0-rabbitmq-bundle-docker-0_start_0:18 [ Error: No such object: rabbitmq-bundle-docker-0\nError: No such object: rabbitmq-bundle-docker-0\nocf-exit-reason: monitor cmd failed (rc=126), output: rpc error: code = 2 desc = oci runtime error: exec failed: cannot exec a container that has run and stopped\n\r\nocf-exit-reason: waiting on monitor_cmd to pass after start\nocf-exit-reason: Newly created docker c
Mar 27 11:41:41 [29680] overcl...
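
For anyone triaging similar jobs, here is a rough sketch of how the `ocf-exit-reason:` messages could be pulled out of a corosync log like the excerpt above. The function name `exit_reasons` and the choice to look only at `lrmd:` lines are assumptions made for illustration; they are not part of pacemaker or of any tripleo tooling, and the log format may differ between pacemaker versions.

```python
import re

# Captures the text after "ocf-exit-reason:" up to the end of the line,
# dropping the closing " ]" of the stderr block when it is present.
EXIT_REASON = re.compile(r"ocf-exit-reason:\s*(.*?)\s*(?:\]\s*)?$")


def exit_reasons(lines):
    """Return the distinct ocf-exit-reason messages, in first-seen order.

    Assumption: only lines logged by lrmd are inspected, because the crmd
    summary line repeats the same reasons joined with literal "\\n".
    """
    seen = []
    for line in lines:
        if " lrmd:" not in line:
            continue
        match = EXIT_REASON.search(line.rstrip("\n"))
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

To run it against a downloaded `corosync.log.txt.gz`, something like `exit_reasons(gzip.open(path, "rt", errors="replace"))` should work, where `path` is wherever the file was saved locally.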