The queens scenario 0 upgrade job is broken - the error happens during the controller upgrade with example trace at [0] (and another example at [1])
u'TASK [Check pacemaker cluster running before upgrade] **************************',
u'fatal: [192.168.24.16]: FAILED! => {"ansible_job_id": "89193676466.46747", "changed": false, "cmd": "pcs cluster status", "failed": true, "finished": 1, "msg": "[Errno 2] No such file or directory", "rc": 2}',
u'',
u'PLAY RECAP *********************************************************************',
u'192.168.24.16 : ok=32 changed=9 unreachable=0 failed=1 ',
u'']
The task that fails is a step0 validation that the cluster is running; it is defined at [2]. AFAICS the problem may in fact be that pacemaker is not deployed here at all. Evidence is the lack of 'cluster' or 'pacemaker' directories in the controller's /var/log/ at [3], compared to the passing master run at [4].
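To illustrate why the "[Errno 2] No such file or directory" message points at a missing pcs binary rather than a stopped cluster, here is a minimal Python sketch. It is not the tripleo-heat-templates task itself; the function name check_pacemaker_cluster and its return shape are made up for illustration, and only the "pcs cluster status" command comes from the trace above.

```python
import shutil
import subprocess


def check_pacemaker_cluster(cmd=("pcs", "cluster", "status")):
    """Return (ok, reason) for a cluster check.

    Hypothetical helper: it separates "binary not installed" from
    "command ran but reported a failure".
    """
    # A missing executable is what produces ENOENT ([Errno 2]) when the
    # command is launched; detect it before trying to run anything.
    if shutil.which(cmd[0]) is None:
        return False, "%s is not installed" % cmd[0]

    result = subprocess.run(
        list(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if result.returncode != 0:
        # The binary exists, so a non-zero rc is a real status failure.
        return False, "%s exited with rc=%d" % (" ".join(cmd), result.returncode)
    return True, "cluster is running"
```

On a node without the pacemaker/pcs packages this would return the "not installed" reason, which matches what the trace suggests is happening on the queens controller.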
Filing the bug to capture this info for now.
[0] http://logs.openstack.org/24/567224/23/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/6fdcee8/logs/undercloud/home/zuul/overcloud_upgrade_run_Controller.log.txt.gz
[1] http://logs.openstack.org/24/575424/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/6389692/logs/undercloud/home/zuul/overcloud_upgrade_run_Controller.log.txt.gz
[2] https://github.com/openstack/tripleo-heat-templates/blob/b7dcbd8da79b6119b0b9e35f5cd221338f1f6306/puppet/services/pacemaker.yaml#L148
[3] http://logs.openstack.org/24/575424/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/6389692/logs/subnode-2/var/log/
[4] http://logs.openstack.org/86/575186/4/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/bedb420/logs/subnode-2/var/log/
Had another pass at this today; updating with info as I know quique (rover) is also checking this.
Still no major breakthrough but definitely no pacemaker here so no surprise that the task which checks cluster status is failing. Still don't know why yet though.
I've been comparing a queens run from http://logs.openstack.org/24/567224/28/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/76476d9/ with a master run from http://logs.openstack.org/46/575146/2/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/59a34e5/
Evidence for 'definitely no pacemaker' on queens:
* queens has no "pacemaker"/corosync packages, only puppet-pacemaker and ansible-pacemaker, in yum.log @ http://logs.openstack.org/24/567224/28/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/76476d9/logs/subnode-2/var/log/yum.log.txt.gz
but master has pacemaker-libs/cli, clusterlibs etc. at http://logs.openstack.org/46/575146/2/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/59a34e5/logs/subnode-2/var/log/yum.log.txt.gz
* controller/subnode-2 rpm-qa log has no pacemaker in queens at http://logs.openstack.org/24/567224/28/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/76476d9/logs/subnode-2/rpm-qa.txt.gz
vs master at http://logs.openstack.org/46/575146/2/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/59a34e5/logs/subnode-2/rpm-qa.txt.gz
* queens subnode-2/controller has this error and no other pcs/cluster related stuff: "Jun 18 09:32:44 centos-7-rax-iad-0000192611 puppet-user[8812]: Puppet::Type::Service::ProviderPacemaker_xml: file crm_node does not exist" at http://logs.openstack.org/24/567224/28/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/76476d9/logs/subnode-2/var/log/journal.txt.gz#_Jun_18_09_32_44
but master e.g. see cluster start @ http://logs.openstack.org/46/575146/2/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/59a34e5/logs/subnode-2/var/log/journal.txt.gz#_Jun_15_23_53_59
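The rpm-qa comparison above can be sketched in Python as follows. This is only an illustration of the check I did by eye: the function name cluster_packages and the choice of package-name prefixes (pacemaker, corosync, pcs) are my assumptions, and it expects one package name per line as in an rpm-qa log (it will not parse yum.log lines as-is).

```python
import re

# Packages that mention pacemaker but only ship tooling, not the cluster itself.
TOOLING_PREFIXES = ("puppet-", "ansible-")
# Assumed name prefixes for the packages that actually provide the cluster.
CLUSTER_PATTERN = re.compile(r"^(pacemaker|corosync|pcs)(-|$)")


def cluster_packages(rpm_qa_text):
    """Return the rpm-qa lines that look like real cluster packages."""
    found = []
    for line in rpm_qa_text.splitlines():
        name = line.strip()
        # Skip blanks and tooling such as puppet-pacemaker / ansible-pacemaker.
        if not name or name.startswith(TOOLING_PREFIXES):
            continue
        if CLUSTER_PATTERN.match(name):
            found.append(name)
    return found
```

Fed the queens rpm-qa log, an empty result would be consistent with "definitely no pacemaker"; the master log should return the pacemaker/corosync/pcs entries.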
One diff I can see in the templates, though I'm not sure yet whether it is relevant, is that on queens we include docker.yaml but on master we don't: queens @ http://logs.openstack.org/24/567224/28/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/76476d9/logs/undercloud/home/zuul/overcloud_deploy.sh.txt.gz and master at http://logs.openstack.org/46/575146/2/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/59a34e5/logs/undercloud/home/zuul/overcloud_deploy.sh.txt.gz
BUT
they both then also include the scenario000 multinode containers environment, and in both queens and master pacemaker is enabled and set in the controller services: https://github.com/openstack/tripleo-heat-templates/blob/master/ci/environments/scenario000-multinode-containers.yaml vs
https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/ci/environments/scenario000-multinode-containers.yaml