Newton HA failure: unable to push cib, cib_apply_diff failed

Bug #1690132 reported by Sagi (Sergey) Shnaidman
This bug affects 3 people
Affects   Status         Importance   Assigned to          Milestone
tripleo   Fix Released   Critical     Michele Baldessari

Bug Description

Newton HA deployment fails:

exception: connect failed
    Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
    Error: unable to push cib
    Call cib_apply_diff failed (-205): Update was older than existing configuration

    Error: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20170510-2379-k5dlh0 create failed: 
    Error: /Stage[main]/Pacemaker::Stonith/Pacemaker::Property[Disable STONITH]/Pcmk_property[property--stonith-enabled]/ensure: change from absent to present failed: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20170510-2379-k5dlh0 create failed: 

oooq based job:

http://logs.openstack.org/63/463763/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha-oooq/c0e9c51/logs/undercloud/home/jenkins/failed_deployment_list.log.txt.gz

non-oooq based job:

http://logs.openstack.org/63/463763/1/experimental-tripleo/gate-tripleo-ci-centos-7-ovb-ha/88ab5c3/logs/postci.txt.gz#_2017-05-10_22_34_20_000

Marios Andreou (marios-b) wrote :

o/ I just spent some time poking at this, as my review https://review.openstack.org/#/c/463529/ is blocked on the newton HA job.

I think it *may* be related to these changes: https://review.openstack.org/#/c/422489/ and https://review.openstack.org/#/c/422484/ (I checked the puppet-tripleo one; it is in stable/ocata but not in stable/newton).

It looks backportable, but I'm not sure how else pacemaker.pp has changed in the meantime. I'll ask bandini to have a look here too.

From Emilien's test CI review at https://review.openstack.org/#/c/463763/ here are some pointers and copy/paste for convenience (the same shows up in mine at /#/c/463529/):

------------
controller 2
http://logs.openstack.org/63/463763/1/experimental-tripleo/gate-tripleo-ci-centos-7-ovb-ha/88ab5c3/logs/controller-2-tripleo-ci-c-baz/var/log/messages

May 10 22:32:47 localhost os-collect-config: [2017-05-10 22:32:47,183] (heat-config) [DEBUG] [2017-05-10 22:31:09,674] (heat-config) [DEBUG] Running FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/b452d1fd-2c09-42f8-8736-8cdba8af5f80" FACTER_fqdn="controller-2-tripleo-ci-c-baz.localdomain" FACTER_deploy_config_name="ControllerDeployment_Step1" puppet apply --detailed-exitcodes --logdest console --modulepath /etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules /var/lib/heat-config/heat-config-puppet/b452d1fd-2c09-42f8-8736-8cdba8af5f80.pp
...
May 10 22:32:47 localhost os-collect-config: Error: unable to push cib
May 10 22:32:47 localhost os-collect-config: Call cib_apply_diff failed (-205): Update was older than existing configuration
May 10 22:32:47 localhost os-collect-config: Error: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20170510-25854-1r1fc8w create failed:
May 10 22:32:47 localhost os-collect-config: Error: /Stage[main]/Pacemaker::Stonith/Pacemaker::Property[Disable STONITH]/Pcmk_property[property--stonith-enabled]/ensure: change from absent to present failed: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20170510-25854-1r1fc8w create failed:
May 10 22:32:47 localhost os-collect-config: Warning: /Firewall[998 log all]: Skipping because of failed dependencies

------------

controller 1
http://logs.openstack.org/63/463763/1/experimental-tripleo/gate-tripleo-ci-centos-7-ovb-ha/88ab5c3/logs/controller-1-tripleo-ci-b-bar/var/log/messages

May 10 22:32:46 localhost cib[5815]: error: epoch went backwards: 7 -> 6 (Opts: 0x0)
May 10 22:32:46 localhost cib[5815]: warning: Bad Op <cib_command __name__="cib_command" t="cib" cib_async_id="d996a02c-0b8c-4441-95b1-a8baa57caf67" cib_op="cib_apply_diff" cib_callid="2" cib_callopt="0" cib_clientid="d996a02c-0b8c-4441-95b1-a8baa57caf67" cib_clientname="cibadmin" acl_target="root" cib_user="root" src="controller-1-tripleo-ci-b-bar" cib_delegated_from="controller-1-tripleo-ci-b-bar">

May 10 22:32:46 localhost os-collect-config: Notice: /Firewall[999 drop all]: Dependency Pcmk_property[property--stonith-enabled] has failures: true
May 10 22:32:46 localhost os-collect-config: Notice: Finished catalog run in 110.51 seconds
May 10 22:32:46 localhost os-collect-config: [2017-05-10 22:32:46,934] (heat...
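
A note on the "epoch went backwards: 7 -> 6" line above: Pacemaker versions the CIB with the triple admin_epoch / epoch / num_updates, and a pushed update that is older than the live CIB is refused, which is what the "(-205): Update was older than existing configuration" error reports. The Python sketch below is only a simplified model of that comparison, not Pacemaker's actual code; the names CibVersion and check_update are made up for illustration, and the zero values for admin_epoch and num_updates are assumptions.

```python
from typing import NamedTuple

# The number 205 is taken from the "(-205)" in the log above.
ERR_OLD_DATA = 205


class CibVersion(NamedTuple):
    """Simplified CIB version; fields compare left to right."""
    admin_epoch: int
    epoch: int
    num_updates: int


def check_update(current: CibVersion, incoming: CibVersion) -> int:
    """Return 0 if the incoming version is acceptable, -205 if it is older.

    Simplified model: tuples compare lexicographically, so a lower epoch
    loses even when num_updates is higher.
    """
    if incoming < current:
        return -ERR_OLD_DATA
    return 0


# The situation from the log: live CIB at epoch 7, pushed copy at epoch 6.
live = CibVersion(admin_epoch=0, epoch=7, num_updates=0)
stale = CibVersion(admin_epoch=0, epoch=6, num_updates=0)
result = check_update(live, stale)
```

In this model, a push built from a snapshot taken at epoch 6 is refused once the live CIB has moved on to epoch 7.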


Marios Andreou (marios-b) wrote :

Just posted a stable/newton cherry-pick at https://review.openstack.org/#/c/463985/ and I updated my review https://review.openstack.org/#/c/463529/ with a Depends-On to test it, so let's see. bandini said on IRC just now that he also thinks it may be related.

Bogdan Dobrelya (bogdando) wrote :
Michele Baldessari (michele) wrote :
Marios Andreou (marios-b) wrote :

So that review looks like it fixes this bug for the ovb-ha job, which is green, but the non-HA job is failing now :/ We might need another bug for that one.

Changed in tripleo:
importance: High → Critical
milestone: none → pike-2
Michele Baldessari (michele) wrote :

We are unable to land the fix because the nonha-multinode job is failing: https://bugs.launchpad.net/tripleo/+bug/1690373

Simon Wright (simon-ocf) wrote :

I received the same error with an ooo HA Newton install, using overcloud images (updated 11-May-2017) from https://images.rdoproject.org/newton/delorean/consistent/stable/
Following this post https://bugzilla.redhat.com/show_bug.cgi?id=1441977#c2 I checked the image and found that the SSL socket is activated in /etc/httpd/conf.d/ssl.conf. Commenting this line out and re-importing the image allowed the overcloud install to complete. HTH

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/newton)

Reviewed: https://review.openstack.org/463985
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=547d96d70db6e88eb2de44ac8212ed5dd5864692
Submitter: Jenkins
Branch: stable/newton

commit 547d96d70db6e88eb2de44ac8212ed5dd5864692
Author: Michele Baldessari <email address hidden>
Date: Thu Jan 19 10:07:52 2017 +0100

    Add retries to the ::pacemaker::stonith property

    Let's set a default number of retries also for the stonith
    property creation. Just like we do for most of the composable
    HA resource creation.

    Closes-Bug: 1690132
    Change-Id: Ie6e19cc838a3f45100f6c98a350bdf6a37d40590
    Depends-On: I20098c5d69cde356fe79f6d8dbdc03ae42ecb3ef
    (cherry picked from commit be7886a30443c82d7743be8cbdf0e0e2d3a1b26a)
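
The retry idea in the commit message above can be illustrated with a short Python sketch. This is not the puppet-tripleo or puppet-pacemaker implementation; run_with_retries, its parameter names, and the default values of 10 tries with a 3 second pause are assumptions chosen for illustration. The point is only that a command which fails transiently (for example, because the CIB changed underneath it) gets re-run instead of failing the whole step.

```python
import subprocess
import time


def run_with_retries(cmd, tries=10, try_sleep=3,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts run out.

    Returns the number of attempts used. The runner and sleep arguments
    are injectable so the loop can be exercised without real commands.
    """
    last = None
    for attempt in range(1, tries + 1):
        last = runner(cmd, capture_output=True, text=True)
        if last.returncode == 0:
            return attempt
        if attempt < tries:
            # Pause before retrying so a concurrent update can settle.
            sleep(try_sleep)
    raise RuntimeError(
        f"{' '.join(cmd)} failed after {tries} tries: {last.stderr.strip()}"
    )
```

A caller would pass the command as an argument list, e.g. `run_with_retries(["some-command", "--flag"])`, where the command itself is a placeholder here.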

tags: added: in-stable-newton
Changed in tripleo:
status: Triaged → Fix Released
tags: removed: alert ci
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 5.6.1

This issue was fixed in the openstack/puppet-tripleo 5.6.1 release.
