pacemaker compatibility issues when host/container have different versions

Bug #1771612 reported by Michele Baldessari
This bug affects 3 people
Affects: tripleo
Status: Incomplete
Importance: High
Assigned to: Michele Baldessari
Milestone: none

Bug Description

When the version of pacemaker inside a container differs from the version running on the host, we can encounter the following error:
"""
Error: unable to get crm_config
cibadmin: Connection to local file '/var/lib/pacemaker/cib/puppet-cib-backup20171212-6-1h3l7tq' failed: Update does not conform to the configured schema
Signon to CIB failed: Update does not conform to the configured schema
"""

This happens in one situation: the host runs a pacemaker version that changed the schema (i.e. a newer version, e.g. 1.1.18) while the containers ship an older release (e.g. 1.1.16).
Note that the issue is not present when the newer version of pacemaker is in the containers and an older one is on the host.
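
To make the supported and unsupported combinations concrete, here is a minimal Python sketch of the rule described above. The function names are hypothetical, and it assumes plain dotted versions such as 1.1.18; real rpm versions also carry a release suffix, which this ignores. It is also conservative: it flags every case where the host is newer, even though the error only appears when the newer version actually changed the schema.

```python
def parse_version(text):
    """Turn a dotted version such as '1.1.18' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_supported_combination(host_version, container_version):
    """Return True unless the host runs a newer pacemaker than the container.

    Hypothetical helper: same version, or a newer version in the container,
    is fine; a newer version on the host is the problematic case.
    """
    return parse_version(host_version) <= parse_version(container_version)


print(is_supported_combination("1.1.18", "1.1.16"))  # host newer: False
print(is_supported_combination("1.1.16", "1.1.18"))  # container newer: True
print(is_supported_combination("1.1.16", "1.1.16"))  # same version: True
```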

This is an unfortunate consequence of running stuff partly on the host and partly in containers.
It should also be a "one-time" issue only, as no other pcmk rebases are planned for CentOS/RHEL 7.

In any case we should explore our options to avoid this issue.

Technically there are two slightly separate problems:
1) The temporarily spawned containers that create pcs resources or properties
2) The containers that are managed by pacemaker and call command-line tools (like cibadmin, crm_attribute, etc.) in their normal operation

We did try to disable schema validation and also patched 1.1.16 with a bunch of patches:
0001-Low-implement-common-base-for-lib-xml2-xslt-log-libq.patch
0002-Feature-enable-upgrade-XSLTs-to-use-xsl-message-mean.patch
0001-Refactor-use-common-base-for-lib-xml2-xslt-log-libqb.patch
0003-Low-libcrmcommon-correct-spelling-of-pre-1.0-CIB-tag.patch
0004-Fix-libcrmcommon-handle-schema-versions-properly.patch

but the problem still persists.

It seems the only way out would be to actually bind mount *all* the folders and binaries necessary for pcs and all the pcmk-utils to work inside containers (for both 1) and 2)).

Matt Young (halcyondude) wrote :

Note:

---

# This is another instance of

https://bugs.launchpad.net/tripleo/+bug/1770692
potential for pacemaker version mismatch (BM vs.container) is a risk for OC deploy failures in gates

# RFE to identify this class of issue:

https://bugs.launchpad.net/tripleo/+bug/1771602
RFE: detect and warn when package versions in bare metal vs. container don't match

---

Matt Young (halcyondude) wrote :

@michele the RFE is tracking identifying and/or flagging package version mismatches (generally, not specific to pacemaker).

Is it the case that if a mismatch occurs in pacemaker *that* is a case that should not (ever) be supported? Or should we allow for some degree of variance between BM / containers?

Changed in tripleo:
milestone: none → rocky-2
Michele Baldessari (michele) wrote :

Hard to say. Right now, I'd say that unless we manage to get a working fix that won't impose this version restriction between containers and host, we only support the same version on host and containers (or a later version in the containers), not a newer version on the host with an older one in the containers.

If we can come up (and atm I am not all too positive about it) with a solution, we would be able to relax this restriction.

Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Michele Baldessari (michele) wrote :

As an update here: Damien and I are exploring a hybrid solution to this long-standing limitation.
Namely:
1. At deployment time and at update time we copy all files belonging to the set of pacemaker-related rpms under /var/lib/tripleo/pacemaker-host
2. We bind mount a bunch of folders (amongst which /var/lib/tripleo/pacemaker-host) inside the pcmk containers (both the temporary ones created by paunch and the ones managed by pcmk) and we use a custom PATH + LD_LIBRARY_PATH
3. We eventually remove all pacemaker rpms from the HA kolla containers
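
As a rough illustration of step 2, the following Python sketch builds the extra container arguments for the bind mount and the custom PATH + LD_LIBRARY_PATH. Only /var/lib/tripleo/pacemaker-host comes from the plan above; the function name, the sub-directory layout under that folder (usr/sbin, usr/bin, usr/lib64), the read-only mount and the fallback PATH are assumptions made for the example, not the actual implementation.

```python
HOST_COPY = "/var/lib/tripleo/pacemaker-host"


def pacemaker_host_args(host_copy=HOST_COPY):
    """Build container-run arguments that expose the host's pacemaker copy.

    Hypothetical helper. Assumes the copied rpm files keep their usual
    layout (usr/sbin, usr/bin, usr/lib64) under host_copy.
    """
    bin_dirs = [f"{host_copy}/usr/sbin", f"{host_copy}/usr/bin"]
    lib_dirs = [f"{host_copy}/usr/lib64"]
    # Assumed fallback so the container's own tools remain reachable.
    default_path = "/usr/sbin:/usr/bin:/sbin:/bin"
    return [
        # Bind mount the host copy at the same location, read-only.
        "--volume", f"{host_copy}:{host_copy}:ro",
        # Put the host's binaries ahead of the container's own.
        "--env", "PATH=" + ":".join(bin_dirs + [default_path]),
        # Make the host's pacemaker libraries resolvable.
        "--env", "LD_LIBRARY_PATH=" + ":".join(lib_dirs),
    ]


print(" ".join(pacemaker_host_args()))
```

Placing the host copy first in PATH means tools like cibadmin would resolve to the host's build, which is the point of the hybrid approach: the command-line tools and the running cluster then agree on the schema.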

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Is this still an issue?

Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: wallaby-3 → none