RFE: detect and warn when package versions in bare metal vs. container don't match

Bug #1771602 reported by Matt Young on 2018-05-16
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Quique Llorente

Bug Description

This is an RFE identified as part of an escalation related to

https://bugs.launchpad.net/tripleo/+bug/1770692: potential for pacemaker version mismatch (BM vs.container) is a risk for OC deploy failures in gates

In this particular case the version of pacemaker packages in the BM images and in containers did not match, due to the timing / workflow in CI.

This RFE is to add a check to CI that catches package version mismatches early, before we spend time, resources, and cycles both in our clouds running CI jobs and in triaging, diagnosing, debugging, and resolving the issues this class of mismatch causes.

Requirements:

- Our CI tooling should fail early
- The details of each identified package mismatch should be clearly emitted in the job logs, so that they are easy to discover.
- All mismatches should be identified in a single run, rather than "whack-a-mole" style (i.e. iteratively, one mismatch per job attempt), to conserve CI resources and human time.
- Both normal deployment / promotion workflows and gating change workflows should be handled.
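
A minimal sketch of the "report every mismatch, then fail" requirement, assuming the package versions have already been gathered into plain dictionaries. The function names (`find_mismatches`, `report`) are hypothetical and are not part of any existing TripleO tooling.

```python
import sys
from typing import Dict, List, Tuple

# (package, bare-metal version, container version)
Mismatch = Tuple[str, str, str]


def find_mismatches(bm: Dict[str, str], container: Dict[str, str]) -> List[Mismatch]:
    """Return every package present on both sides whose versions differ."""
    shared = sorted(set(bm) & set(container))
    return [(name, bm[name], container[name])
            for name in shared if bm[name] != container[name]]


def report(mismatches: List[Mismatch], stream=sys.stderr) -> int:
    """Log all mismatches in one pass and return a process exit code."""
    for name, bm_ver, c_ver in mismatches:
        print(f"MISMATCH {name}: bare metal={bm_ver} container={c_ver}", file=stream)
    # Non-zero lets the job fail early; a warn-only mode could ignore this value.
    return 1 if mismatches else 0
```

A job step could call `sys.exit(report(find_mismatches(bm, container)))` before starting the deployment, so that every mismatch is logged in a single attempt.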

Matt Young (halcyondude) wrote :

An example of human/debug time spent as a result of this class of issue:

https://bugs.launchpad.net/tripleo/+bug/1771612

Tim Rozet (trozet) wrote :

This also becomes a critical issue with libvirtd. Since libvirtd is containerized in the nova_libvirt container, it uses an older version of libvirt that is incompatible with the one in CentOS 7.5. As a result, the deployment hangs at step 4, stuck at the virsh secret-define command that creates the ceph secret:

bin/bash -c /usr/bin/virsh secret-define --file /etc/nova/secret.xml && /usr/bin/virsh secret-set-value --secret '7f6eaf28-e29b-42f5-8fcd-9109b85e2768'

This hangs for 30 minutes or so until libvirt finally reports an error.

Matt Young (halcyondude) on 2018-05-16
Changed in tripleo:
importance: Medium → High
Bogdan Dobrelya (bogdando) wrote :

I think the issue is not limited to CI. Checks like this belong in validations and should be productized.

tags: added: validations
tags: added: containers
tags: added: queens-backport-potential
Bogdan Dobrelya (bogdando) wrote :

This affects upgrades as well.

tags: added: upgrade
Bogdan Dobrelya (bogdando) wrote :

I'd really like to see this marked as a critical issue...

Changed in tripleo:
importance: High → Critical
Matt Young (halcyondude) wrote :

triage: this is not a *failure* event, but perhaps we should *warn* or *inform* when this is the case...

potential solutions:

in collect-logs...

foreach(c in containers)
    generate rpm list

compare rpm-list from BM <-> c.rpms
warn / alert (or just drop artifact for now)

note: could add this to existing container-check module
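
The proposal above could be sketched roughly as follows. This is an illustration only, not the collect-logs or container-check implementation: the helper names are hypothetical, and how the query is run inside a container (the `prefix` argument) depends on the container runtime in use.

```python
import subprocess
from typing import Dict, Iterable, List

# rpm expands "\n" in the query format itself, hence the escaped backslash.
RPM_QUERY = ["rpm", "-qa", "--queryformat", "%{NAME} %{VERSION}-%{RELEASE}\\n"]


def parse_rpm_list(text: str) -> Dict[str, str]:
    """Parse 'name version-release' lines into a dict.

    Packages installed in several versions keep only the last one seen.
    """
    packages = {}
    for line in text.splitlines():
        name, _, version = line.strip().partition(" ")
        if name and version:
            packages[name] = version
    return packages


def rpm_list(prefix: Iterable[str] = ()) -> Dict[str, str]:
    """Run the rpm query, optionally prefixed so it runs inside a container."""
    result = subprocess.run([*prefix, *RPM_QUERY], check=True,
                            capture_output=True, text=True)
    return parse_rpm_list(result.stdout)


def compare(bm: Dict[str, str], per_container: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one warning line per package that differs from bare metal."""
    warnings = []
    for cname in sorted(per_container):
        rpms = per_container[cname]
        for pkg in sorted(set(bm) & set(rpms)):
            if bm[pkg] != rpms[pkg]:
                warnings.append(
                    f"{cname}: {pkg} bare metal={bm[pkg]} container={rpms[pkg]}")
    return warnings
```

Here `rpm_list()` would give the bare metal list and, assuming a Docker runtime, `rpm_list(["docker", "exec", name])` the list for one container. The lines returned by `compare` could be logged as warnings or simply written out as a job artifact.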

Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
assignee: nobody → Quique Llorente (quiquell)
Changed in tripleo:
importance: Critical → Medium
importance: Medium → High
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
status: Triaged → Invalid
status: Invalid → Triaged
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3

is this still needed?

Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin) on 2020-02-10
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
tags: removed: validations
wes hayutin (weshayutin) on 2020-04-07
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin) on 2020-04-13
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin) on 2020-05-26
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2