RFE: detect and warn when package versions in bare metal vs. container don't match

Bug #1771602 reported by Matt Young
This bug affects 1 person
Affects    Status      Importance  Assigned to      Milestone
tripleo    Incomplete  High        Quique Llorente

Bug Description

This is an RFE identified as part of an escalation related to

https://bugs.launchpad.net/tripleo/+bug/1770692: potential for pacemaker version mismatch (BM vs.container) is a risk for OC deploy failures in gates

In this particular case the version of pacemaker packages in the BM images and in containers did not match, due to the timing / workflow in CI.

This RFE is to add a check to CI that catches package version mismatches before we spend time, resources, and cycles, both in our clouds running CI jobs and in the human effort to triage, diagnose, debug, and resolve issues of this class.

Requirements:

- Our CI tooling should fail early
- The details of each identified package mismatch should be clearly emitted in the job logs, so that they are easy to discover.
- All mismatches should be identified in a single run, rather than "whack-a-mole" style (i.e. iteratively, one mismatch per job attempt), to conserve CI resources and human time.
- Both normal deployment / promotion workflows and gating change workflows should be handled.
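
A minimal sketch of what the requirements above could look like in practice, not an existing TripleO check. It assumes the package lists have already been gathered into plain name-to-version dictionaries; the function names and the version strings in the usage note are hypothetical. It collects every mismatch before reporting and returns a non-zero code so a job could fail early.

```python
def find_mismatches(host_pkgs, container_pkgs):
    """Return ALL packages present in both places whose versions differ."""
    mismatches = []
    # Only packages installed on both sides can be "mismatched".
    for name in sorted(host_pkgs.keys() & container_pkgs.keys()):
        if host_pkgs[name] != container_pkgs[name]:
            mismatches.append((name, host_pkgs[name], container_pkgs[name]))
    return mismatches


def format_report(mismatches):
    """Build one clearly marked log block listing every mismatch."""
    lines = ["PACKAGE VERSION MISMATCH (bare metal vs. container):"]
    for name, host_ver, cont_ver in mismatches:
        lines.append(f"  {name}: bare metal={host_ver} container={cont_ver}")
    return "\n".join(lines)


def check(host_pkgs, container_pkgs):
    """Print the report and return 1 on any mismatch, 0 otherwise."""
    mismatches = find_mismatches(host_pkgs, container_pkgs)
    if mismatches:
        print(format_report(mismatches))
        return 1
    return 0
```

For example, `check({"pacemaker": "1.1.18-11.el7"}, {"pacemaker": "1.1.16-12.el7"})` would print the report and return 1 (the version strings here are made up for illustration). A CI wrapper could pass that value to `sys.exit()` before any deployment step runs.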

Matt Young (halcyondude) wrote :

An example of human/debug time spent as a result of this class of issue:

https://bugs.launchpad.net/tripleo/+bug/1771612

Tim Rozet (trozet) wrote :

This also becomes a critical issue with libvirtd. Since libvirtd is containerized in the nova_libvirt container, it uses an older version of libvirt that is incompatible with the one in CentOS 7.5. As a result, the deployment hangs at step 4, stuck at the virsh secret-define command that creates the ceph secret:

bin/bash -c /usr/bin/virsh secret-define --file /etc/nova/secret.xml && /usr/bin/virsh secret-set-value --secret '7f6eaf28-e29b-42f5-8fcd-9109b85e2768'

This ends up hanging for 30 minutes or so until libvirt finally reports back an error.

Matt Young (halcyondude)
Changed in tripleo:
importance: Medium → High
Bogdan Dobrelya (bogdando) wrote :

I think the issue is not limited to CI. These checks should belong to validations and be productized.

tags: added: validations
tags: added: containers
tags: added: queens-backport-potential
Bogdan Dobrelya (bogdando) wrote :

This affects upgrades as well.

tags: added: upgrade
Bogdan Dobrelya (bogdando) wrote :

I'd really like to see this marked as a critical issue...

Changed in tripleo:
importance: High → Critical
Matt Young (halcyondude) wrote :

triage: this is not a *failure* event, but perhaps we should *warn* or *inform* when this is the case...

potential solutions:

in collect-logs...

foreach(c in containers)
    generate rpm list

compare rpm-list from BM <-> c.rpms
warn / alert (or just drop artifact for now)

note: could add this to existing container-check module
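
The loop above could be sketched roughly as follows. This is an illustration under assumptions, not the collect-logs or container-check implementation: it assumes containers can be queried with `docker exec <name> rpm -qa ...`, the helper names are hypothetical, and packages that can be installed in several versions at once (e.g. kernel) are collapsed to a single entry for simplicity.

```python
import subprocess

# rpm itself expands the "\n" escape in the query format.
RPM_QUERY = ["rpm", "-qa", "--queryformat", "%{NAME} %{VERSION}-%{RELEASE}\\n"]


def parse_rpm_list(text):
    """Turn 'name version-release' lines into a {name: version} dict."""
    pkgs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pkgs[parts[0]] = parts[1]
    return pkgs


def run(cmd):
    """Run a command and return its stdout as text."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def compare_all(containers, runner=run, runtime="docker"):
    """Compare the bare metal rpm list against each container's rpm list.

    Returns {container: {package: (bare_metal_version, container_version)}}
    for every container with at least one mismatch.
    """
    host = parse_rpm_list(runner(RPM_QUERY))
    result = {}
    for name in containers:
        cont = parse_rpm_list(runner([runtime, "exec", name] + RPM_QUERY))
        diff = {pkg: (host[pkg], ver)
                for pkg, ver in cont.items()
                if pkg in host and host[pkg] != ver}
        if diff:
            result[name] = diff
    return result
```

The `runner` parameter exists only so the comparison can be exercised without a container runtime; the returned dictionary could be logged as a warning or simply written out as an artifact, as suggested above.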

Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
assignee: nobody → Quique Llorente (quiquell)
Changed in tripleo:
importance: Critical → Medium
importance: Medium → High
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
status: Triaged → Invalid
status: Invalid → Triaged
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

is this still needed?

Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
tags: removed: validations
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: wallaby-3 → none