Reboot during patch installation causes duplicated RPMs

Bug #1904928 reported by Don Penney
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Don Penney

Bug Description

Brief Description
-----------------
A reboot that occurs in the middle of the rpm transaction during patch installation leaves the system in a bad state and is potentially catastrophic. The incomplete transaction leaves multiple versions of the same rpms recorded in the rpm database, which in turn causes the patch-agent to think it has work to do that cannot be done. In some cases, this can lead to an unrecoverable boot loop or to patch installation failures.
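
To illustrate what "multiple versions of the rpms installed in the rpm database" looks like in practice, here is a minimal Python sketch that scans `rpm -qa` output for packages listed more than once. It is not the patch-agent's actual code: the function names, the query format, and the `INSTALL_ONLY` ignore list are assumptions made for this example.

```python
import subprocess
from collections import defaultdict

# Assumption: packages that may legitimately have several versions installed
# at once, so they are not reported. Adjust for the target system.
INSTALL_ONLY = {"gpg-pubkey", "kernel", "kernel-core", "kernel-modules", "kernel-devel"}

# One line per installed package: name, architecture, version-release.
QUERY_FORMAT = "%{NAME} %{ARCH} %{VERSION}-%{RELEASE}\n"


def find_duplicates(rpm_output, ignore=INSTALL_ONLY):
    """Return {(name, arch): [version-release, ...]} for packages seen more than once."""
    versions = defaultdict(list)
    for line in rpm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, arch, version_release = fields
        if name in ignore:
            continue
        versions[(name, arch)].append(version_release)
    return {key: sorted(vals) for key, vals in versions.items() if len(vals) > 1}


def query_installed():
    """Run rpm and return its output as text (requires an RPM-based host)."""
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", QUERY_FORMAT],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On an RPM-based node, `find_duplicates(query_installed())` returning a non-empty dictionary would indicate the kind of incomplete transaction described here. Parsing is kept separate from the `rpm` call so the check can be exercised against captured output.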

Severity
--------
Critical

Steps to Reproduce
------------------
While monitoring patching.log to watch the installation progress, trigger a cold reboot of the node while it is in the middle of the rpm transaction.

Expected Behavior
------------------
The system should be recoverable with manual actions.

Actual Behavior
----------------
System enters a boot loop.

Reproducibility
---------------
Reproducible

Branch/Pull Time/Commit
-----------------------
starlingx/master

Don Penney (dpenney)
Changed in starlingx:
assignee: nobody → Don Penney (dpenney)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to update (master)

Fix proposed to branch: master
Review: https://review.opendev.org/763435

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

stx.5.0 / high priority - serious issue, but likelihood of occurrence should be low

tags: added: stx.5.0 stx.update
Changed in starlingx:
importance: Undecided → High
Don Penney (dpenney) wrote :

Reviewed: https://review.opendev.org/763435
Committed: https://opendev.org/starlingx/update/commit/62a66370cac70a62f827fedfc8b76284fde006d3
Submitter: Zuul
Branch: master

commit 62a66370cac70a62f827fedfc8b76284fde006d3
Author: Don Penney <email address hidden>
Date: Thu Nov 19 10:41:27 2020 -0500

    Add protection against duplicate RPMs

    If a cold reboot occurs in the middle of patch installation, the
    system can be left in a state where the patch-agent is unable to
    perform its operations properly. The RPM database can be left with
    duplicate RPMs due to the incomplete transaction, which can in turn
    lead to DNF update installation issues.

    This update adds detection of duplicate RPMs to the patch-agent to
    avoid attempting installation until the system is recovered.

    Additionally, protection is added to the sw-patch init to treat
    multiple reboot patch installations as an error, to avoid boot loops.

    Closes-Bug: 1904928
    Change-Id: Ia06a6f669c45398d7956f2ac2caa76c447bc1b16
    Signed-off-by: Don Penney <email address hidden>
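
The commit message says only that multiple reboot patch installations are treated as an error to avoid boot loops; it does not describe the mechanism. As a rough Python sketch of one way such a guard could work, the hypothetical function below uses a marker file to allow at most one install-triggered reboot. The function name, the marker file, and the returned action strings are all assumptions for illustration, not the sw-patch init script's actual logic.

```python
import os


def next_boot_action(install_pending, marker_path):
    """Decide what to do at boot: 'continue', 'reboot', or 'error'.

    marker_path is a hypothetical file that must live on storage that
    survives a reboot, otherwise the guard cannot see the earlier attempt.
    """
    if not install_pending:
        # Nothing left to install: clear any stale marker and boot normally.
        if os.path.exists(marker_path):
            os.remove(marker_path)
        return "continue"
    if os.path.exists(marker_path):
        # A reboot was already requested for this installation and work is
        # still pending: report an error instead of rebooting again.
        return "error"
    # First attempt: record it, then allow a single reboot.
    with open(marker_path, "w") as marker:
        marker.write("1\n")
    return "reboot"
```

The point of the design is that a second pending installation after a reboot is reported as a failure needing manual recovery, rather than triggering another reboot.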

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to update (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/update/+/792216

OpenStack Infra (hudson-openstack) wrote : Change abandoned on update (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/update/+/792216

OpenStack Infra (hudson-openstack) wrote : Fix proposed to update (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/update/+/793629

OpenStack Infra (hudson-openstack) wrote : Fix merged to update (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/update/+/793629
Committed: https://opendev.org/starlingx/update/commit/e7f342378bcfe05e81c53f6d15e44a61e89f6caf
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 49e39fa949850a7d6fc6083539a4cd3390e6172b
Author: Charles Short <email address hidden>
Date: Fri May 28 08:55:08 2021 -0400

    Specify the nodeset zuul jobs

    The py2.7 jobs need to specify xenial
    The py3.6 jobs need to specify bionic
    The focal zuul nodes only have python 3.8 installed in them

    Zuul targets that invoke a generic python3 interpreter such
    as pep8 is not specified.

    Also ignore H216 since we still use py2.7.

    The copyright date was updated in order to trigger
    the zuul jobs, as a no-delta type of change.

    Partial-Bug: 1928978

    Signed-off-by: Charles Short <email address hidden>
    Change-Id: I81fd790dfc8a665a4e4e0ff59a013af7921b6e06
    Signed-off-by: Charles Short <email address hidden>

commit 6a019e60c3bd0943904589e47e32c732e48a8086
Author: Chris Friesen <email address hidden>
Date: Thu Mar 4 18:59:53 2021 -0500

    add dcmanager-audit-worker to sample restart script

    We've added a new distributed-cloud process, so update the sample script.

    Story: 2007267
    Task: 42000
    Signed-off-by: Chris Friesen <email address hidden>
    Change-Id: I16522744c9526e134577bda80cf91a8308eee889

commit 9a08e51d41a7b1b5d4c25c2a7775c70b57d278d9
Author: Charles Short <email address hidden>
Date: Sun Jan 24 08:36:23 2021 -0500

    Cap bandit to v1.6.2

    Cap bandit to v1.6.2 so we do not pull in a python3 only version.
    Without this fix the unit tests will fail to setup and run properly.

    Closes-Bug: 1916494

    Signed-off-by: Charles Short <email address hidden>
    Change-Id: I7afbc8d224f28146af42f43593586cd680a52aaf

commit 764a0576011c4a8cb43b2bd3ee1fd890bad6058a
Author: Don Penney <email address hidden>
Date: Mon Dec 21 15:22:29 2020 -0500

    Fix exclusion paths in cgcs-patch build_srpm.data

    The EXCLUDE_LIST_FROM_TAR list in cgcs-patch build_srpm.data includes
    entries to exclude .tox and other dirs when building the package, but
    the path included an extra directory level. This update corrects the
    paths.

    Change-Id: I8b3641e4e86f52ef7b9fb56f9eb3df289935e188
    Closes-Bug: 1908940
    Signed-off-by: Don Penney <email address hidden>

commit adaaba0c21e78a2a24c459e34ea7c9963024e3f6
Author: Don Penney <email address hidden>
Date: Thu Dec 17 13:17:58 2020 -0500

    Add auto-version for remaining stx/update packages

    Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
    use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
    the version is incremented above the hardcoded version.

    Change-Id: I877a6c0802a707863a3b6c7d430146431e10a3ad
    Story: 2008455
    Task: 41459
    Signed-off-by: Don Penney <email address hidden>

commit 62a66370cac70a62f827fedfc8b76284fde006d3
Author: Don Penney <email address hidden>
Date: Thu Nov 19 10:41:27 2020 -0500

    Add protection a...

tags: added: in-f-centos8