Asynchronous RPM validation task in ansible-hardening produces RPM database corruption

Bug #1921292 reported by Jeff Albert
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Medium
Assigned to: Dmitriy Rabotyagov

Bug Description

The ansible-hardening role starts a background process early in its execution to run `rpm -Va` and record its output, then returns later in the run to review the results: https://opendev.org/openstack/ansible-hardening/src/branch/master/tasks/rhel7stig/async_tasks.yml#L18-L34
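The pattern in question looks roughly like the following (a minimal sketch; the task names, log path, and polling values are placeholders rather than the role's exact code):

```yaml
# A minimal sketch of the fire-and-forget pattern; task names, the log
# path, and the polling values are placeholders, not the role's code.
- name: Verify packages with rpm (runs in the background)
  shell: rpm -Va > /var/tmp/rpm_verify.log 2>&1
  async: 300          # hard cap: Ansible kills the job after 300 seconds
  poll: 0             # do not wait here; carry on with other tasks
  register: rpm_verify_job
  failed_when: false
  changed_when: false

# ... many other hardening tasks run in between ...

- name: Collect the rpm verification results
  async_status:
    jid: "{{ rpm_verify_job.ansible_job_id }}"
  register: rpm_verify_result
  until: rpm_verify_result.finished
  retries: 30
  delay: 10
```

The `async: 300` value is the hard cap at issue: Ansible's async wrapper terminates the background job once it has run that long, regardless of whether `rpm -Va` has finished.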

However, a full verify, including checksums for every package-provided file, can take a very long time on a system carrying any significant number of packages: on our fairly generously specced bare-metal control nodes, the task runs for up to 13 minutes.

Unfortunately, the task that starts the verify command specifies a hard, non-configurable timeout of 300 seconds. When the task runs longer than this, Ansible terminates it forcefully, leaving RPM database locks open and corrupting the RPM database. Worse, the termination doesn't fail the role run, since `failed_when` is false, so operators won't realize the corruption has happened until several steps later, when another RPM task in the hardening role fails. Manual intervention is then necessary to repair the database on every affected node (see the sketch below). During an upgrade of a large site, this issue has been a major stumbling block for us.
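For anyone else hitting this, the manual repair is the standard RPM database rebuild; a sketch as an ad-hoc task, assuming the Berkeley DB-backed rpmdb used on RHEL/CentOS 7 (adjust for other platforms):

```yaml
# Sketch of the manual cleanup on a RHEL/CentOS 7 node: remove the
# stale Berkeley DB lock/region files, then rebuild the rpmdb indexes.
- name: Repair a corrupted RPM database
  become: true
  shell: |
    rm -f /var/lib/rpm/__db.*    # drop stale lock/region files
    rpm --rebuilddb              # rebuild the package database indexes
```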

I would propose either extending the timeout significantly and making it configurable, or changing the command to call rpm with the `--nofiles --nodigest` flags, which skip the most I/O-intensive parts of the verification and keep the task's run time reasonable.
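Concretely, the two options could look something like this (a sketch only; `security_rpm_verify_timeout` is a suggested variable name, not an existing role variable):

```yaml
# Option 1: raise the cap and expose it as a variable
# (security_rpm_verify_timeout is only a suggested name).
- name: Verify packages with rpm (runs in the background)
  shell: rpm -Va > /var/tmp/rpm_verify.log 2>&1
  async: "{{ security_rpm_verify_timeout | default(3600) }}"
  poll: 0

# Option 2: skip the per-file and digest checks so the default
# 300-second window is comfortably enough.
- name: Verify package metadata only
  shell: rpm -Va --nofiles --nodigest > /var/tmp/rpm_verify.log 2>&1
  async: 300
  poll: 0
```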

I'll be happy to submit a pull request for either fix strategy, depending on what the maintainers feel is appropriate.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Pushed patch https://review.opendev.org/c/openstack/ansible-hardening/+/782909 that extends the timeout to 1 hour, which should be more than enough for any deployment I can imagine.
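In effect (a sketch of the change, not the literal diff), the async cap on the verify task goes from 300 to 3600 seconds:

```yaml
# Sketch of the effect of the change: the hard cap on the background
# rpm -Va job is raised from 300 seconds to one hour.
- name: Verify packages with rpm (runs in the background)
  shell: rpm -Va > /var/tmp/rpm_verify.log 2>&1
  async: 3600   # one hour; was 300
  poll: 0
```

Note that the later `async_status` collection loop also needs its retries × delay to cover the longer window, or it will give up before a slow verify finishes.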

Changed in openstack-ansible:
status: New → Fix Committed
importance: Undecided → Medium
assignee: nobody → Dmitriy Rabotyagov (noonedeadpunk)
Changed in openstack-ansible:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ansible-hardening yoga-eom

This issue was fixed in the openstack/ansible-hardening yoga-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ansible-hardening victoria-eom

This issue was fixed in the openstack/ansible-hardening victoria-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ansible-hardening wallaby-eom

This issue was fixed in the openstack/ansible-hardening wallaby-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ansible-hardening xena-eom

This issue was fixed in the openstack/ansible-hardening xena-eom release.
