Asynchronous RPM validation task in ansible-hardening produces RPM database corruption
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Fix Released | Medium | Dmitriy Rabotyagov |
Bug Description
The ansible-hardening role starts a background process to run and record the output of `rpm -Va` early in the role's execution, and comes back later on to review the results: https:/
However, doing a full verify including checksums for every package-provided file on a system that's carrying any significant number of packages can take a very long time: on our fairly-
Unfortunately, the task that starts the verify command specifies a hard, non-configurable timeout of 300 seconds. When the task runs for longer than this, Ansible terminates it forcefully, which leaves RPM database locks open and corrupts the RPM database. The role run does not even fail at that point, since `failed_when` is false, so operators won't realize the corruption has happened until several steps later, when another RPM task in the hardening role fails. Manual intervention is necessary to repair the database on every affected node. During an upgrade of a large site, this issue has created a major stumbling block for us.
I would propose either extending the timeout significantly and making it configurable, or else changing the command to call rpm with the `--nofiles --nodigest` flags, which skip the most IO-intensive parts of the verification and keep the run time of the task reasonable.
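To make the two options concrete, here is a rough sketch of what the starting task might look like. It is not copied from the role: the task name, the `rpm_verify_timeout` and `rpm_verify_output` variables, and the output path are all hypothetical, chosen only for illustration. The `async` and `poll` keywords are standard Ansible: `async` sets the maximum run time in seconds, and `poll: 0` starts the command in the background without waiting for it.

```yaml
# Sketch only. Variable names and the output path are hypothetical,
# not taken from the ansible-hardening role.
- name: Start RPM verification in the background
  ansible.builtin.shell: "rpm -Va > {{ rpm_verify_output | default('/tmp/rpmverify.txt') }}"
  # Option 1: a longer limit that operators can override.
  # Ansible kills the job once this many seconds have elapsed.
  async: "{{ rpm_verify_timeout | default(3600) }}"
  # Do not wait here; a later task reviews the results.
  poll: 0
  changed_when: false

# Option 2 (alternative command): skip the per-file and digest checks
# so the run stays short even on hosts with many packages.
#   rpm -Va --nofiles --nodigest
```

With the first option, a site with unusually slow storage could raise the hypothetical `rpm_verify_timeout` in its own variables instead of patching the role. The second option trades verification depth for speed, so it may not suit every hardening policy.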
I'll be happy to submit a pull request for either fix strategy, depending on what the maintainers feel is appropriate.
Changed in openstack-ansible:
status: New → Fix Committed
importance: Undecided → Medium
assignee: nobody → Dmitriy Rabotyagov (noonedeadpunk)
Changed in openstack-ansible:
status: Fix Committed → Fix Released
Pushed patch https://review.opendev.org/c/openstack/ansible-hardening/+/782909, which extends the limit to 1 hour. That should be more than enough for the deployments I can imagine.