RPM

rpm API readonly may leave rpmdb in corrupted state if killed

Bug #633668 reported by Jeff Johnson
Affects   Status      Importance   Assigned to   Milestone
RPM       Won't Fix   Low          Unassigned
Fedora    Won't Fix   High

Bug Description

Tags: fedora rpmdb
Revision history for this message
In , Lev (lev-redhat-bugs) wrote :

Description of problem:
I ran a 'yum info' command (more details below) under high disk load and decided to kill it rather than wait for the results. It did not die quickly after a plain kill, so I used kill -9, and yum left the rpm database in a bad state.

Version-Release number of selected component (if applicable):
yum-3.2.25-1.fc12.noarch, rpm-4.7.2-1.fc12.i686

How reproducible:
Unknown, not willing to try that again on my working system.

Steps to Reproduce:
1. Run a background process with high disk load (in my case it was k3b starting to burn a dvd).
2. Here is my session console log, use it as a guide to reproduce the problem.

[root@abbot ~]# yum info -C zeroinstall-injector zerofree pangzero
Loaded plugins: fastestmirror, presto, refresh-packagekit
^C^C^Z
[1]+ Stopped yum info -C zeroinstall-injector zerofree pangzero
[root@abbot ~]# kill %1

[1]+ Stopped yum info -C zeroinstall-injector zerofree pangzero
[root@abbot ~]#
[root@abbot ~]#
[root@abbot ~]# fg
yum info -C zeroinstall-injector zerofree pangzero

^Z
[1]+ Stopped yum info -C zeroinstall-injector zerofree pangzero
[root@abbot ~]# kill %1

[1]+ Stopped yum info -C zeroinstall-injector zerofree pangzero
[root@abbot ~]#
[root@abbot ~]#
[root@abbot ~]# kill -9 %1
[root@abbot ~]# kill -9 %1
-bash: kill: (16712) - No such process
[1]+ Killed yum info -C zeroinstall-injector zerofree pangzero
[root@abbot ~]# yum info -C zeroinstall-injector zerofree pangzero
rpmdb: Thread/process 16712/3079165632 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 - (-30974)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Error: rpmdb open failed
[root@abbot ~]# yum info -C zeroinstall-injector zerofree pangzero
rpmdb: Thread/process 16712/3079165632 failed: Thread died in Berkeley DB library
error: db4 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 - (-30974)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Error: rpmdb open failed

Actual results:
Corrupt rpmdb

Expected results:
Working rpmdb

Additional info:
Why on earth does yum go to the rpmdb when I simply run a 'yum info' command? Shouldn't it just use its own caches in this case?

I was able to fix the problem by running these commands:

cd /var/lib/rpm
db_recover

In , Panu (panu-redhat-bugs) wrote :

This isn't rpmdb corruption, it's just BDB saying the previous access died in an uncontrolled manner while inside BDB code, which is a condition that isn't automatically cleared (whereas dying in application code while holding a read-only lock is automatically handled these days).

Rpm is blocking the signals for a reason here: safe access - even read-only - in a concurrent setup such as the rpmdb requires locking. Bad things happening when you kill -9 a process while it's blocking signals to protect critical sections is not a bug.
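
The distinction in this comment, between signals a process can defer around a critical section and kill -9, which it cannot, can be illustrated with a minimal sketch. This is not rpm's code: `run_critical_section` and `sigkill_can_be_blocked` are hypothetical names, and the sketch assumes a POSIX system with Python 3.

```python
import signal


def run_critical_section(work):
    """Call work() with SIGINT and SIGTERM blocked, then restore the old mask."""
    old_mask = signal.pthread_sigmask(
        signal.SIG_BLOCK, {signal.SIGINT, signal.SIGTERM}
    )
    try:
        return work()
    finally:
        # Signals that arrived meanwhile stay pending and are delivered here.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def sigkill_can_be_blocked():
    """Ask to block SIGKILL and report whether the request took effect."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGKILL})
    try:
        # Blocking an empty set changes nothing and returns the current mask.
        current_mask = signal.pthread_sigmask(signal.SIG_BLOCK, set())
        return signal.SIGKILL in current_mask
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

A plain kill (SIGTERM) sent during `work()` is only deferred until the mask is restored, which matches the delayed exit seen in the session log. The request to block SIGKILL is silently ignored by the system, so kill -9 can still interrupt the critical section at any point.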

In , Lev (lev-redhat-bugs) wrote :

Well, from the user's point of view it is a bug.

You run a program, you kill it, and the next time you try to install anything you need to bring out a geeky console and run some cryptic magic as root.

If it is that safe to fix, either yum or rpm should autofix this.

In , Matt (matt-redhat-bugs) wrote :

(In reply to comment #1)
> Rpm is blocking the signals for a reason here: safe access - even read-only -
> in a concurrent setup such as the rpmdb requires locking. Bad things happening
> when you kill -9 a process while it's blocking signals to protect critical
> sections is not a bug.

That's why POSIX provides fcntl locks that are cleaned up automatically when the process is killed. So, use them.
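
The property claimed here, that an fcntl lock disappears when its holder is killed, can be checked with a small sketch. It is an illustration only, assuming a POSIX system with Python 3; `lock_state_around_kill` is a hypothetical helper, and it says nothing about what state Berkeley DB itself keeps.

```python
import fcntl
import os
import signal


def _try_lock(fd):
    """Return True if an exclusive fcntl lock could be taken without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def lock_state_around_kill(path):
    """Return (held_while_child_alive, held_after_sigkill) for a lock on path."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: take the lock, tell the parent, then wait to be killed.
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX)
            os.write(ready_w, b"x")
            signal.pause()
        finally:
            os._exit(1)
    os.close(ready_w)
    os.read(ready_r, 1)  # returns once the child holds the lock (or has died)
    os.close(ready_r)
    fd = os.open(path, os.O_RDWR)
    try:
        held_while_alive = not _try_lock(fd)
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
        held_after_kill = not _try_lock(fd)
        return held_while_alive, held_after_kill
    finally:
        os.close(fd)
```

On a typical Linux system the helper should report that the lock was held while the child was alive and free again after kill -9, with no stale lock to clean up.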

In , Jeff (jeff-redhat-bugs) wrote :

Re comment #3:

Locks are used to serialize changes. The corollary to choosing
locks that magically evaporate on abnormal (like kill -9) termination
is that whatever inconsistencies/serialization MUST be dealt with.
The existence (or "stale lock" cleanup on process termination) is hardly
the issue.

"Use fcntl locks!" is hardly a panacea; in fact it's a useless piece of FUD.

In , Matt (matt-redhat-bugs) wrote :

(In reply to comment #4)
> Locks are used to serialize changes. The corollary to choosing
> locks that magically evaporate on abnormal (like kill -9) termination
> is that whatever inconsistencies/serialization MUST be dealt with.

My proposal is that readers should take an fcntl read lock without modifying the rpmdb and writers should take an fcntl write lock in addition to modifying the rpmdb as they currently do. fcntl will enforce serialization, and the modification made by writers will ensure that the rpmdb is flagged as inconsistent after a writer terminates abnormally. The rpmdb is not inconsistent after a reader terminates abnormally.

> The existence (or "stale lock" cleanup on process termination) is hardly
> the issue.

It is in this bug! Of course, maintaining proper serialization and recovery is the most important concern, and my proposal does that.

> "Use fcntl locks!" is hardly a panacea; in fact it's a useless piece of FUD.

It's (the new element of) the solution for this bug.
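
The proposal in this comment could be sketched roughly as follows. This is only a reading of the proposal as worded, not how rpm or Berkeley DB work: the lock file, the separate "dirty" marker file, and the names `reader` and `writer` are all assumptions made for illustration.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def reader(lock_path):
    """Hold a shared fcntl lock; nothing on disk is modified."""
    fd = os.open(lock_path, os.O_RDONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


@contextlib.contextmanager
def writer(lock_path, dirty_path):
    """Hold an exclusive fcntl lock and flag the database while writing."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
        # Hypothetical marker: present means a writer may not have finished.
        with open(dirty_path, "w"):
            pass
        yield
        # Reached only when the body completed; a killed or failed writer
        # leaves the marker behind as the signal to run recovery.
        os.unlink(dirty_path)
    finally:
        os.close(fd)
```

Under this sketch a reader killed at any point leaves no trace, because the kernel drops its lock, while a writer that dies leaves the marker in place for the next process to find after it acquires the lock.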

In , Jeff (jeff-redhat-bugs) wrote :

The fcntl shared read/exclusive write lock was implemented
in RPM in 2003 by Gustavo Niemeyer. Do your homework.

While you are correct that an rpmdb is not "inconsistent" while
reading, it's very much not true that there is no state change.
In fact, reading an rpmdb MUST have write access to create a
shared reader lock with concurrent access.

You don't know what state needs to be preserved, and you have not
described (in your wee widdle proposal) anything but an fcntl lock.

Everything you claim about "my proposal does that" is naive ignorant FUD.

Jeff Johnson (n3npq)
tags: added: fedora rpmdb
Changed in rpm:
status: New → Won't Fix
importance: Undecided → Low
In , Bug (bug-redhat-bugs) wrote :

This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 12 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

In , Bug (bug-redhat-bugs) wrote :

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Changed in fedora:
importance: Unknown → High
status: Unknown → Won't Fix