RPM

Possible race condition in rpmtsOpenDB [NEEDINFO]

Bug #651431 reported by Jeff Johnson
Affects     Status       Importance   Assigned to     Milestone
RPM         Opinion      Low          Jeff Johnson    5.3.0
Fedora      Won't Fix    High

Bug Description

tracker

Tags: rpmdb
Revision history for this message
In , Bryan (bryan-redhat-bugs) wrote :

Description of problem:

    When running the commands:

        rm -f /var/lib/rpm/__db.*;
        rpm -q kernel-debuginfo > /dev/null &
        rpm -q kernel-debuginfo > /dev/null &

    The following error message will occasionally be displayed:

        rpmdb: Program version 4.3 doesn't match environment version
        error: db4 error(-30974) from dbenv->open: DB_VERSION_MISMATCH:
        Database environment version mismatch
        error: cannot open Packages index using db3 - (-30974)
        error: cannot open Packages database in /var/lib/rpm

    or

        error: db4 error(2) from dbenv->open: No such file or directory
        error: cannot open Packages index using db3 - No such file or
        directory (2)
        error: cannot open Packages database in /var/lib/rpm

    Since the error "cannot open packages database in %s" comes from
    rpmtsOpenDB, maybe the db lock race mentioned in that function is
    being hit?
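
    The race, presumably, is that both rpm processes find the
    /var/lib/rpm/__db.* files missing and each tries to create the
    Berkeley DB environment itself. Purely as an illustrative
    workaround (not something proposed in this report), the two
    queries could be serialized with flock(1) so that only one process
    ever creates the environment; a minimal sketch, assuming flock is
    available:

        # Hypothetical workaround sketch, not part of the report:
        # serialize the two queries on a lock file so only one process
        # creates the dbenv; the second simply joins it afterwards.
        (
            flock 9
            rpm -q kernel-debuginfo > /dev/null
        ) 9> /var/lock/rpm-query.lock &
        (
            flock 9
            rpm -q kernel-debuginfo > /dev/null
        ) 9> /var/lock/rpm-query.lock &
        wait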

Version-Release number of selected component (if applicable):

    rpm-4.4.2.3-7.el5

How reproducible:

    2-3 out of 100 iterations of the script above.

Steps to Reproduce:
1. Run

    rm -f /var/lib/rpm/__db.*;
    rpm -q kernel-debuginfo > /dev/null &
    rpm -q kernel-debuginfo > /dev/null &

2. Rinse, repeat.
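
Since the failure shows up in only 2-3 out of 100 iterations, a small
wrapper loop such as the following (a hypothetical addition, not part of
the original report) automates the rinse-and-repeat step and counts the
iterations that hit the error:

    # Hypothetical reproduction loop: rerun the racy pair of queries 100
    # times and count iterations whose stderr contains the
    # "cannot open Packages database" error quoted above.
    failures=0
    for i in $(seq 1 100); do
        rm -f /var/lib/rpm/__db.*
        rpm -q kernel-debuginfo > /dev/null 2> /tmp/q1.err &
        rpm -q kernel-debuginfo > /dev/null 2> /tmp/q2.err &
        wait
        if grep -q "cannot open Packages database" /tmp/q1.err /tmp/q2.err; then
            failures=$((failures + 1))
        fi
    done
    echo "failing iterations: $failures / 100"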

Actual results:

    Error messages shown above.

Expected results:

    No errors.

Additional info:

    Only reproducible on the ia64 architecture. Not reproducible on
    x86_64.

Revision history for this message
In , RHEL (rhel-redhat-bugs) wrote :

Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.

Revision history for this message
In , Issue (issue-redhat-bugs) wrote :

Dear Watanabe-san,

----
From the latest BZ comment it seems that this problem is not yet fixed;
what is the status of this ticket?
Please let us know.
----

Engineering is now considering your situation and the race condition on
the Berkeley DB used by rpm, and wants more information about the
symptoms in order to decide whether we could provide a fix in
initscripts.

-----
   During system boot, the rpmdb files are erased in rc.sysinit.
   Our MW executes "rpm -q kernel-debuginfo" only once during system boot.
   Also, an rpm command is executed by another MW.
   However, because the problem occurs only rarely, we could not get the
data.

Can you explain this? What is "MW"? What causes two rpm commands to run
simultaneously in the customer's boot scripts?

FJ also said, "it is very difficult for the user to fix their script."
Can you explain why it is very difficult?
-----

Sorry to bother you, but please let me know about the following:

1. What are the MWs, and how do they actually work on the customer's system?
  The 'issue' you provided states "Related Middleware / Application: None,"
so please let me know which middleware is involved and how it contributes
to this race condition.

  To make things clearer, please kindly provide, for example, the
situation/commands, the names of the middleware if possible, and an
example of how they affect the system during boot.

2. What is the difficulty on the customer's side, and why?
  Could you explain why it is very difficult to avoid this?
  Perhaps you mean that middleware vendors do not support users changing
the startup scripts, and so on?
  Please provide a more detailed explanation, since we want to understand
your customer's position.

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by <email address hidden>
 issue 249003

Revision history for this message
In , Issue (issue-redhat-bugs) wrote :

Hi Watanabe-san,

We can provide a hotfix package for you, and will request it right now!
Please let me know if you have any concerns about this!

Here are the details from our engineering:
----
SEG and Engineering Management have agreed to provide a one-off Hotfix for
the customer. This means that we will provide them with a supported
package that fixes the problem by adding the 'rpm -q' command to
rc.sysinit, but we will not provide this fix to other customers.

There are still some details that need to be worked out, so I'm not sure
how much we can tell Fujitsu just yet, other than that we're working to
come up with a solution that will be acceptable to everybody.
---
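
For illustration only, such an initscripts-level change might amount to
little more than warming up the database environment once, early and
single-threaded; a minimal sketch (the details of the actual hotfix are
not given here, and the queried package name is arbitrary):

    # Hypothetical rc.sysinit fragment, not the actual hotfix: run one
    # rpm query early in boot so a single process creates
    # /var/lib/rpm/__db.*; later concurrent queries then only join the
    # existing Berkeley DB environment instead of racing to create it.
    [ -f /var/lib/rpm/Packages ] && rpm -q rpm > /dev/null 2>&1 || :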

Thanks in advance.

Regards,
Masaki Furuta

This event sent from IssueTracker by <email address hidden>
 issue 249003

Revision history for this message
In , Issue (issue-redhat-bugs) wrote :

Hi Watanabe-san,

OK, I believe I have understood and share your view.
And yes, I agree with you: we know it would be best for all of us if the
fix were released as an erratum for RHEL 5.4.

But the situation is now very difficult for us, because this is
ultimately an rpm issue, and we also know that behaviour such as
serializing rpmdb open/close cannot be changed easily, since it needs
further work and wider testing somewhere other than RHEL first. So an
rpm-level fix for this is not going to happen for 5.4.

In addition, even if we could include it, anything in initscripts would
just be working around the problem.
And I believe such workarounds might cause more trouble and confusion
for you and your customers once rpm(db) is fixed and the change is moved
out of the initscripts package and into the rpm package.

So, please let me know whether there is any *CHANCE* to release this as
a hotfix at this moment.
As you said, this problem has hit multiple customers; please let us know
how many there are, and how we could support those customers with a
hotfix.

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by <email address hidden>
 issue 249003

Revision history for this message
In , Issue (issue-redhat-bugs) wrote :

Hi Watanabe-san,

Sorry for the delay; engineering is still investigating the best way to
solve this, but it really is considered a must-fix issue.

---- Here's comment from engineering: ----
This is a rather important thing to fix, as the same thing that cures the
races allows curing several other annoyances (see comment #4) too. So the
answer to the "will this be fixed" is certainly "yes, this is a
must-fix issue", just the when part is open: I'm still investigating how
to best fix the thing upstream, there's a whole tangle of locks and
several access modes with funny twists and turns to deal with.
----

I'll keep you posted. Also, could you give me more detail about your
request to add a description of this problem to rpm's man page and the
kbase? Please be aware that we cannot promise these things, but I
believe it is good for us to know your concerns:

  * What would you like explained to your customer, and how?
    e.g. the symptoms, workarounds, and/or known limitations, or
whatever else you would like to suggest.
  * Are there specific customers with time limits or special concerns?
    e.g. should it be described in the man page rather than the kbase,
etc.

Please let me know your thoughts; I will discuss this with engineering.
And feel free to ask me about anything else.

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by <email address hidden>
 issue 249003

Revision history for this message
In , RHEL (rhel-redhat-bugs) wrote :

This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Revision history for this message
In , Bill (bill-redhat-bugs) wrote :

We are not going to add 30+ seconds (at a minimum!) to every boot to run --rebuilddb.

Jeff Johnson (n3npq)
tags: added: rpmdb
Revision history for this message
Jeff Johnson (n3npq) wrote :

Doing --rebuilddb during boot doesn't even begin to touch the problem
of a versioned dbenv. What could _EASILY_ be done is add
    rm -f /var/lib/rpm/__db*
(or equivalently, will "work" with "RPM ACID")
    cd /var/lib/rpm
    /usr/lib/rpm/bin/db_recover -ev
to the boot sequence.

The time needed is nothing close to the +30 seconds claimed in the bug report:

[root@rhel6 rpm]# /usr/bin/time /usr/lib/rpm/bin/db_recover -ev
Finding last valid log LSN: file: 57 offset 5054704
Recovery starting from [57][5052350]
Recovery complete at Thu Oct 7 17:02:43 2010
Maximum transaction ID 800008ef Recovery checkpoint [57][5056958]
0.00user 0.25system 0:00.80elapsed 33%CPU (0avgtext+0avgdata 33296maxresident)k
8472inputs+19576outputs (24major+2135minor)pagefaults 0swaps
[root@rhel6 rpm]# uname -a
Linux rhel6 2.6.32-19.el6.i686 #1 SMP Tue Mar 9 18:10:40 EST 2010 i686 i686 i386 GNU/Linux

And because the command is being run while booting, the behavior is
_NOT_ racy in any fashion whatsoever.
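
For concreteness, a boot-sequence fragment along the lines described
above might look like the following sketch (an illustration of the
comment's suggestion, not an actual initscripts change):

    # Sketch of the boot-time step suggested above. It runs early in
    # boot, before any concurrent rpm queries, so it is not itself
    # subject to the race.
    # Option A: discard any stale Berkeley DB environment files.
    rm -f /var/lib/rpm/__db*
    # Option B (for an ACID/transactional rpmdb): run recovery instead.
    # cd /var/lib/rpm && /usr/lib/rpm/bin/db_recover -ev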

Changed in rpm:
status: New → Opinion
importance: Undecided → Low
assignee: nobody → Jeff Johnson (n3npq)
milestone: none → 5.1.10
milestone: 5.1.10 → 5.3.0
Revision history for this message
In , RHEL (rhel-redhat-bugs) wrote :

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Revision history for this message
In , RHEL (rhel-redhat-bugs) wrote :

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release. Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products. This request is not yet committed for inclusion in
a release.

Revision history for this message
In , Florian (florian-redhat-bugs) wrote :

Fixing this requires reimplementing the locking completely. This would be much too invasive for an RHEL5 update and this rather special use case does not justify the risk of other regressions. Closing. Sorry!

Changed in fedora:
importance: Unknown → High
status: Unknown → Won't Fix