RPM

Comment 8 for bug 638584

Revision history for this message
In , Jeff (jeff-redhat-bugs) wrote :

Re comment #6:

Actually, the reason for DB_HASH is hysterical, not O(1) performance.
db-1.85 did not have a btree implementation a decade ago.

Citing O(1) or O(log(n)), while true, misses real world issues. E.g.
RPM must lookup file paths in two indices because the data is
not rationally indexed, and a string beginning with '/' might be in
either the Providename or the Basenames table. RPM is often forced to do sequential
access (true for rpm -qa e.g.), and does too many redundant
accesses.

The above issues largely obliterate any performance benefit from using
DB_HASH or DB_BTREE.

Its rather easy to do the benchmarks, just add --stats to any RPM command
and compare using DB_BTREE and DB_HASH. I did due diligence when
I switched from DB_HASH to DB_BTREE @rpm5.org and there was no
measurable performance gain from using either DB_BTREE or DB_HASH.

There are other performance gains from improved access on certain
paths. E.g. rpm-5.2.0 @rpm5.org has a measured (with callgrind and --stats)
14.6x speed-up by changing perhaps 50 lines of code on path lookups.

But clearly I got "lucky" and just guessed which lines of code to change.