RPM

Comment 31 for bug 635834

In , Zdenek (zdenek-redhat-bugs) wrote:

(In reply to comment #28)
> Sadly, rpm has been deliberately crippled to avoid mmap(2)
> when the mapped region is larger than a limit (that is too small imho).

Well, I'm not really sure there is any 'deliberate' crippling here - but there is surely a lack of testing and performance checking.

> (aside)
> The issue way back when was to avoid the appearance of
> large numbers in top(1) displays that bothered lusers who
> reported "bugs". And then there's sparse /var/log/lastlog
> which has all sorts of hilarious hysteria with RPM. There's
> hardly any need to package /var/log/lastlog.

How lastlog got connected with rpm is probably beyond my imagination. But top should actually show better numbers: with mmap (when used the right way) the RES memory should be smaller and only the VIRT size gets bigger, which should not bother any user ;)

>
> (another aside)
> Prelinking (through a pipe to prelink --undo) has never been implemented
> correctly

I have prelinking disabled, because it certainly eats more CPU/Watts computing dependencies than it could ever save during the lifetime of those results between updates... ;)

> But the important rpmdb I/O questions for me are:
> 1) Does Berkeley DB performance degrade over time?
> 2) Is there a demonstrable/measurable need for fallocate? Sure
> unfragmented files have less overhead than fragmented files. But
> the performance gain needs to be balanced against the implementation cost.
> I have yet to hear of any credible performance gain measurements for
> "fragmented" rpmdb's, no one has bothered.

Again, I assume this is something for internal testing of the rpm tool itself: simulate long-term heavy usage and check whether performance goes down.
As mentioned in comment #2, I was probably not alone with this problem, but I did not make a copy of the rpm dir before the rebuild :( so I can hardly provide anything better than the observation that even after --rebuilddb, reading the file runs at about 50% of the speed of a 'defragmented' file.

Also, when you say BDB could know better than Linux how to access the data - that could only be true if the DB file lived on its own separate partition. When the data is stored on a filesystem like ext3, or any other advanced fs with its own fragmentation, I doubt BDB could have any decent algorithm to handle this case.

And I should probably also mention that a few upgrades were not finished properly because of 'various' rawhide faults - though usually this was handled via yum-complete-transaction and package-cleanup --dupes.
Eventually this might have led over time to the increased size, if such invalid transactions are still kept stored in the DB - again, just a wild guess...?