(In reply to comment #5)
> Now you're talking ...
> 
> 1) No one (certainly not me) said your reported results were invalid. I don't think
> you've analyzed the results correctly (see your callgrind measurements; dataLength
> is not I/O related per se). And it's premature to advertise a performance "fix"
> like running --rebuilddb periodically when the performance problem is poorly
> characterized and understood.

I haven't done any --rebuilddb analysis; I just wrote:

rpm -qa time before the rebuild was 47 seconds; after the rebuild it was 12 seconds.
I didn't keep the old database for analysis, as I didn't expect any problems. So it's simply a fact that on my machine rpm -qa got about 4 times faster after --rebuilddb (47 s vs. 12 s).


> 
> 2) I did not say your metric was "invalid"; read what I wrote. I have tried the
> results myself. I run callgrind on rpm at least weekly and already know (and have
> fixed) many performance problems in RPM. I did point out that there are other issues
> than I/O, and suggested callgrind, where I/O overhead is _NOT_ the issue that you have
> measured. I did point out that you have another level of caching that needs to be
> controlled for useful I/O metrics.

Do you flush the disk buffers (page cache) in your tests?

When all the data is already cached in memory, the time is 'almost' acceptable (there is still some room for improvement, but it may be limited by the DB format, which is probably nontrivial to change).

My report is mainly about the cold-cache case, when none of the data is in memory: then even a trivial query for an installed package takes 12 seconds.
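A minimal sketch of a cold- vs warm-cache timing, assuming Linux and root access (it writes to /proc/sys/vm/drop_caches). The helper names are hypothetical; the first run only counts as cold when the cache drop actually succeeded.

```python
from __future__ import annotations

import os
import subprocess
import sys
import time


def drop_page_cache() -> bool:
    """Flush dirty pages and drop the Linux page cache.

    Needs root; returns False (instead of raising) when not on Linux or
    when /proc/sys/vm/drop_caches cannot be written.
    """
    if not sys.platform.startswith("linux"):
        return False
    try:
        os.sync()  # write dirty pages out first so they can be dropped
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")  # 3 = page cache plus dentries and inodes
        return True
    except OSError:
        return False


def time_command(argv: list[str]) -> float:
    """Run argv to completion, discarding stdout; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def cold_vs_warm(argv: list[str]) -> tuple[float | None, float]:
    """Return (cold, warm) wall-clock times for argv.

    cold is None when the cache could not be dropped, because that
    run would not really be cold. The second run is always warm,
    since the first run has just populated the cache.
    """
    dropped = drop_page_cache()
    first = time_command(argv)   # cold only if the drop succeeded
    second = time_command(argv)  # caches now populated by the first run
    return (first if dropped else None), second
```

Run as root, cold_vs_warm(["rpm", "-qa"]) returns both timings; the cold one is the case this report is about.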

> 3) If "sleeps and waits" is the issue (it's not afaik), that was not at all clear from
> your wall clock benchmarks. And I most definitely know both rpm and db4 sources; in
> fact I have achieved a measured (w callgrind) 14.6x performance increase @rpm5.org by
> running careful (better than wallclock) benchmarks. But that's not relevant here.

Is rpm 4.7 going to be replaced by rpm 5, or is that a project unrelated to Fedora's rpm package?


> 4) Stare at the numero uno piggy in the callgrind spewage. When you start to realize
> that serialization and marshalling is the issue, then you will begin to understand
> the performance issue.

As I've said, callgrind will not show I/O stalls: it counts the instructions executed, not the time the process spends blocked waiting for the disk.
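One way to see what callgrind misses is to split a command's wall-clock time into CPU time and off-CPU time. A minimal sketch, assuming a Unix system with Python's resource module; measure() and the script name offcpu.py are hypothetical. A large off-CPU share together with nonzero filesystem input points at disk waits rather than computation.

```python
import resource
import subprocess
import sys
import time


def measure(argv):
    """Run argv and split its wall-clock time into on-CPU and off-CPU parts.

    Returns (wall, cpu, off_cpu, blocks_in):
      cpu: user + system CPU seconds of the child
      off_cpu: wall - cpu, time spent not running (disk waits, sleeps, ...)
      blocks_in: filesystem input operations; page-cache hits do not count
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    off_cpu = max(0.0, wall - cpu)  # clamp: a multi-threaded child can exceed wall
    blocks_in = after.ru_inblock - before.ru_inblock
    return wall, cpu, off_cpu, blocks_in


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g.  python3 offcpu.py rpm -qa
    wall, cpu, off_cpu, blocks_in = measure(sys.argv[1:])
    print(f"wall {wall:.2f}s  cpu {cpu:.2f}s  off-cpu {off_cpu:.2f}s  "
          f"fs input ops {blocks_in}")
```

Running it once right after dropping the caches and once more afterwards should show whether the extra cold-cache time is off-CPU time.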

> 5) I'm not sure how SHA512 is related other than through signatures, where
> --nosignature is the disabler. In all cases, verifying digests on header blobs is
> overhead unrelated to I/O performance and must be controlled for.

Sure, it's not related to slow disk reads; it's just what callgrind shows. I was simply curious how many memory chunks need to be checksummed for every simple rpm command. Maybe a short-lived daemon could speed up repeated invocations (if the daemon kept the lock on the database).
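To get a rough feel for how much of a simple query could be digest work rather than I/O, here is a sketch that times SHA-512 over header-sized buffers. The package count and header size are made-up placeholders, not numbers taken from any rpmdb, and Python's hashlib is not rpm's own digest code, so this gives an order of magnitude at best.

```python
import hashlib
import os
import time


def sha512_seconds(blob_size: int, count: int) -> float:
    """Seconds spent SHA-512-digesting `count` random blobs of `blob_size` bytes.

    Only the hashing is timed; generating the blobs is not.
    """
    blobs = [os.urandom(blob_size) for _ in range(count)]
    start = time.perf_counter()
    for blob in blobs:
        hashlib.sha512(blob).digest()
    return time.perf_counter() - start


# Hypothetical placeholders, not measured values:
# 1500 installed packages with ~20 KiB headers each.
PACKAGES = 1500
HEADER_BYTES = 20 * 1024

if __name__ == "__main__":
    secs = sha512_seconds(HEADER_BYTES, PACKAGES)
    mib = HEADER_BYTES * PACKAGES / 2**20
    print(f"{mib:.1f} MiB digested in {secs:.3f} s")
```

Substituting the real package count and average header size gives a ballpark for the digest cost per query, which can then be compared with the cold-cache wall-clock time.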

> 6) yum performance depends on many factors unrelated to rpm. But run benchmarks on yum
> if you wish to understand yum performance problems. Without measurements, feel free to
> claim anything you wish about the cause of yum's pathetic performance, your opinion is
> as good or bad as anyone else's.

Yeah, sure, Python is a much bigger CPU eater in this case, but rpm is not negligible either...