(In reply to comment #43)
> The upside is (ought to be) faster memcpy, which is something that helps a lot
> of apps.
Hey, I'm a big believer in fast memcpy's, I just don't believe that going
backwards helps performance.
In the kernel, the optimized x86 memcpy we use is actually a memmove(), because while performance is really important, so is repeatability and avoiding surprises. (Strictly speaking, we have two: the "rep movs" version for the case where that is supposed to be fast, and the open-coded copy version. The "rep movs" version is forwards-only and doesn't handle overlapping areas.)
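To make the "memcpy that is really a memmove()" idea concrete, here is a minimal sketch of a copy routine that picks its direction based on how the two areas overlap. This is not the kernel's code; the name overlap_safe_copy is made up for illustration, and a real implementation would copy in words or use "rep movs" rather than going byte by byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not the kernel routine):
 * copy n bytes from src to dst, staying correct when the areas overlap. */
void *overlap_safe_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)dst;
    uintptr_t sa = (uintptr_t)src;

    if (da == sa || n == 0)
        return dst;

    if (da < sa || da - sa >= n) {
        /* dst starts below src, or the areas are disjoint:
         * a plain forward copy never clobbers unread source bytes. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside [src, src + n): copy from the end so each
         * source byte is read before it gets overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

The only extra cost over a forwards-only copy is the address comparison up front, which is noise next to a 1280-byte copy.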
I dunno. I just tested my stupid "mymemcpy.so" against the glibc memcpy() on the particular kind of memcpy that valgrind reports (16-byte aligned 1280-byte memcpy).
I did both cached (same block over and over) and non-cached (a million blocks in sequence).
For the cached case my stupid LD_PRELOAD version was consistently a bit faster.
The noise on the non-cached case was larger, but the glibc memcpy may have been faster. I say "may have been" because it went both ways: I did ten runs, and my LD_PRELOAD one still won 6 out of those 10 runs, but the noise was large enough that I'm not going to guarantee anything.
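For anyone who wants to repeat this kind of measurement, the cached and non-cached cases can be sketched roughly as below. This is not the test program used for the numbers above; time_copies, copy_fn and BLOCK are names invented for the sketch, and it uses clock() (CPU time) for simplicity, so treat the results as rough and run it several times.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { BLOCK = 1280 };  /* copy size from the report; a multiple of 16 */

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical harness: time `iters` copies of BLOCK bytes through fn.
 * nblocks == 1 copies the same block over and over (cached case);
 * a large nblocks walks through distinct blocks in sequence, which
 * stops fitting in cache once nblocks * BLOCK gets big enough.
 * Returns CPU seconds, or -1.0 on bad arguments or allocation failure. */
double time_copies(copy_fn fn, size_t nblocks, size_t iters)
{
    if (nblocks == 0)
        return -1.0;

    size_t bytes = nblocks * BLOCK;
    unsigned char *src = aligned_alloc(16, bytes);  /* 16-byte aligned */
    unsigned char *dst = aligned_alloc(16, bytes);
    double elapsed = -1.0;

    if (src && dst) {
        /* Touch every page first so page faults are not timed. */
        memset(src, 0x5a, bytes);
        memset(dst, 0, bytes);

        clock_t start = clock();
        for (size_t i = 0; i < iters; i++) {
            size_t off = (i % nblocks) * BLOCK;
            fn(dst + off, src + off, BLOCK);
        }
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* Read the result so the copies cannot be optimized away. */
        volatile unsigned char sink = dst[0];
        (void)sink;
    }

    free(src);
    free(dst);
    return elapsed;
}
```

Something like time_copies(memcpy, 1, 1000000) gives the cached case and time_copies(memcpy, 1000000, 1000000) the non-cached one. Note that the latter allocates about 1.28 GB for each of the two buffers, so scale nblocks down if memory is tight.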
Do I have a point? I bothered to _measure_ the speed, and according to my measurements, glibc wasn't any faster than my trivial version and was likely slower. But I only tested two cases.
Regardless, it boils down to: we know the glibc change resulted in problems for real users. We do _not_ know that it helped anything at all.
And in the end, the big question is simple:
Are you seriously going to do a Fedora-14 release with a known non-working flash player?