Comment 132 for bug 727064

Revision history for this message
In , siarhei.siamashka (siarhei.siamashka-redhat-bugs) wrote :

(In reply to comment #46)
> (In reply to comment #43)
> > The upside is (ought to be) faster memcpy, which is something that helps a lot
> > of apps.
>
> Hey, I'm a big believer in fast memcpy's, I just don't believe that going
> backwards helps performance.
>
> In the kernel, the optimized x86 memcpy we use is actually a memmove(), because
> while performance is really important, so is repeatability and avoiding
> surprises (strictly speaking, we have two: the "rep movs" version for the case
> where that is supposed to be fast, and the open-coded copy version. The "rep
> movs" version is forwards-only and doesn't handle overlapping areas).
>
> I dunno. I just tested my stupid "mymemcpy.so" against the glibc memcpy() on
> the particular kind of memcpy that valgrind reports (16-byte aligned 1280-byte
> memcpy).
>
> I did both cached (same block over and over) and non-cached (a million blocks
> in sequence).
>
> For the cached case my stupid LD_PRELOAD version was consistently a bit faster.

The same Intel developers submitted a similar optimization to pixman, and gave the following explanation when asked about the backwards-copying part:
http://lists.freedesktop.org/archives/pixman/2010-August/000423.html

I was also not totally convinced that backwards copying is really the best solution to the problem:
http://lists.freedesktop.org/archives/pixman/2010-September/000465.html
http://lists.freedesktop.org/archives/pixman/2010-September/000469.html