(In reply to comment #46)
> (In reply to comment #43)
> > The upside is (ought to be) faster memcpy, which is something that helps a lot
> > of apps.
>
> Hey, I'm a big believer in fast memcpy's, I just don't believe that going
> backwards helps performance.
>
> In the kernel, the optimized x86 memcpy we use is actually a memmove(), because
> while performance is really important, so is repeatability and avoiding
> surprises (strictly speaking, we have two: the "rep movs" version for the case
> where that is supposed to be fast, and the open-coded copy version. The "rep
> movs" version is forwards-only and doesn't handle overlapping areas).
>
> I dunno. I just tested my stupid "mymemcpy.so" against the glibc memcpy() on
> the particular kind of memcpy that valgrind reports (16-byte aligned 1280-byte
> memcpy).
>
> I did both cached (same block over and over) and non-cached (a million blocks
> in sequence).
>
> For the cached case my stupid LD_PRELOAD version was consistently a bit faster.
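
To make the overlap point concrete, here is a minimal sketch of the direction choice a memmove()-style copy has to make. This is not the kernel's or glibc's code: copy_overlap_safe is a hypothetical name, and the plain byte loops stand in for whatever optimized inner loop a real implementation would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only: copies n bytes and stays
 * correct when the two regions overlap, like memmove(). */
static void *copy_overlap_safe(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned subtraction wraps around, so this single test is true both
     * when dst lies below src and when dst starts at or past src + n.
     * In either case a forward copy never overwrites unread source bytes. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside [src, src + n): copy from the end backwards
         * so each source byte is read before it is overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

With a buffer holding "abcdefgh", copy_overlap_safe(buf + 2, buf, 6) leaves "ababcdef". A forwards-only byte loop on the same arguments would leave "abababab", which is the kind of surprise the quoted comment is talking about.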
The same Intel developers submitted a similar optimization to libpixman, and provided the following explanation when asked about the backwards-copying part:
http://lists.freedesktop.org/archives/pixman/2010-August/000423.html
I also was not totally convinced that the backwards copy is really the best solution for the problem:
http://lists.freedesktop.org/archives/pixman/2010-September/000465.html
http://lists.freedesktop.org/archives/pixman/2010-September/000469.html