please backport arm64 mem{set,cpy,move} optimizations
Bug #1595739 reported by dann frazier
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
glibc (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Won't Fix | Undecided | Unassigned |
Bug Description
Optimizations to AArch64's memset, memcpy & memmove routines have recently landed upstream. The attached patch applies these clean cherry-picks to the current glibc packaging.
Ming Lei did some testing on an X-Gene-based system (see below). TL;DR: memcpy isn't significantly different (5.79 GB/sec before and after), but memset throughput drops by about 18% (18.43 GB/sec to 15.04 GB/sec).
A test build w/ these patches is available in lp:dannf/test.
On Mon, Jul 18, 2016 at 10:25 PM, Ming Lei <email address hidden> wrote:
> On Tue, Jul 19, 2016 at 7:08 AM, Dann Frazier
> <email address hidden> wrote:
>> hey Ming,
>> We did our end of iteration review, and one of the tasks on the
>> backlog was some glibc benchmarking on X-Gene (McDivitt). We went
>> ahead and assigned that to you:
>>
>> https://canonical.leankit.com/Boards/View/108592675/122938128
>>
>> Of course, with Andy out, I'm not sure if you already have a full
>> plate. If so, just let me know and I'll move it back to the backlog -
>> it isn't urgent.
>
> There is no obvious memcpy improvement on mcdivitt, but the patches
> bring a perf regression in memset; please see the following data:
>
> ubuntu@ms10-36-mcdivittB0:~$ uname -a
> Linux ms10-36-mcdivittB0 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13
> 00:06:30 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux
>
> 1. before optimization
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memcpy -s 32GB -l 8
> # Running 'mem/memcpy' benchmark:
> # function 'default' (Default memcpy() provided by glibc)
> # Copying 32GB bytes ...
>
> 5.788137 GB/sec
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memset -s 32GB -l 8
> # Running 'mem/memset' benchmark:
> # function 'default' (Default memset() provided by glibc)
> # Copying 32GB bytes ...
>
> 18.434465 GB/sec
>
> 2. after optimization
> ubuntu@ms10-36-mcdivittB0:~$ dpkg -l | grep libc6
> ii libc6:arm64 2.23-0ubuntu3+arm64opt.1
> arm64 GNU C Library: Shared libraries
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memcpy -s 32GB -l 8
> # Running 'mem/memcpy' benchmark:
> # function 'default' (Default memcpy() provided by glibc)
> # Copying 32GB bytes ...
>
> 5.791313 GB/sec
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memset -s 32GB -l 8
> # Running 'mem/memset' benchmark:
> # function 'default' (Default memset() provided by glibc)
> # Copying 32GB bytes ...
>
> 15.042083 GB/sec