please backport arm64 mem{set,cpy,move} optimizations

Bug #1595739 reported by dann frazier
Affects          Status        Importance  Assigned to  Milestone
glibc (Ubuntu)   Fix Released  Undecided   Unassigned
  Xenial         Won't Fix     Undecided   Unassigned

Bug Description

Optimizations to AArch64's memset, memcpy & memmove routines have recently landed upstream. The attached patch applies these clean cherry-picks to the current glibc packaging.

Tags: arm64
dann frazier (dannf) wrote :
dann frazier (dannf) wrote :

Ming Lei did some testing on an X-Gene-based system (see below). TL;DR: memcpy isn't significantly different, but memset throughput regresses by about 18% (from 18.43 to 15.04 GB/sec).

A test build w/ these patches is available in lp:dannf/test.

On Mon, Jul 18, 2016 at 10:25 PM, Ming Lei <email address hidden> wrote:
> On Tue, Jul 19, 2016 at 7:08 AM, Dann Frazier
> <email address hidden> wrote:
>> hey Ming,
>> We did our end of iteration review, and one of the tasks on the
>> backlog was some glibc benchmarking on X-Gene (McDivitt). We went
>> ahead and assigned that to you:
>>
>> https://canonical.leankit.com/Boards/View/108592675/122938128
>>
>> Of course, with Andy out, I'm not sure if you already have a full
>> plate. If so, just let me know and I'll move it back to the backlog -
>> it isn't urgent.
>
> There is no obvious memcpy improvement on mcdivitt, but the patches bring
> a perf regression for memset; please see the following data:
>
> ubuntu@ms10-36-mcdivittB0:~$ uname -a
> Linux ms10-36-mcdivittB0 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13
> 00:06:30 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux
> 1, before optimization
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memcpy -s 32GB -l 8
> # Running 'mem/memcpy' benchmark:
> # function 'default' (Default memcpy() provided by glibc)
> # Copying 32GB bytes ...
>
> 5.788137 GB/sec
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memset -s 32GB -l 8
> # Running 'mem/memset' benchmark:
> # function 'default' (Default memset() provided by glibc)
> # Copying 32GB bytes ...
>
> 18.434465 GB/sec
> ubuntu@ms10-36-mcdivittB0:~$
>
>
> 2, after optimization
> ubuntu@ms10-36-mcdivittB0:~$
> ubuntu@ms10-36-mcdivittB0:~$ dpkg -l | grep libc6
> ii libc6:arm64 2.23-0ubuntu3+arm64opt.1
> arm64 GNU C Library: Shared libraries
> ubuntu@ms10-36-mcdivittB0:~$
> ubuntu@ms10-36-mcdivittB0:~$
> ubuntu@ms10-36-mcdivittB0:~$
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memcpy -s 32GB -l 8
> # Running 'mem/memcpy' benchmark:
> # function 'default' (Default memcpy() provided by glibc)
> # Copying 32GB bytes ...
>
> 5.791313 GB/sec
> ubuntu@ms10-36-mcdivittB0:~$
> ubuntu@ms10-36-mcdivittB0:~$
> ubuntu@ms10-36-mcdivittB0:~$ perf bench mem memset -s 32GB -l 8
> # Running 'mem/memset' benchmark:
> # function 'default' (Default memset() provided by glibc)
> # Copying 32GB bytes ...
>
> 15.042083 GB/sec
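
For reference, the relative change between the two runs can be worked out from the figures quoted above. This is only a small sketch: the throughput numbers are copied from the perf bench output, while the `percent_change` helper and the dictionary names are illustrative and not part of any tool used in this report.

```python
# Throughput figures (GB/sec) copied from the perf bench output quoted above.
before = {"memcpy": 5.788137, "memset": 18.434465}
after = {"memcpy": 5.791313, "memset": 15.042083}


def percent_change(old, new):
    """Change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100.0


for name in before:
    change = percent_change(before[name], after[name])
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} GB/sec ({change:+.1f}%)")
```

Measured against the unpatched baseline, memcpy is essentially unchanged and memset loses roughly 18% of its throughput. Put the other way around, the unpatched memset is roughly 23% faster than the patched one. Both figures come from a single run of each configuration, so they say nothing about run-to-run variance.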

dann frazier (dannf) wrote :

These changes landed upstream in glibc 2.24, so are now in Ubuntu. Marking Fix Released.
Given the test results in Comment #2, marking the xenial task "Won't Fix".

Changed in glibc (Ubuntu):
status: Confirmed → Fix Released
Changed in glibc (Ubuntu Xenial):
status: New → Won't Fix