Optimized memcmp for arm64
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
glibc (Ubuntu) | Fix Released | Undecided | Adam Conrad |
Bug Description
A patch has recently landed upstream to optimize memcmp for AArch64:
commit 922369032c604b4
Author: Wilco Dijkstra <email address hidden>
Date: Thu Aug 10 17:00:38 2017 +0100
[AArch64] Optimized memcmp.
This is an optimized memcmp for AArch64. This is a complete rewrite
using a different algorithm. The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop. The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.
This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared. After the first 8 bytes,
align the first input. This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.
This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.
* sysdeps/
Rewrite of optimized memcmp.
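The strategy described in the commit message can be illustrated with a minimal C sketch. This is not the upstream code, which is written in AArch64 assembly; the names `sketch_memcmp`, `load64` and `cmp_bytes` are hypothetical, and the byte-wise fallback used to resolve a mismatching 8-byte chunk is a simplification chosen for portability across endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from a possibly unaligned address (hypothetical helper). */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte loop: used for inputs shorter than 8 bytes and to locate the
   first differing byte inside an 8-byte chunk that did not match. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}

/* Illustrative sketch only; assumes nothing about the real glibc routine
   beyond what the commit message states. */
int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Small inputs are handled separately. */
    if (n < 8)
        return cmp_bytes(a, b, n);

    /* At least 8 bytes exist, so compare the first 8 with unaligned loads. */
    if (load64(a) != load64(b))
        return cmp_bytes(a, b, 8);

    /* Advance to the next 8-byte boundary of the first input (1..8 bytes).
       Some already-compared bytes may be compared again, which is harmless. */
    size_t i = 8 - ((uintptr_t)a & 7);

    /* Main loop: loads from a are aligned, so each iteration performs
       at most one unaligned access (the load from b). */
    while (i + 8 <= n) {
        if (load64(a + i) != load64(b + i))
            return cmp_bytes(a + i, b + i, 8);
        i += 8;
    }

    /* Fewer than 8 bytes remain: compare the last 8 bytes of the buffers
       with unaligned loads, overlapping bytes that are known to be equal. */
    if (i < n && load64(a + n - 8) != load64(b + n - 8))
        return cmp_bytes(a + n - 8, b + n - 8, 8);

    return 0;
}
```

Like `memcmp`, the sketch only guarantees the sign of the result, so callers should test it against zero rather than expect a specific value.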
Changed in glibc (Ubuntu):
assignee: nobody → Adam Conrad (adconrad)
status: New → Confirmed
This bug was fixed in the package glibc - 2.26-0ubuntu2
---------------
glibc (2.26-0ubuntu2) artful; urgency=medium
* Cherry-pick some changes from Debian git for a few pending Ubuntu bugfixes:
  - Update to master and drop redundant submitted-tst-tlsopt-powerpc.diff.
  - debian/patches/any/local-cudacc-float128.diff: Local patch to prevent
    defining __HAVE_FLOAT128 on NVIDIA's CUDA compilers (LP: #1717257)
  - debian/patches/arm/git-arm64-memcmp.diff: Backport optimized memcmp
    for AArch64, improving performance from 25% to 500% (LP: #1720832)
  - debian/patches/amd64/git-x86_64-search.diff: Backport upstream commit
    to put x86_64 back in the search path, like in 2.25 (LP: #1718928)
  - debian/rules.d/debhelper.mk: Filter python hooks in stage1 (LP: #1715366)
-- Adam Conrad <email address hidden> Wed, 11 Oct 2017 14:21:40 -0600