Comment 0 for bug 2030515

Bruce Merry (bmerry) wrote : Terrible memcpy performance on Zen 3 when using rep movsb

On CPUs that advertise FSRM (fast short rep movsb), glibc 2.35 uses REP MOVSB for memcpy for sizes above 2112 bytes (up to some threshold that depends on the cache size). Unfortunately, it seems that Zen 3 (at least with the microcode we're running) is extremely slow at REP MOVSB when the data are not well-aligned.
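
To check whether a given machine takes this code path, the FSRM flag can be read directly from CPUID. The following is a minimal sketch, assuming GCC or Clang on x86 (it relies on their <cpuid.h> header); the function name cpu_has_fsrm is made up for illustration and is not part of glibc or of the benchmark. FSRM is reported in CPUID leaf 7, subleaf 0, EDX bit 4.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// Hypothetical helper: returns true if the CPU advertises FSRM
// (fast short REP MOVSB), i.e. CPUID.(EAX=7,ECX=0):EDX bit 4 is set.
// On non-x86 builds it always returns false.
bool cpu_has_fsrm() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count returns 0 if leaf 7 is not supported.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return ((edx >> 4) & 1u) != 0;
#else
    return false;
#endif
}
```

On Linux the same information should also show up as the "fsrm" flag in /proc/cpuinfo, which is usually the quicker check.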

I've found this using a memcpy benchmark at https://github.com/ska-sa/katgpucbf/blob/69752be58fb8ab0668ada806e0fd809e782cc58b/scratch/memcpy_loop.cpp (compiled with the adjacent Makefile). To demonstrate the issue, run:

./memcpy_loop -b 2113 -p 1000000 -t mmap -S 0 -D 1 0

This runs:
- 2113-byte memory copies
- 1,000,000 times per timing measurement
- in memory allocated with mmap
- with the source 0 bytes from the start of the page
- with the destination 1 byte from the start of the page
- on core 0.

It reports about 3.2 GB/s. Change the -b argument to 2111 and it reports over 100 GB/s. So the REP MOVSB case is about 30× slower!
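
For anyone who wants to see the shape of the measurement without building the full benchmark, here is a much-reduced sketch of the same idea. It is not the code from memcpy_loop.cpp: the function name copy_rate_gbps, the fixed 4096-byte page size, and the single-threaded timing loop are all assumptions made for illustration, and it does no core pinning. It maps two anonymous regions with mmap, offsets the source and destination from the page start, and times repeated memcpy calls.

```cpp
#include <sys/mman.h>
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of memcpy_loop.cpp).
// Copies `bytes` bytes `passes` times between two mmap'd regions, with the
// source and destination starting `src_off` / `dst_off` bytes into a page.
// Returns the measured rate in GB/s, or -1.0 on failure.
double copy_rate_gbps(std::size_t bytes, std::size_t passes,
                      std::size_t src_off, std::size_t dst_off) {
    const std::size_t page = 4096;  // assumed page size
    // Round each mapping up to whole pages, leaving room for the offset.
    const std::size_t src_len = (src_off + bytes + page - 1) / page * page;
    const std::size_t dst_len = (dst_off + bytes + page - 1) / page * page;
    void *s = mmap(nullptr, src_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED) return -1.0;
    void *d = mmap(nullptr, dst_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (d == MAP_FAILED) {
        munmap(s, src_len);
        return -1.0;
    }
    // mmap returns page-aligned addresses, so these offsets are relative
    // to the start of a page.
    char *src = static_cast<char *>(s) + src_off;
    char *dst = static_cast<char *>(d) + dst_off;
    std::memset(src, 0x5a, bytes);  // touch the pages before timing
    std::memset(dst, 0, bytes);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < passes; ++i) {
        std::memcpy(dst, src, bytes);
        // Compiler barrier (GCC/Clang syntax) so the copies are not elided.
        asm volatile("" : : "r"(dst) : "memory");
    }
    const auto stop = std::chrono::steady_clock::now();

    const bool ok = std::memcmp(dst, src, bytes) == 0;
    munmap(s, src_len);
    munmap(d, dst_len);
    const double secs = std::chrono::duration<double>(stop - start).count();
    if (!ok || secs <= 0.0) return -1.0;
    return static_cast<double>(bytes) * static_cast<double>(passes) / secs / 1e9;
}
```

Calling copy_rate_gbps(2113, 1000000, 0, 1) and copy_rate_gbps(2111, 1000000, 0, 1) corresponds to the two cases above. The absolute numbers from this sketch will not necessarily match the real benchmark, but on an affected machine a large gap between the two sizes would be expected.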

This will most likely need to be reported and fixed upstream, but I'm reporting it to Ubuntu first since I don't know if Ubuntu has modified glibc in any way that would be significant.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: libc6 2.35-0ubuntu3.1
ProcVersionSignature: Ubuntu 5.19.0-46.47~22.04.1-generic 5.19.17
Uname: Linux 5.19.0-46-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
Date: Mon Aug 7 14:02:28 2023
RebootRequiredPkgs: Error: path contained symlinks.
SourcePackage: glibc
UpgradeStatus: No upgrade log present (probably fresh install)