Comment 7 for bug 2030515

Bruce Merry (bmerry-q) wrote:

When (dst-src)&0xFFF is small (but non-zero), the REP MOVSB path in memcpy performs extremely poorly (as much as 25x slower than the alternative path). I'm observing this on Zen 4 (Epyc 9374F). I'm running Ubuntu 22.04 with a glibc hand-built from glibc-2.38.9000-185-g2aa0974d25.

To reproduce:
1. Download the microbench at https://github.com/ska-sa/katgpucbf/blob/6176ed2e1f5eccf7f2acc97e4779141ac794cc01/scratch/memcpy_loop.cpp
2. Compile it with the adjacent Makefile (tl;dr: g++ -std=c++17 -O3 -pthread -o memcpy_loop memcpy_loop.cpp)
3. Run ./memcpy_loop -t mmap -f memcpy -b 8192 -p 100000 -D 1 -r 5
4. Run GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=10000 ./memcpy_loop -t mmap -f memcpy -b 8192 -p 100000 -D 1 -r 5

Step 3 reports a rate of 4.2 GB/s, while step 4 (which raises the REP MOVSB threshold above the 8192-byte copy size, so the rep_movsb path is not taken) reports a rate of 111 GB/s. The test uses 8192-byte memory copies, where the source is page-aligned and the destination starts 1 byte into a page, so (dst-src)&0xFFF is 1.
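
For readers who want to see the alignment condition without reading the whole microbench, here is a minimal sketch of the buffer layout described above. It is not taken from memcpy_loop.cpp: the helper names (page_delta, round_up_to_page, copy_with_offset) are made up for illustration, it assumes 4 KiB pages (the 0xFFF mask), and it uses std::aligned_alloc rather than mmap. It only sets up the layout and does one copy; it does not measure throughput.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Assumption: 4 KiB pages, matching the 0xFFF mask in the report.
constexpr std::size_t kPageSize = 4096;

// (dst - src) & 0xFFF: the quantity the report says triggers the slow path
// when it is small but non-zero.
inline std::uintptr_t page_delta(const void* dst, const void* src) {
    return (reinterpret_cast<std::uintptr_t>(dst) -
            reinterpret_cast<std::uintptr_t>(src)) & (kPageSize - 1);
}

// Round n up to a multiple of the page size (aligned_alloc wants that).
inline std::size_t round_up_to_page(std::size_t n) {
    return (n + kPageSize - 1) / kPageSize * kPageSize;
}

// Hypothetical helper: copy `bytes` bytes from a page-aligned source to a
// destination that starts `dst_offset` bytes into a page-aligned buffer,
// and return the page_delta of the two pointers that were used.
inline std::uintptr_t copy_with_offset(std::size_t bytes, std::size_t dst_offset) {
    assert(bytes > 0);
    auto* src = static_cast<unsigned char*>(
        std::aligned_alloc(kPageSize, round_up_to_page(bytes)));
    auto* dst_base = static_cast<unsigned char*>(
        std::aligned_alloc(kPageSize, round_up_to_page(bytes + dst_offset)));
    if (src == nullptr || dst_base == nullptr) {
        std::free(src);
        std::free(dst_base);
        throw std::bad_alloc();
    }
    std::memset(src, 0x5a, bytes);           // touch the source pages
    unsigned char* dst = dst_base + dst_offset;
    std::memcpy(dst, src, bytes);            // the copy under discussion
    const std::uintptr_t delta = page_delta(dst, src);
    std::free(src);
    std::free(dst_base);
    return delta;
}
```

Calling copy_with_offset(8192, 1) sets up the case from step 3 (page-aligned source, destination 1 byte into a page) and returns 1, while copy_with_offset(8192, 0) gives the fully aligned case and returns 0. To observe the slowdown itself, the copy would need to be timed in a loop as the linked microbench does.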

I'll also attach the bench-memcpy-large.out, which shows similar results.

I've previously filed this as an Ubuntu bug (https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/2030515) but it doesn't seem to have received much attention.