Comment 18 for bug 2030515

Revision history for this message
In , Bruce Merry (bmerry-q) wrote :

> On Zen3 I am not seeing such slowdown using vectorized instructions.

Agreed, I'm also not seeing this huge-page slowdown on our Zen 3 servers (this is with Ubuntu 22.04's glibc 2.32; I haven't got a hand-built glibc handy on that server):

$ ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
Using function memcpy
90.065 GB/s
89.9096 GB/s
89.9131 GB/s
89.8207 GB/s
89.952 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
Using function memcpy
116.997 GB/s
116.874 GB/s
116.937 GB/s
117.029 GB/s
117.007 GB/s

On the other hand, there seem to be other cases where REP MOVSB is faster on Zen 3:

$ ./memcpy_loop -D 512 -f memcpy_rep_movsb -r 5 -t mmap 0
Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
Using function memcpy_rep_movsb
22.045 GB/s
22.3135 GB/s
22.1144 GB/s
22.8571 GB/s
22.2688 GB/s

$ ./memcpy_loop -D 512 -f memcpy -r 5 -t mmap 0
Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
Using function memcpy
7.66155 GB/s
7.71314 GB/s
7.72952 GB/s
7.72505 GB/s
7.74309 GB/s

But overall it does seem like the vectorised copy performs better than REP MOVSB on Zen 3.