It's been some time since the original benchmarks, so I'm repeating the test from the description. I haven't used hyperfine for the comparisons below, so they won't have the same statistical reliability but should nevertheless be sufficient for validation.
Binaries have been compiled as below:
$ gcc -mtune=generic -march=x86-64 -g -O3 test_memcpy.c -o test_memcpy64
---- AMD ----
$ grep -m1 "model name" /proc/cpuinfo
model name : AMD Ryzen 7 3700X 8-Core Processor
$ dpkg -l | grep -m1 libc6
ii libc6:amd64 2.31-0ubuntu9.2 amd64 GNU C Library: Shared libraries
$ ./test_memcpy64 32
32 MB = 2.506206 ms
-Compare match (should be zero): 0
$ dpkg -l | grep -m1 libc6
ii libc6:amd64 2.31-0ubuntu9.4~20210524ppa1 amd64 GNU C Library: Shared libraries
$ ./test_memcpy64 32
32 MB = 1.384115 ms
-Compare match (should be zero): 0
So, for AMD it's a very noticeable improvement (1.38ms vs 2.51ms, roughly 45% less time for the 32 MB copy).
---- Intel ----
$ grep -m1 "model name" /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
$ dpkg -l | grep -m1 libc6
ii libc6:amd64 2.31-0ubuntu9.2 amd64 GNU C Library: Shared libraries
$ ./test_memcpy64 32
32 MB = 2.304554 ms
-Compare match (should be zero): 0
$ dpkg -l | grep -m1 libc6
ii libc6:amd64 2.31-0ubuntu9.4~20210524ppa1 amd64 GNU C Library: Shared libraries
$ ./test_memcpy64 32
32 MB = 2.209747 ms
-Compare match (should be zero): 0
For Intel the difference is small (2.21ms vs 2.30ms, about 4% faster), and there is no performance regression.
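Since each figure above comes from a single run, here is a minimal sketch of how the runs could be repeated and summarised without hyperfine. It is not part of the original test: the helper name summarize_ms is hypothetical, and it assumes the binary keeps printing its timing as "<size> MB = <time> ms", as in the output above.

```shell
# Hypothetical helper: read test_memcpy64 output on stdin and print
# the sample count plus min/median/max of the reported times (in ms).
# Lines that are not of the form "<size> MB = <time> ms" are ignored.
summarize_ms() {
    awk '$2 == "MB" && $3 == "=" { print $4 }' |
        sort -n |
        awk '{ v[NR] = $1 }
             END {
                 if (NR == 0) { print "no samples"; exit 1 }
                 # Median: middle value, or mean of the two middle values.
                 mid = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
                 printf "n=%d min=%.6f median=%.6f max=%.6f\n", NR, v[1], mid, v[NR]
             }'
}

# Usage (assumes test_memcpy64 is in the current directory):
#   for i in $(seq 1 20); do ./test_memcpy64 32; done | summarize_ms
```

Comparing the median (or minimum) before and after the libc6 upgrade should be less sensitive to a single noisy run than comparing one measurement per package version.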