The Pi4 is definitely running the Neon code? That's a puzzler. The curve for the Pi4 is roughly the inverse of the Grv2/Grv3 curve. I'll see if I can round up a Pi4. Why would memcmp run slower with the Neon code? From the chart, it is the memcmp calls of more than 64 bytes that take distinctly longer, so it should be running Neon code 90% of the time. I'll take another look at the data.