Further testing suggests that thread-switching behavior is involved. Throughput with different loads running in parallel:
Nothing running in parallel: <10MB/s at big block sizes
perl -e "while (1) { }": <5MB/s
perl -e "use Time::HiRes qw( usleep ); while (1) { usleep(1) }": 20-25MB/s
perl -e "use Time::HiRes qw( nanosleep ); while (1) { nanosleep(1) }": 5-25MB/s (varying widely over time)
perl -e "use Thread qw( yield ); while (1) { yield }": 10-20MB/s
sudo ping -q -i.001 localhost: 25MB/s (consistently)
Note that the sleep variants only reach ~25% CPU load (as seen in top), while the yield and ping variants use a full core.
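In case anyone wants to repeat the comparison, here is a rough Python sketch of a harness that starts each background load, times a transfer, and stops the load again. The names `LOADS`, `background_load`, `throughput_mb_s` and `measure_all` are made up for this sketch, and `transfer` is a placeholder for whatever actually moves the data (it is not part of the measurements above); it only has to return the number of bytes it moved. MB is assumed to mean 10^6 bytes.

```python
import contextlib
import subprocess
import time

# Background loads copied from the list above; each one runs until killed.
# The "sudo ping" variant is omitted here because it needs root privileges.
LOADS = {
    "none": None,
    "busy loop": ["perl", "-e", "while (1) { }"],
    "usleep": ["perl", "-e",
               "use Time::HiRes qw( usleep ); while (1) { usleep(1) }"],
    "nanosleep": ["perl", "-e",
                  "use Time::HiRes qw( nanosleep ); while (1) { nanosleep(1) }"],
    "yield": ["perl", "-e", "use Thread qw( yield ); while (1) { yield }"],
}


@contextlib.contextmanager
def background_load(argv):
    """Run argv as a background process for the duration of the with-block."""
    if argv is None:  # baseline: nothing running in parallel
        yield None
        return
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def throughput_mb_s(transfer):
    """Time one call of transfer(), which must return the bytes it moved."""
    start = time.perf_counter()
    nbytes = transfer()
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return nbytes / elapsed / 1e6  # assumption: 1 MB = 10**6 bytes


def measure_all(transfer, loads=LOADS):
    """Return {load name: MB/s}, running the loads one at a time."""
    results = {}
    for name, argv in loads.items():
        with background_load(argv):
            results[name] = throughput_mb_s(transfer)
    return results
```

Calling `measure_all(my_transfer)` with your own transfer function gives one MB/s figure per load. Since the nanosleep numbers vary widely over time, it is probably worth running each load several times rather than trusting a single pass.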