FWIW, backwards REP MOVSB (std; rep movsb; cld) is still horribly slow on Zen 4 (4 GB/s even when the data is nicely aligned and cached).
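For anyone who wants to reproduce this, here is a minimal sketch of what the backwards copy looks like from C, assuming GCC or Clang inline asm on x86-64. The name `copy_backward` is made up for illustration, and the plain byte loop is only a fallback so the snippet still builds on other targets. It is not a benchmark harness; timing it is left to you.

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from src to dst, highest address first.
 * Like memmove, this ordering is safe when dst overlaps src at a higher address. */
void copy_backward(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return;

    /* With DF=1, MOVSB decrements RSI/RDI, so both pointers start at the last byte. */
    unsigned char *d = (unsigned char *)dst + n - 1;
    const unsigned char *s = (const unsigned char *)src + n - 1;

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    __asm__ volatile(
        "std\n\t"        /* set the direction flag: copy downwards */
        "rep movsb\n\t"  /* copy RCX bytes from [RSI] to [RDI] */
        "cld"            /* restore DF=0, which the ABI expects */
        : "+D"(d), "+S"(s), "+c"(n)
        :
        : "memory", "cc");
#else
    /* Portable fallback (assumption: only here so the sketch compiles elsewhere). */
    while (n--)
        *d-- = *s--;
#endif
}
```

Keeping `std` and `cld` inside the same asm statement matters, because compiled code assumes the direction flag is clear.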