Comment 0 for bug 1882336

Fred Kimmy (kongzizaixian) wrote :

[Bug Description]
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this forgoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.

The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).
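
To illustrate the kind of C idiom the description refers to, here is a
minimal sketch. It is not the code from the patch. It assumes the checksum
is the usual 16-bit ones' complement sum, and the names csum_simple() and
csum_wide() are hypothetical. The first function adds one 16-bit word per
iteration; the second accumulates 64-bit words with an end-around carry
and folds the total down to 16 bits only once at the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical baseline: one 16-bit word (native byte order) per step. */
uint16_t csum_simple(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;

    while (len >= 2) {
        uint16_t w;
        memcpy(&w, buf, 2);          /* alignment-safe load */
        sum += w;
        buf += 2;
        len -= 2;
    }
    if (len) {
        uint16_t w = 0;              /* odd trailing byte, zero-padded */
        memcpy(&w, buf, 1);
        sum += w;
    }
    while (sum >> 16)                /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical wide variant: 64-bit loads with end-around carry. */
uint16_t csum_wide(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;

    while (len >= 8) {
        uint64_t w;
        memcpy(&w, buf, 8);          /* compilers typically emit one load */
        sum += w;
        sum += (sum < w);            /* wrap the carry-out back in */
        buf += 8;
        len -= 8;
    }
    if (len) {
        uint64_t w = 0;              /* 1..7 trailing bytes, zero-padded */
        memcpy(&w, buf, len);
        sum += w;
        sum += (sum < w);
    }
    /* Fold 64 -> 32 -> 16 bits; two steps each absorb the extra carry. */
    sum = (sum & 0xffffffff) + (sum >> 32);
    sum = (sum & 0xffffffff) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

Both functions return the same value for the same buffer, because 2^16 is
congruent to 1 modulo 0xffff, so wider partial sums fold to the same
16-bit result. The wide variant simply performs a quarter of the loads and
additions, which is one plausible source of the speedups quoted above.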

[Steps to Reproduce]
  1)
  2)
  3)

[Actual Results]

[Expected Results]

[Reproducibility]

[Additional information]
  (Firmware version, kernel version, affected hardware, etc. if required):

[Resolution]
arm64: Implement optimised checksum routine