arm64: Implement optimised checksum routine
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kunpeng920 | Fix Released | Undecided | Unassigned |
Ubuntu-18.04-hwe | Won't Fix | Undecided | Unassigned |
Ubuntu-20.04 | Won't Fix | Undecided | Unassigned |
Ubuntu-20.04-hwe | Fix Released | Undecided | Unassigned |
Upstream-kernel | Fix Released | Undecided | Unassigned |
Bug Description
[Bug Description]
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this forgoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).
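To illustrate the kind of C idiom the description refers to, here is a minimal sketch of a ones' complement checksum that accumulates 64-bit words with an end-around carry and folds the result to 16 bits at the end. It is not the code from the upstream commit: the names fold64, csum_wide and csum_ref are hypothetical, and alignment handling, unrolling and the other tuning a real do_csum() would need are left out. csum_ref is a plain 16-bit-at-a-time version included only for comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit ones' complement sum down to 16 bits (hypothetical helper). */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* absorb the carry from above */
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);      /* absorb the carry from above */
    return (uint16_t)sum;
}

/* Sketch: sum the buffer eight bytes at a time in native byte order. */
static uint16_t csum_wide(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    uint64_t word;

    while (len >= 8) {
        memcpy(&word, buf, 8);   /* alignment-safe load; compilers emit a single load */
        sum += word;
        sum += (sum < word);     /* end-around carry if the addition wrapped */
        buf += 8;
        len -= 8;
    }
    if (len) {
        word = 0;
        memcpy(&word, buf, len); /* tail of 1..7 bytes, zero-padded */
        sum += word;
        sum += (sum < word);
    }
    return fold64(sum);
}

/* Reference for comparison: the same sum taken 16 bits at a time. */
static uint16_t csum_ref(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    uint16_t w;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        memcpy(&w, buf + i, 2);
        sum += w;
        sum = (sum & 0xffffu) + (sum >> 16);
    }
    if (len & 1) {
        w = 0;
        memcpy(&w, buf + i, 1);  /* trailing odd byte, zero-padded */
        sum += w;
        sum = (sum & 0xffffu) + (sum >> 16);
    }
    return (uint16_t)sum;
}
```

Both functions return the unfolded-complement sum in native byte order, so csum_wide(buf, len) and csum_ref(buf, len) should agree for any buffer. The wide version does one addition per eight bytes instead of one per two, which is the general reason such a routine scales with buffer size.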
[Steps to Reproduce]
1)
2)
3)
[Actual Results]
[Expected Results]
[Reproducibility]
[Additional information]
(Firmware version, kernel version, affected hardware, etc. if required):
[Resolution]
Fixed upstream in v5.6 by commit 5777eaed566a ("arm64: Implement optimised checksum routine").
This patch does not meet the criteria for SRU.