2020-06-06 08:49:07 |
Fred Kimmy |
bug |
|
|
added bug |
2020-06-06 08:49:18 |
Fred Kimmy |
nominated for series |
|
kunpeng920/ubuntu-18.04-hwe |
|
2020-06-06 08:49:18 |
Fred Kimmy |
bug task added |
|
kunpeng920/ubuntu-18.04-hwe |
|
2020-06-06 08:49:18 |
Fred Kimmy |
nominated for series |
|
kunpeng920/upstream-kernel |
|
2020-06-06 08:49:18 |
Fred Kimmy |
bug task added |
|
kunpeng920/upstream-kernel |
|
2020-06-06 08:49:18 |
Fred Kimmy |
nominated for series |
|
kunpeng920/ubuntu-20.04 |
|
2020-06-06 08:49:18 |
Fred Kimmy |
bug task added |
|
kunpeng920/ubuntu-20.04 |
|
2020-06-08 08:12:00 |
Ike Panhc |
kunpeng920/upstream-kernel: status |
New |
Fix Released |
|
2020-06-08 08:12:05 |
Ike Panhc |
kunpeng920/upstream-kernel: milestone |
|
linux-v5.6 |
|
2020-06-11 08:20:28 |
Andrew Cloke |
kunpeng920/ubuntu-18.04-hwe: status |
New |
Won't Fix |
|
2020-06-11 08:20:31 |
Andrew Cloke |
kunpeng920/ubuntu-20.04: status |
New |
Won't Fix |
|
2020-06-15 06:17:55 |
Ike Panhc |
description |
[Bug Description]
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this forgoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).
[Steps to Reproduce]
1)
2)
3)
[Actual Results]
[Expected Results]
[Reproducibility]
[Additional information]
(Firmware version, kernel version, affected hardware, etc. if required):
[Resolution]
arm64: Implement optimised checksum routine |
[Bug Description]
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this forgoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).
[Steps to Reproduce]
1)
2)
3)
[Actual Results]
[Expected Results]
[Reproducibility]
[Additional information]
(Firmware version, kernel version, affected hardware, etc. if required):
[Resolution]
v5.6 5777eaed566a arm64: Implement optimised checksum routine |
|
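Note on the description above: it does not show how the optimised routine works, so the following is only a rough arithmetic sketch, not the code from commit 5777eaed566a. It assumes the checksum in question is the usual 16-bit ones' complement sum, and it illustrates one reason wide loads can help: accumulating 64-bit words with an end-around carry and folding once at the end gives the same result as adding one 16-bit word at a time. The function names `csum16_reference` and `csum16_wide` are hypothetical, and Python is used purely to check the arithmetic, not to model performance.

```python
import struct


def csum16_reference(data: bytes) -> int:
    """Ones' complement sum over 16-bit little-endian words, one word at a time."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = 0
    for (word,) in struct.iter_unpack("<H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def csum16_wide(data: bytes) -> int:
    """Same sum, but accumulated over 64-bit words and folded at the end."""
    data += b"\x00" * (-len(data) % 8)  # zero padding does not change the sum
    total = 0
    for (word,) in struct.iter_unpack("<Q", data):
        total += word
        total = (total & 0xFFFFFFFFFFFFFFFF) + (total >> 64)  # end-around carry
    # Fold 64 -> 32 bits; the second step absorbs a possible carry from the first.
    total = (total & 0xFFFFFFFF) + (total >> 32)
    total = (total & 0xFFFFFFFF) + (total >> 32)
    # Fold 32 -> 16 bits in the same way.
    total = (total & 0xFFFF) + (total >> 16)
    total = (total & 0xFFFF) + (total >> 16)
    return total


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(csum16_reference(sample)), hex(csum16_wide(sample)))
```

The two functions agree because 2^16 is congruent to 1 modulo 2^16 - 1, so each 16-bit lane of a 64-bit word contributes to the folded result exactly as if it had been added on its own. The wide version performs one addition per eight bytes instead of one per two.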
2020-11-02 15:26:11 |
Taihsiang Ho |
nominated for series |
|
kunpeng920/ubuntu-20.04-hwe |
|
2020-11-02 15:26:11 |
Taihsiang Ho |
bug task added |
|
kunpeng920/ubuntu-20.04-hwe |
|
2020-11-02 15:26:54 |
Taihsiang Ho |
kunpeng920/ubuntu-20.04-hwe: milestone |
|
ubuntu-20.04.2 |
|
2020-11-02 15:26:58 |
Taihsiang Ho |
kunpeng920/ubuntu-20.04-hwe: status |
New |
Fix Committed |
|
2020-11-02 15:27:01 |
Taihsiang Ho |
kunpeng920: status |
New |
Fix Committed |
|
2021-02-05 09:05:41 |
Ike Panhc |
kunpeng920/ubuntu-20.04-hwe: status |
Fix Committed |
Fix Released |
|
2021-02-05 09:05:44 |
Ike Panhc |
kunpeng920: status |
Fix Committed |
Fix Released |
|