backport arm64 THP improvements from 6.9
Bug #2059316 reported by dann frazier
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | In Progress | Undecided | dann frazier | |
| Noble | New | Undecided | Unassigned | |
| linux-nvidia (Ubuntu) | New | Undecided | Unassigned | |
| Noble | Fix Released | Undecided | Unassigned | |
Bug Description
Initial support for multi-size THP (mTHP) landed upstream in v6.8. In the v6.9 merge window, two other series landed that show significant performance improvements on arm64:
- mm/memory: optimize fork() with PTE-mapped THP
  https:/
- Transparent Contiguous PTEs for User Mappings
  https:/
On an Ampere AltraMax system w/ 4K page size, kernel builds in a tmpfs are reduced from 6m30s to 5m17s, a ~19% improvement.
It has been reported that this can yield a *10x* improvement for certain GPU workloads on ARM.
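The mTHP support referenced above exposes per-size sysfs controls. A minimal sketch of how to inspect them on a 6.8+ kernel follows; the sysfs layout is taken from the upstream admin guide, and the script only prints a fallback message on older kernels:

```shell
#!/bin/sh
# Inspect the per-size mTHP controls introduced in v6.8.
# Sysfs layout per Documentation/admin-guide/mm/transhuge.rst.
BASE=/sys/kernel/mm/transparent_hugepage

if [ ! -d "$BASE" ]; then
    echo "THP not supported on this kernel"
    exit 0
fi

# Each supported folio size gets its own hugepages-<size>kB directory.
found=0
for dir in "$BASE"/hugepages-*kB; do
    [ -d "$dir" ] || continue    # glob did not match: pre-6.8 kernel
    found=1
    echo "mTHP ${dir##*/hugepages-}: $(cat "$dir"/enabled)"
done
[ "$found" -eq 0 ] && echo "no per-size mTHP controls (kernel < 6.8?)"

# To enable e.g. 64K mTHP (as root):
#   echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
exit 0
```

The per-size directories let you enable only the folio sizes that benefit a workload, rather than the all-or-nothing classic THP knob.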
Changed in linux (Ubuntu):
  assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu):
  status: New → In Progress
I've build-tested on all architectures in this PPA: https://launchpad.net/~dannf/+archive/ubuntu/mthp/+packages
I manually tested on a few systems, timing a full kernel build w/ the Ubuntu config in a tmpfs (both to stress the system and to look for performance differences).
- ppc64el / Power9 - no performance difference
- x86 - AMD EPYC/Naples - 11% improvement
- Ampere AltraMax (-generic) - 19% improvement
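The timing methodology above can be approximated with the sketch below. This is not the exact procedure from the bug (which built a full kernel with the Ubuntu config); it is a stand-in that keeps the working set in tmpfs and forks many short-lived children, the pattern the fork() and contpte series target. The file and fork counts are arbitrary:

```shell
#!/bin/sh
# Miniature of the benchmark shape: fork-heavy reads of a tmpfs
# working set. Uses /dev/shm to avoid needing root for a mount.
set -e
SHMDIR=/dev/shm
[ -d "$SHMDIR" ] || SHMDIR=/tmp          # fall back if no tmpfs mount
WORK=$(mktemp -d "$SHMDIR/bench.XXXXXX")
trap 'rm -rf "$WORK"' EXIT

# Populate a small working set (64 files x 64 KiB).
i=0
while [ "$i" -lt 64 ]; do
    head -c 65536 /dev/zero > "$WORK/f$i"
    i=$((i+1))
done

# Fork 100 children that each read the working set, then time it.
start=$(date +%s)
i=0
while [ "$i" -lt 100 ]; do
    ( cat "$WORK"/f* > /dev/null ) &
    i=$((i+1))
done
wait
echo "elapsed: $(( $(date +%s) - start ))s"
```

For real comparisons you would pin the same kernel config, run several iterations, and compare medians, as a single wall-clock number is noisy.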