Add 'mm: hold PTL from the first PTE while reclaiming a large folio' to fix L2 Guest hang during LTP Test

Bug #2076147 reported by bugproxy
This bug affects 1 person
Affects                            Status   Importance  Assigned to                             Milestone
The Ubuntu-power-systems project   Triaged  High        Ubuntu on IBM Power Systems Bug Triage
linux (Ubuntu)                     Triaged  High        Ubuntu on IBM Power Systems Bug Triage

Bug Description

SRU Justification:

[ Impact ]

 * A KVM second-level (L2) guest, that is, a KVM VM running nested on top of
   a Power 10 PowerVM hypervisor, hangs during the LTP (Linux Test Project)
   test suite.

 * It hangs with:
   "Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab"

 * Diagnosing the issue points to this fix/upstream commit:
   [commit message, by Barry Song <email address hidden>]
   Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE
   modifications preceded by pte clear. While iterating over PTEs of a large folio,
   it only starts acquiring PTL from the first valid (present) PTE.
   PTE modifications can temporarily set PTEs to pte_none.
   Consequently, the initial PTEs of a large folio might be skipped
   in try_to_unmap_one().
   For example, for an anon folio, if we skip PTE0, we may have PTE0 which is
   still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
   try_to_unmap_one().
   So folio will be still mapped, the folio fails to be reclaimed and is put
   back to LRU in this round.
   This also breaks up PTEs optimization such as CONT-PTE on this large folio
   and may lead to accident folio_split() afterwards.
   And since a part of PTEs are now swap entries, accessing those parts will
   introduce overhead - do_swap_page.
   Although the kernel can withstand all of the above issues, the situation
   still seems quite awkward and warrants making it more ideal.
   The same race also occurs with small folios, but they have only one PTE,
   thus, it won't be possible for them to be partially unmapped.
   This patch [see below] holds PTL from PTE0, allowing us to avoid reading
   PTE values that are in the process of being transformed. With stable PTE
   values, we can ensure that this large folio is either completely reclaimed
   or that all PTEs remain untouched in this round.
   A corner case is that if we hold PTL from PTE0 and most initial PTEs have
   been really unmapped before that, we may increase the duration of holding
   PTL. Thus we only apply this optimization to folios which are still entirely
   mapped (not in deferred_split list).

[ Fix ]

 * 73bc32875ee9 73bc32875ee9b1881dd780308c6793fe463fe803
   "mm: hold PTL from the first PTE while reclaiming a large folio"

[ Test Plan ]

 * An IBM Power 10 system (where PowerVM is mandatory)
   running Ubuntu Server 24.04 (kernel 6.8) or later
   with (nested) KVM setup (so KVM on top of PowerVM).

 * Run LTP test suite
   Tests running: SLS(io,base)

 * Without the patch the above test will hang with
   Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

[ Where problems could occur ]

 * This is a common code change in the memory management sub-system,
   hence great care needs to be taken, even if it was discussed upfront
   at the https://lore.kernel.org/ mailing list and the upstream commit
   provenance shows that many eyes had a look at this.

 * The modification is relatively small with just one if statement
   (across two lines) in mm/vmscan.c.

 * This change is to assist 'try_to_unmap' to acquire page table locks (PTL)
   from the first page table entry (PTE) and to eliminate the influence of
   temporary and volatile PTE values.

 * If done wrong, it can have a negative impact, especially in the case of
   large folios: wrong hints might be given to try_to_unmap,
   which may lead to bad page swapping.

 * In case of an issue with this patch the result can also be decreased
   performance and efficiency in the page table handling - the opposite
   of what the patch is supposed to address.

 * Fortunately several developers had their eyes on this commit,
   as the provenance of the patch and the discussion at lkml shows.

[ Other Info ]

 * The commit has been upstream since v6.10(-rc1), hence it will be included
   in oracular with the planned target kernel.

__________

== Comment: #0 - SEETEENA THOUFEEK <email address hidden> - 2024-08-06 00:20:57 ==
+++ This bug was initially created as a clone of Bug #206372 +++

---Problem Description---
L2 Guest hung during LTP Tests. Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

---uname output---
NA

---Additional Hardware Info---
NA

Contact Information = na

---Debugger Data---
NA

---Patches Installed---
NA

---Steps to Reproduce---

Tests running: SLS(io,base)
LPAR Config:
============
PHYP Environment: PowerVM
LPAR Hostname/IP: 10.33.2.107
Rootvg Filesystem: xfs
Network Interface: Shiner-T
vNIC/SR-IOV Config: n/a
IO Type: SAN
IO Disk Type: raw
Multipath Enabled: No
-------------------------------------------------------------------------------------
DUMP Config:
============
KDUMP configured: Yes
XMON enabled: no
DUMP Available: no

Machine Type = na

Userspace rpm: NA

The userspace tool has the following bit modes: NA

Userspace tool obtained from project website: na

Userspace tool common name: NA

*Additional Instructions for na:
-Post a private note with access information to the machine that is currently in the debugger.
-Attach ltrace and strace of userspace application.

Please include this commit in Ubuntu 24.04.

Upstream commit that resolves these data store lockups:
73bc32875ee9b1881dd780308c6793fe463fe803 mm: hold PTL from the first PTE while reclaiming a large folio

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-208449 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Frank Heimes (fheimes)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in ubuntu-power-systems:
importance: Undecided → High
Frank Heimes (fheimes) wrote :

Hello and thanks for raising this issue.

Do you have a reproducer for this issue?
Since it's common code (memory management), this needs to be handled with care, since it will affect all installations.
Is it correct that you faced this issue while running the LTP test suite?
Could you provide more details, such as whether LTP was running inside a KVM guest on P10 (I assume yes), at which test the issue occurred, how you called LTP, etc.?

Since we are asked to integrate this into 24.04, we have to follow the SRU process,
which requires that we have this template filled out (https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template - we can help with that)
and one section is about a test plan for this.

The level of information here is probably not sufficient to completely fill out the SRU template.

Frank Heimes (fheimes) wrote :

I picked the commit and started a test build of the patched kernel that is currently building here:
launchpad.net/~fheimes/+archive/ubuntu/lp2076147

Frank Heimes (fheimes)
description: updated
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in linux (Ubuntu):
status: New → Triaged
Frank Heimes (fheimes)
summary: - L2 Guest hung during LTP Tests. Back trace of paca->saved_r1
- (0xc000000c1bc8bb00) (possibly stale) @ new_slab
+ Add 'mm: hold PTL from the first PTE while reclaiming a large folio' to
+ fix L2 Guest hung during LTP Test
summary: Add 'mm: hold PTL from the first PTE while reclaiming a large folio' to
- fix L2 Guest hung during LTP Test
+ fix L2 Guest hang during LTP Test
Frank Heimes (fheimes)
description: updated