[AEP-bug] ext4: more rare direct I/O vs unmap failures
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
intel | Fix Released | Medium | Unassigned |
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Bug Description
Description:
Even with the ext4_break_
—
The root cause of this issue is that while the ei->i_mmap_sem provides
synchronization between ext4_break_
provide synchronization with the direct I/O path. This exact same issue exists
in XFS AFAICT, with the synchronization tool there being the XFS_MMAPLOCK.
This allows the direct I/O path to do I/O and raise & lower page->_refcount
while we're executing a truncate/hole punch. This leads to us trying to free
a page with an elevated refcount.
Here's one instance of the race:
CPU 0                                   CPU 1
-----                                   -----
ext4_punch_hole()
  ext4_break_
                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page() # elevates refcount
  truncate_
    ... a few layers ...
    dax_disassociat
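The interleaving above can be replayed deterministically in a small userspace C model. This is only a sketch: `struct model_page`, `model_break_layouts()`, `model_dio_get_page()`, `model_disassociate_warns()` and `replay_first_race()` are hypothetical stand-ins, not kernel code. The only things taken from the report are the order of events and the convention that an idle page has a refcount of 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a DAX page: a refcount of 1 means "idle". */
struct model_page {
    int refcount;
};

/* Stand-in for the layout-breaking check: true if the page looks idle. */
static bool model_break_layouts(const struct model_page *p)
{
    return p->refcount == 1;
}

/* Stand-in for the direct I/O path taking a reference on the page. */
static void model_dio_get_page(struct model_page *p)
{
    p->refcount++;
}

/* Stand-in for the check at truncate time: true if it would warn. */
static bool model_disassociate_warns(const struct model_page *p)
{
    return p->refcount > 1;
}

/* Replays the CPU 0 / CPU 1 steps in order; true if the warning fires. */
static bool replay_first_race(void)
{
    struct model_page page = { .refcount = 1 };

    /* CPU 0: the page looks idle, so hole punch decides to proceed. */
    bool idle = model_break_layouts(&page);
    assert(idle);

    /* CPU 1: direct I/O pins the page after the check has passed. */
    model_dio_get_page(&page);

    /* CPU 0: acts on the stale result and finds an elevated refcount. */
    return idle && model_disassociate_warns(&page);
}
```

Calling `replay_first_race()` returns true: nothing orders the check against the later `get_page()`, so the check's result is already stale by the time the truncate step uses it.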
A similar race occurs when the refcount is being dropped while we're running
ext4_break_
CPU 0                                   CPU 1
-----                                   -----
                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page()
                                              elevates refcount of page X
ext4_punch_hole()
  ext4_break_
    __wait_var_event() # called for page X
                                        __put_devmap_
                                          drops refcount of X to 1
    __wait_var_event() checks X's refcount in "if (condition)", and breaks.
    We never actually called ext4_wait_
    ext4_break_
    ext4_break_layouts, never attempting to wait on page Y which still has an
    elevated refcount of 2.
  truncate_
    ... a few layers ...
    dax_disassociat
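The second interleaving can be modeled the same way. The sketch below assumes that the wait loop only rescans for busy pages when its wait callback actually ran, and it tracks that with a `retry` flag; that flag and every name here (`model_find_busy()`, `model_wait_idle()`, `replay_second_race()`) are assumptions of the model, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 2  /* page X is index 0, page Y is index 1 */

struct model_page {
    int refcount;  /* 1 means idle, anything higher means pinned */
};

/* Returns the first page that still has an extra reference, or NULL. */
static struct model_page *model_find_busy(struct model_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].refcount > 1)
            return &pages[i];
    return NULL;
}

/* Model of the wait: the callback runs only if the page is still busy. */
static void model_wait_idle(struct model_page *p, bool *retry)
{
    if (p->refcount == 1)
        return;        /* condition already true: callback never runs */
    *retry = true;     /* callback ran: ask the caller to rescan */
    p->refcount = 1;   /* model: the holder drops its reference meanwhile */
}

/* Returns how many pages are still pinned when the loop exits. */
static int replay_second_race(void)
{
    struct model_page pages[NPAGES] = { { .refcount = 2 }, { .refcount = 2 } };
    bool retry;

    do {
        retry = false;
        struct model_page *busy = model_find_busy(pages, NPAGES);
        if (!busy)
            break;
        assert(busy == &pages[0]);  /* X is found first */
        busy->refcount = 1;         /* CPU 1 drops X before the wait checks */
        model_wait_idle(busy, &retry);
    } while (retry);

    int still_busy = 0;
    for (size_t i = 0; i < NPAGES; i++)
        if (pages[i].refcount > 1)
            still_busy++;
    return still_busy;
}
```

In this model `replay_second_race()` returns 1: the loop exits after a single pass because the callback never set `retry`, and page Y is never examined even though its refcount is still 2.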
The solution will most likely involve adding synchronization between the direct I/O path and truncate/hole-punch operations. It will need to happen for both ext4 and XFS, so the filesystem folks need to be involved.
no longer affects: | linux (Ubuntu) |
tags: | added: kernel-da-key |
affects: | ubuntu → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
status: | New → Triaged |
Changed in intel: | |
status: | New → Triaged |
importance: | Undecided → Medium |
information type: | Private → Public |
Changed in linux (Ubuntu): | |
status: | In Progress → Fix Committed |
Changed in intel: | |
status: | Triaged → Fix Released |
tags: | added: kernel-fixup-verification-needed-bionic removed: verification-needed-bionic |
This bug is related to the CLX platform, which has not been published yet. Please keep it private until the CLX platform is released. Thanks for your understanding.