5.4 kernel: when iommu is on crashdump fails

Bug #1922738 reported by Ioanna Alifieraki
This bug affects 1 person
Affects         Status        Importance  Assigned to        Milestone
linux (Ubuntu)  Invalid       Undecided   Unassigned
Focal           Fix Released  Medium      Ioanna Alifieraki

Bug Description

[IMPACT]

When the IOMMU is enabled, the crash dump cannot be collected because the crash kernel
itself crashes with the trace in [1].

The following commits address it:

1ddb32da4a62 iommu/vt-d: Simplify check in identity_mapping()
96d170f3b1a6 iommu/vt-d: Remove deferred_attach_domain()
a11bfde9c77d iommu/vt-d: Do deferred attachment in iommu_need_mapping()
034d98cc0cdc iommu/vt-d: Move deferred device attachment into helper function
1d4615978f52 iommu/vt-d: Add attach_deferred() helper
1ee0186b9a12 iommu/vt-d: Refactor find_domain() helper

[TEST CASE]

Install a 5.4 kernel, add intel_iommu=on and iommu=pt to the GRUB kernel command line,
and trigger a crash.
The crash kernel that boots will itself crash with trace [1].
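
Before triggering the crash, it can help to confirm that both parameters actually reached the running kernel. The following is a minimal C sketch and not part of the original report: the helper names cmdline_has_param() and iommu_params_set() are hypothetical, and the check only looks for whole-token matches in a command-line string such as the contents of /proc/cmdline.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `param` appears as a whole whitespace-separated token
 * in `cmdline` (so "iommu=on" does not match "intel_iommu=on"). */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    while (*p) {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        /* Length of the current token (0 at end of string). */
        size_t tlen = strcspn(p, " \t\n");
        if (tlen == plen && plen > 0 && strncmp(p, param, plen) == 0)
            return true;
        p += tlen;
    }
    return false;
}

/* Read a command line from `path` (normally "/proc/cmdline").
 * Returns 1 if both parameters from the test case are present,
 * 0 if not, and -1 if the file could not be read. */
int iommu_params_set(const char *path)
{
    char buf[4096] = "";
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);

    return cmdline_has_param(buf, "intel_iommu=on") &&
           cmdline_has_param(buf, "iommu=pt");
}
```

Calling iommu_params_set("/proc/cmdline") from a small main() after rebooting should return 1 when the test setup is in place.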

[REGRESSION POTENTIAL]

1) 1ee0186b9a12 iommu/vt-d: Refactor find_domain() helper
Splits find_domain() into two helpers: 1) find_domain(), which
only returns the domain in use; 2) deferred_attach_domain(), which
performs the deferred domain attachment if required and returns the
domain in use.

2) 1d4615978f52 iommu/vt-d: Add attach_deferred() helper
Adds a helper function to check whether a device's attach process is deferred.
Before this commit, the check was done with "dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO".
This commit wraps it in a function.
Fixes (1).

3) 034d98cc0cdc iommu/vt-d: Move deferred device attachment into helper function
Takes the code that does the deferred attachment out of the deferred_attach_domain() function
and places it in a new do_deferred_attach() function.
Fixes (1).

4) a11bfde9c77d iommu/vt-d: Do deferred attachment in iommu_need_mapping()
This one actually fixes the bug.
Attachment of the device needs to happen before checking whether the device is identity mapped.
Fixes (1).

5) 96d170f3b1a6 iommu/vt-d: Remove deferred_attach_domain()
Code cleanup: removes deferred_attach_domain(), which is now just a wrapper around
find_domain(), and calls find_domain() directly from the caller sites.
Fixes (1).

6) 1ddb32da4a62 iommu/vt-d: Simplify check in identity_mapping()
Code cleanup.
Fixes (1).

Commits 2, 3, 5, and 6 are code movements/cleanups, so they carry little regression potential.
Commit 1 is the initial code refactoring (the remaining commits fix it) and commit
4 fixes the bug.

So far testing has not revealed any regression. Any possible regression would concern
deferred device attachment.
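
The ordering issue addressed by commit 4 can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: the structs, the sentinel address, passthrough_domain, and the two need_mapping_*() variants are simplified stand-ins invented for illustration, and only the helper names attach_deferred(), find_domain(), do_deferred_attach() and identity_mapping() come from the commit titles and descriptions above. It does not reproduce the crash path, only the effect of checking identity mapping before the deferred attachment has happened.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel structures (assumptions, not real definitions). */
struct domain { bool identity; };
struct device_model { void *iommu_priv; };  /* models dev->archdata.iommu */

/* Sentinel meaning "attachment deferred"; models DEFER_DEVICE_DOMAIN_INFO.
 * The real value differs; any address that is never a valid domain works here. */
static char defer_sentinel;
#define DEFER_DEVICE_DOMAIN_INFO ((void *)&defer_sentinel)

/* Hypothetical identity (pass-through) domain the device ends up in. */
static struct domain passthrough_domain = { .identity = true };

/* Commit 2: wrap the sentinel comparison in a helper. */
bool attach_deferred(const struct device_model *dev)
{
    return dev->iommu_priv == DEFER_DEVICE_DOMAIN_INFO;
}

/* After commit 1: only report the domain in use, never attach. */
struct domain *find_domain(const struct device_model *dev)
{
    if (dev->iommu_priv == NULL || attach_deferred(dev))
        return NULL;
    return dev->iommu_priv;
}

/* Commit 3: the deferred attachment lives in its own helper. */
void do_deferred_attach(struct device_model *dev)
{
    dev->iommu_priv = &passthrough_domain;
}

bool identity_mapping(const struct device_model *dev)
{
    struct domain *d = find_domain(dev);
    return d != NULL && d->identity;
}

/* Model of the old ordering: identity check on a still-deferred device. */
bool need_mapping_before_fix(struct device_model *dev)
{
    return !identity_mapping(dev);
}

/* Model of the ordering after commit 4: attach first, then check. */
bool need_mapping_after_fix(struct device_model *dev)
{
    if (attach_deferred(dev))
        do_deferred_attach(dev);
    return !identity_mapping(dev);
}
```

In this model, a device whose attachment is still deferred is wrongly reported as needing a DMA mapping by need_mapping_before_fix(), while need_mapping_after_fix() attaches it first and then sees the identity domain.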

[OTHER]

Kernel affected 5.4.

[1] https://pastebin.ubuntu.com/p/FNxTxjg3DV/

Changed in linux (Ubuntu Focal):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Ioanna Alifieraki (joalif)
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1922738

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
description: updated
Tim Gardner (timg-tpi)
tags: added: bot-stop-nagging
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Focal):
status: Confirmed → In Progress
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Kelsey Steele (kelsey-steele) wrote :

Hi Ioanna, could you please verify that the focal kernel in -proposed resolves this bug? Thank you!

Ioanna Alifieraki (joalif) wrote :

VERIFICATION

Installed the kernel from -proposed, enabled the IOMMU, rebooted, and triggered a crash.
The crash dump was generated successfully.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-73.82

---------------
linux (5.4.0-73.82) focal; urgency=medium

  * focal/linux: 5.4.0-73.82 -proposed tracker (LP: #1923781)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CIFS DFS entries not accessible with 5.4.0-71.74-generic (LP: #1923670)
    - Revert "cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting
      cifs_sb->prepath."

  * CVE-2021-29650
    - Revert "netfilter: x_tables: Update remaining dereference to RCU"
    - Revert "netfilter: x_tables: Switch synchronization to RCU"
    - netfilter: x_tables: Use correct memory barriers.

  * LRMv4: switch to signing nvidia modules via the Ubuntu Modules signing key
    (LP: #1918134)
    - [Packaging] dkms-build{,--nvidia-N} sync back from LRMv4

  * 5.4 kernel: when iommu is on crashdump fails (LP: #1922738)
    - iommu/vt-d: Refactor find_domain() helper
    - iommu/vt-d: Add attach_deferred() helper
    - iommu/vt-d: Move deferred device attachment into helper function
    - iommu/vt-d: Do deferred attachment in iommu_need_mapping()
    - iommu/vt-d: Remove deferred_attach_domain()
    - iommu/vt-d: Simplify check in identity_mapping()

  * Backport mlx5e fix for tunnel offload (LP: #1921769)
    - net/mlx5e: Check tunnel offload is required before setting SWP

  * Bcache bypasse writeback on caching device with fragmentation (LP: #1900438)
    - bcache: consider the fragmentation when update the writeback rate

  * Fix implicit declaration warnings for kselftests/memfd test on newer
    releases (LP: #1910323)
    - selftests/memfd: Fix implicit declaration warnings

  * net/mlx5e: Add missing capability check for uplink follow (LP: #1921104)
    - net/mlx5e: Add missing capability check for uplink follow

  * [UBUNUT 21.04] s390/vtime: fix increased steal time accounting
    (LP: #1921498)
    - s390/vtime: fix increased steal time accounting

  * Mute/Mic-mute LEDs are not work on HP 850/840/440 G8 Laptops (LP: #1920030)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP 840 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP 440 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP 850 G8

  * Focal update: v5.4.106 upstream stable release (LP: #1920246)
    - uapi: nfnetlink_cthelper.h: fix userspace compilation error
    - powerpc/pseries: Don't enforce MSI affinity with kdump
    - ath9k: fix transmitting to stations in dynamic SMPS mode
    - net: Fix gro aggregation for udp encaps with zero csum
    - net: check if protocol extracted by virtio_net_hdr_set_proto is correct
    - net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
    - sh_eth: fix TRSCER mask for SH771x
    - can: skb: can_skb_set_owner(): fix ref counting if socket was closed before
      setting skb ownership
    - can: flexcan: assert FRZ bit in flexcan_chip_freeze()
    - can: flexcan: enable RX FIFO after FRZ/HALT valid
    - can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode
    - can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before
      entering Normal Mode
    - tcp: add sanity tests to TCP_QUEUE_SEQ
    - netfilter: nf_nat: undo erroneous tcp edemux lookup
    - ne...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released