Fix kernel panic at boot on dual GFX systems

Bug #1926792 reported by Kai-Heng Feng
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
HWE Next                  Fix Released  Undecided   Unassigned
linux (Ubuntu)            Fix Released  Critical    Unassigned
  Focal                   Won't Fix     Undecided   Unassigned
  Hirsute                 Fix Released  High        Unassigned
linux-oem-5.10 (Ubuntu)   Invalid       Undecided   Unassigned
  Focal                   Fix Released  Critical    Unassigned
  Hirsute                 Invalid       Undecided   Unassigned

Bug Description

[Impact]
On dual GFX systems, if amdgpu fails to probe a device, vgaarb for the
other GFX triggers a NULL pointer dereference and freezes the system.

[Fix]
Defer VGA client registering so resources are handled properly when
amdgpu probe fails.
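
The ordering issue can be illustrated with a small toy model. This is only a sketch of the general "register callbacks after the fallible step" idea, not the kernel patch itself: `ToyArbiter`, `ToyDevice`, `probe_early` and `probe_deferred` are hypothetical names invented for illustration, and a Python `TypeError` on `None` stands in for the NULL pointer dereference.

```python
class ToyArbiter:
    """Hypothetical stand-in for an arbiter that calls back its clients."""

    def __init__(self):
        self.clients = {}

    def register(self, name, callback):
        self.clients[name] = callback

    def change_mode(self):
        # Call every registered client and collect what each one returns.
        return {name: cb() for name, cb in self.clients.items()}


class ToyDevice:
    """Hypothetical device whose state exists only after a successful init."""

    def __init__(self, name, probe_ok):
        self.name = name
        self.probe_ok = probe_ok
        self.state = None  # filled in by init_hw() on success only

    def init_hw(self):
        if not self.probe_ok:
            raise RuntimeError(f"{self.name}: probe failed")
        self.state = {"decode": True}

    def on_mode_change(self):
        # Raises TypeError when state is None (toy analogue of a NULL deref).
        return self.state["decode"]


def probe_early(arbiter, dev):
    """Problematic order: register first, then run the step that may fail."""
    arbiter.register(dev.name, dev.on_mode_change)
    try:
        dev.init_hw()
    except RuntimeError:
        return False  # the client stays registered with no state behind it
    return True


def probe_deferred(arbiter, dev):
    """Deferred order: register only once nothing can fail any more."""
    try:
        dev.init_hw()
    except RuntimeError:
        return False  # nothing was registered, so nothing is left dangling
    arbiter.register(dev.name, dev.on_mode_change)
    return True
```

In this model, a failed `probe_early` leaves a callback behind that breaks the next `change_mode()` call, while a failed `probe_deferred` leaves the arbiter untouched, so a second, working device is unaffected.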

[Test]
With the patch applied, the kernel no longer panics when vgaarb changes
the VGA mode.

[Where problems could occur]
VGA clients won't take effect until a probe succeeds, so unless there
is a very subtle bug in vgaarb or vgaswitcheroo, the change is very
unlikely to introduce a regression.

Changed in linux-oem-5.10 (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-oem-5.10 (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Focal):
status: New → Won't Fix
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu Hirsute):
status: New → Confirmed
Changed in linux-oem-5.10 (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Critical
Changed in linux (Ubuntu Hirsute):
importance: Undecided → Critical
Changed in linux-oem-5.10 (Ubuntu Focal):
importance: Undecided → Critical
tags: added: oem-priority originate-from-1914566 stella
Stefan Bader (smb)
Changed in linux (Ubuntu Hirsute):
importance: Critical → High
Stefan Bader (smb)
Changed in linux (Ubuntu Hirsute):
status: Confirmed → Fix Committed
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.10 (Ubuntu Focal):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-18.19

---------------
linux (5.11.0-18.19) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-18.19 -proposed tracker (LP: #1927578)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Introduce the 465 driver series, fabric-manager, and libnvidia-nscq
    (LP: #1925522)
    - debian/dkms-versions -- add NVIDIA 465 and migrate 450 to 460

  * linux-image-5.0.0-35-generic breaks checkpointing of container
    (LP: #1857257)
    - SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files

  * Hirsute update: v5.11.17 upstream stable release (LP: #1927535)
    - vhost-vdpa: protect concurrent access to vhost device iotlb
    - Revert "UBUNTU: SAUCE: ovl: Restore vm_file value when lower fs mmap fails"
    - ovl: fix reference counting in ovl_mmap error path
    - coda: fix reference counting in coda_file_mmap error path
    - amd/display: allow non-linear multi-planar formats
    - drm/amdgpu: reserve fence slot to update page table
    - drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefish
    - gpio: omap: Save and restore sysconfig
    - KEYS: trusted: Fix TPM reservation for seal/unseal
    - vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
    - pinctrl: lewisburg: Update number of pins in community
    - block: return -EBUSY when there are open partitions in blkdev_reread_part
    - pinctrl: core: Show pin numbers for the controllers with base = 0
    - arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
    - bpf: Allow variable-offset stack access
    - bpf: Refactor and streamline bounds check into helper
    - bpf: Tighten speculative pointer arithmetic mask
    - perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
    - perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
    - perf auxtrace: Fix potential NULL pointer dereference
    - perf map: Fix error return code in maps__clone()
    - HID: google: add don USB id
    - HID: asus: Add support for 2021 ASUS N-Key keyboard
    - HID: alps: fix error return code in alps_input_configured()
    - HID cp2112: fix support for multiple gpiochips
    - HID: wacom: Assign boolean values to a bool variable
    - soc: qcom: geni: shield geni_icc_get() for ACPI boot
    - dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
    - dmaengine: xilinx: dpdma: Fix race condition in done IRQ
    - ARM: dts: Fix swapped mmc order for omap3
    - m68k: fix flatmem memory model setup
    - net: geneve: check skb is large enough for IPv4/IPv6 header
    - dmaengine: tegra20: Fix runtime PM imbalance on error
    - s390/entry: save the caller of psw_idle
    - arm64: kprobes: Restore local irqflag if kprobes is cancelled
    - xen-netback: Check for hotplug-status existence before watching
    - cavium/liquidio: Fix duplicate argument
    - csky: change a Kconfig symbol name to fix e1000 build error
    - ia64: fix discontig.c section mismatches
    - ia64: tools: remove duplicate definition of ia64_mf() on ia64
    - x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
    - net: hso: fix NULL-deref on disconnect regression
    - USB: CDC-ACM...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.10 - 5.10.0-1029.30

---------------
linux-oem-5.10 (5.10.0-1029.30) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1029.30 -proposed tracker (LP: #1930076)

  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates

linux-oem-5.10 (5.10.0-1028.29) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1028.29 -proposed tracker (LP: #1929167)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * TGL-H system NV GPU fallen off the bus after resuming from s2idle with the
    external display connected via docking station (LP: #1929166)
    - SAUCE: ACPI: avoid NVIDIA GPU fallen with an _OSI string

  * AX201 BT will cause system could not enter S0i3 (LP: #1928047)
    - drm/i915: Tweaked Wa_14010685332 for all PCHs

  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
    (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend
    - Revert "USB: Add reset-resume quirk for WD19's Realtek Hub"

  * Support mic-mute on Dell's platform (LP: #1928750)
    - ASoC: rt715: add main capture switch and main capture volume
    - ASoC: rt715: remove kcontrols which no longer be used
    - ASoC: rt715: modification for code simplicity
    - platform/x86: Move all dell drivers to their own subdirectory
    - SAUCE: platform/x86: dell-privacy: Add support for Dell hardware privacy
    - SAUCE: ASoC: rt715:add micmute led state control supports
    - [Config] Update configs for Dell's E-Privacy

linux-oem-5.10 (5.10.0-1027.28) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1027.28 -proposed tracker (LP: #1927620)

  * Introduce the 465 driver series, fabric-manager, and libnvidia-nscq
    (LP: #1925522)
    - debian/dkms-versions -- add NVIDIA 465 and migrate 450 to 460

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * On TGL platforms screen shows garbage when browsing website by scrolling
    mouse (LP: #1926579)
    - SAUCE: drm/i915/display: Disable PSR2 if TGL Display stepping is B1 from A0

  * Add s2idle support on AMD Renoir and Cezanne (LP: #1927067)
    - drm/amd/display: setup system context in dm_init
    - drm/amd/display: add S/G support for Renoir
    - drm/amdgpu: drop extra drm_kms_helper_poll_enable/disable calls
    - drm/amdgpu: use runpm flag rather than fbcon for kfd runtime suspend (v2)
    - drm/amdgpu: reset runpm flag if device suspend fails
    - drm/amdgpu: add s0i3 capacity check for s0i3 routine (v2)
    - drm/amdgpu: add amdgpu_gfx_state_change_set() set gfx power change entry
      (v2)
    - drm/amdgpu: update amdgpu device suspend/resume sequence for s0i3 support
    - drm/amd/pm: add gfx_state_change_set() for rn gfx power switch (v2)
    - drm/amdgpu: add judgement for suspend/resume sequence
    - drm/amdgpu/pm: no need GPU status set since
      mmnbif_gpu_BIF_DOORBELL_FENCE_CNTL added in FSDL
    - drm/amdgpu: fix shutdown and poweroff process failed with s0ix
    - drm/amdgpu: Only check for S0ix if AMD_PMC ...

Changed in linux-oem-5.10 (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-18.19+21.10.1

---------------
linux (5.11.0-18.19+21.10.1) impish; urgency=medium

  * impish/linux: 5.11.0-18.19+21.10.1 -proposed tracker (LP: #1927560)

  * Packaging resync (LP: #1786013)
    - [Packaging] update update.conf
    - update dkms package versions

  [ Ubuntu: 5.11.0-18.19 ]

  * hirsute/linux: 5.11.0-18.19 -proposed tracker (LP: #1927578)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * Introduce the 465 driver series, fabric-manager, and libnvidia-nscq
    (LP: #1925522)
    - debian/dkms-versions -- add NVIDIA 465 and migrate 450 to 460
  * linux-image-5.0.0-35-generic breaks checkpointing of container
    (LP: #1857257)
    - SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files
  * Hirsute update: v5.11.17 upstream stable release (LP: #1927535)
    - vhost-vdpa: protect concurrent access to vhost device iotlb
    - Revert "UBUNTU: SAUCE: ovl: Restore vm_file value when lower fs mmap fails"
    - ovl: fix reference counting in ovl_mmap error path
    - coda: fix reference counting in coda_file_mmap error path
    - amd/display: allow non-linear multi-planar formats
    - drm/amdgpu: reserve fence slot to update page table
    - drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefish
    - gpio: omap: Save and restore sysconfig
    - KEYS: trusted: Fix TPM reservation for seal/unseal
    - vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
    - pinctrl: lewisburg: Update number of pins in community
    - block: return -EBUSY when there are open partitions in blkdev_reread_part
    - pinctrl: core: Show pin numbers for the controllers with base = 0
    - arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
    - bpf: Allow variable-offset stack access
    - bpf: Refactor and streamline bounds check into helper
    - bpf: Tighten speculative pointer arithmetic mask
    - perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
    - perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
    - perf auxtrace: Fix potential NULL pointer dereference
    - perf map: Fix error return code in maps__clone()
    - HID: google: add don USB id
    - HID: asus: Add support for 2021 ASUS N-Key keyboard
    - HID: alps: fix error return code in alps_input_configured()
    - HID cp2112: fix support for multiple gpiochips
    - HID: wacom: Assign boolean values to a bool variable
    - soc: qcom: geni: shield geni_icc_get() for ACPI boot
    - dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
    - dmaengine: xilinx: dpdma: Fix race condition in done IRQ
    - ARM: dts: Fix swapped mmc order for omap3
    - m68k: fix flatmem memory model setup
    - net: geneve: check skb is large enough for IPv4/IPv6 header
    - dmaengine: tegra20: Fix runtime PM imbalance on error
    - s390/entry: save the caller of psw_idle
    - arm64: kprobes: Restore local irqflag if kprobes is cancelled
    - xen-netback: Check for hotplug-status existence before watching
    - cavium/liquidio: Fix duplicate argument
    - csky: change a Kconfig symbol name to fix e1000 build error
    - ia64: fix discontig.c section mis...

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Timo Aaltonen (tjaalton)
Changed in hwe-next:
status: New → Fix Released