Lower the volume of expected s0i3 WARN_ON

Bug #1961119 reported by Alex Hung
This bug affects 1 person
Affects                  Status        Importance  Assigned to  Milestone
HWE Next                 Fix Released  Medium      Alex Hung
linux-oem-5.14 (Ubuntu)  Invalid       Undecided   Unassigned
Focal                    Fix Released  Undecided   Unassigned

Bug Description

[Impact]

A known AMD BIOS bug currently triggers the WARN traceback shown below during s0i3 resume.

This has no measurable functional impact on the system or GPU, but the traceback is noisy and makes the issue look like a potentially major problem.

AMD is working on fixing this in BIOS, but it may not be ready by the time the first platforms launch.

[Fix]

Print a shorter warning message instead: "Watermarks table not configured properly by SMU".

The fix was cherry-picked from mainline kernel v5.17-rc4.

[Test]

This was requested by AMD, and was tested and verified on the AMD CRB.
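
For anyone re-verifying on other hardware, a rough sketch of a log check follows. It is not part of the fix or of AMD's test procedure. It assumes the kernel log from a suspend/resume cycle has been captured as text (for example from dmesg), and it looks only for the two strings quoted in this report. The function name and the return labels are hypothetical, and the exact prefix the kernel puts in front of the short message is not shown in this report, so the check matches the message text only.

```python
import re

# Both patterns are taken from this bug report; everything else is illustrative.
SHORT_MESSAGE = "Watermarks table not configured properly by SMU"
WARN_RE = re.compile(
    r"WARNING: .* at \S*dcn31_smu\.c:\d+ dcn31_smu_send_msg_with_param"
)


def classify_resume_log(log_text):
    """Classify a captured kernel log (hypothetical helper).

    Returns:
      "traceback"      if the old WARN traceback from dcn31_smu.c is present,
      "short-message"  if only the shorter warning is present,
      "clean"          if neither string appears.
    """
    # The traceback takes priority: if it still shows up, the fix is not in effect.
    if any(WARN_RE.search(line) for line in log_text.splitlines()):
        return "traceback"
    if SHORT_MESSAGE in log_text:
        return "short-message"
    return "clean"
```

On a kernel with the fix and an affected BIOS, the expected classification would be "short-message". Once the BIOS itself is fixed, the log should come back "clean".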

[Where problems could occur]

Low. This only changes message-printing behavior. There are no functional changes.

------- WARN_ON messages -------

------------[ cut here ]------------
WARNING: CPU: 8 PID: 60399 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn31/dcn31_smu.c:123 dcn31_smu_send_msg_with_param+0xec/0x130 [amdgpu]
Modules linked in: btrfs blake2b_generic xor zstd_compress raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c cpuid michael_mic hid_logitech_hidpp snd_usb_audio snd_usbmidi_lib hid_logitech_dj usbhid cdc_ether usbnet r8152 mii qrtr_mhi joydev intel_rapl_msr intel_rapl_common bnep snd_soc_dmic snd_acp6x_pdm_dma snd_soc_acp6x_mach snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine edac_mce_amd amdgpu snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi nls_iso8859_1 kvm_amd kvm snd_hda_intel qrtr snd_intel_dspcfg ns snd_intel_sdw_acpi snd_hda_codec ath11k_pci crct10dif_pclmul input_leds snd_pci_acp6x ghash_clmulni_intel ath11k snd_hda_core snd_hwdep qmi_helpers aesni_intel snd_pcm uvcvideo crypto_simd videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 iommu_v2 cryptd platform_profile btusb snd_seq_midi gpu_sched rapl mac80211 serio_raw snd_seq_midi_event videobuf2_common sparse_keymap drm_ttm_helper btrtl snd_rawmidi btbcm videodev
 btintel ttm efi_pstore wmi_bmof hid_multitouch mc bluetooth snd_seq drm_kms_helper snd_seq_device ecdh_generic cfg80211 ecc snd_timer cec rc_core i2c_algo_bit snd fb_sys_fops syscopyarea sysfillrect mhi sysimgblt libarc4 soundcore snd_rn_pci_acp3x snd_pci_acp3x ccp ucsi_acpi typec_ucsi amd_pmc wireless_hotkey mac_hid typec acpi_tad sch_fq_codel msr parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic crc32_pclmul thunderbolt nvme i2c_piix4 xhci_pci xhci_pci_renesas nvme_core iosm wmi i2c_hid_acpi video i2c_hid hid
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:dcn31_smu_send_msg_with_param+0xec/0x130 [amdgpu]
Code: 00 e8 38 60 fc ff 48 8b 3b 4c 89 ea be 9b 62 01 00 e8 c8 60 fc ff 85 c0 75 40 bf c6 a7 00 00 e8 aa da 0c c6 41 83 ec 01 75 dc <0f> 0b 48 8b 3b b9 80 84 1e 00 44 89 fa 44 89 f6 e8 af 9c fc ff 48
RSP: 0018:ffffb76d03167bc8 EFLAGS: 00010202
RAX: 00000000000000ff RBX: ffff9dc54f492800 RCX: 0000000000000008
RDX: 0000000000000000 RSI: 000000000001629b RDI: ffff9dc548bc0000
RBP: ffffb76d03167bf8 R08: 0000000000000009 R09: 0000000000000320
R10: 0000000000000191 R11: 0000000004451a00 R12: 0000000000030b48
R13: ffffffffc1717d90 R14: 0000000000000011 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff9dcc7e800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f79ba89f68 CR3: 0000000526810000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 <TASK>
 dcn31_smu_transfer_wm_table_dram_2_smu+0x23/0x30 [amdgpu]
 dcn31_notify_wm_ranges+0x145/0x180 [amdgpu]
 dcn31_init_hw+0x465/0x8e0 [amdgpu]
 dc_set_power_state+0x113/0x160 [amdgpu]
 dm_resume+0x2b5/0x610 [amdgpu]
 amdgpu_device_ip_resume_phase2+0x58/0xc0 [amdgpu]
 amdgpu_device_resume+0xd9/0x210 [amdgpu]
 amdgpu_pmops_resume+0x1d/0x40 [amdgpu]
 pci_pm_resume+0x5b/0x90
 ? pci_pm_thaw+0x80/0x80
 dpm_run_callback+0x52/0x160
 device_resume+0xdd/0x1e0
 async_resume+0x1f/0x60
 async_run_entry_fn+0x33/0x120
 process_one_work+0x236/0x420
 worker_thread+0x34/0x410
 ? process_one_work+0x420/0x420
 kthread+0x12f/0x150
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30
 </TASK>
---[ end trace 71d329758d0428b2 ]---
------------[ cut here ]------------
WARNING: CPU: 8 PID: 60399 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn31/dcn31_smu.c:105 dcn31_smu_send_msg_with_param+0x56/0x130 [amdgpu]
Modules linked in: btrfs blake2b_generic xor zstd_compress raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c cpuid michael_mic hid_logitech_hidpp snd_usb_audio snd_usbmidi_lib hid_logitech_dj usbhid cdc_ether usbnet r8152 mii qrtr_mhi joydev intel_rapl_msr intel_rapl_common bnep snd_soc_dmic snd_acp6x_pdm_dma snd_soc_acp6x_mach snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine edac_mce_amd amdgpu snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi nls_iso8859_1 kvm_amd kvm snd_hda_intel qrtr snd_intel_dspcfg ns snd_intel_sdw_acpi snd_hda_codec ath11k_pci crct10dif_pclmul input_leds snd_pci_acp6x ghash_clmulni_intel ath11k snd_hda_core snd_hwdep qmi_helpers aesni_intel snd_pcm uvcvideo crypto_simd videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 iommu_v2 cryptd platform_profile btusb snd_seq_midi gpu_sched rapl mac80211 serio_raw snd_seq_midi_event videobuf2_common sparse_keymap drm_ttm_helper btrtl snd_rawmidi btbcm videodev
 btintel ttm efi_pstore wmi_bmof hid_multitouch mc bluetooth snd_seq drm_kms_helper snd_seq_device ecdh_generic cfg80211 ecc snd_timer cec rc_core i2c_algo_bit snd fb_sys_fops syscopyarea sysfillrect mhi sysimgblt libarc4 soundcore snd_rn_pci_acp3x snd_pci_acp3x ccp ucsi_acpi typec_ucsi amd_pmc wireless_hotkey mac_hid typec acpi_tad sch_fq_codel msr parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic crc32_pclmul thunderbolt nvme i2c_piix4 xhci_pci xhci_pci_renesas nvme_core iosm wmi i2c_hid_acpi video i2c_hid hid
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:dcn31_smu_send_msg_with_param+0x56/0x130 [amdgpu]
Code: 48 8b 3b 4c 89 ea be 9b 62 01 00 e8 64 61 fc ff 85 c0 75 32 bf c6 a7 00 00 89 45 d4 e8 43 db 0c c6 41 83 ec 01 8b 45 d4 75 d6 <0f> 0b 85 c0 ba ff ff ff ff 75 16 48 83 c4 08 89 d0 5b 41 5c 41 5d
RSP: 0018:ffffb76d03167b80 EFLAGS: 00010202
RAX: 00000000000000ff RBX: ffff9dc54f492800 RCX: ffff9dc54dec03b0
RDX: 0000000000000000 RSI: 000000000001629b RDI: ffff9dc548bc0000
RBP: ffffb76d03167bb0 R08: ffffffffc1614a90 R09: ffffb76d03167bc8
R10: 0000000000000000 R11: ffffffff88875560 R12: 0000000000030d41
R13: ffffffffc1717d90 R14: 0000000000000012 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff9dcc7e800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f79ba89f68 CR3: 0000000526810000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? amdgpu_cgs_write_register+0x14/0x20 [amdgpu]
 dcn31_smu_set_display_idle_optimization+0x2f/0x40 [amdgpu]
 dcn31_update_clocks+0x2f5/0x340 [amdgpu]
 ? amdgpu_device_rreg+0x17/0x20 [amdgpu]
 ? amdgpu_cgs_read_register+0x14/0x20 [amdgpu]
 ? dm_read_reg_func+0x2f/0x90 [amdgpu]
 dcn21_exit_optimized_pwr_state+0x1e/0x20 [amdgpu]
 clk_mgr_exit_optimized_pwr_state+0x91/0x120 [amdgpu]
 dc_link_detect+0x73/0xc0 [amdgpu]
 dm_resume+0x34c/0x610 [amdgpu]
 amdgpu_device_ip_resume_phase2+0x58/0xc0 [amdgpu]
 amdgpu_device_resume+0xd9/0x210 [amdgpu]
 amdgpu_pmops_resume+0x1d/0x40 [amdgpu]
 pci_pm_resume+0x5b/0x90
 ? pci_pm_thaw+0x80/0x80
 dpm_run_callback+0x52/0x160
 device_resume+0xdd/0x1e0
 async_resume+0x1f/0x60
 async_run_entry_fn+0x33/0x120
 process_one_work+0x236/0x420
 worker_thread+0x34/0x410
 ? process_one_work+0x420/0x420
 kthread+0x12f/0x150
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30
 </TASK>
---[ end trace 71d329758d0428b3 ]---

Alex Hung (alexhung)
Changed in hwe-next:
assignee: nobody → Alex Hung (alexhung)
status: New → In Progress
tags: added: originate-from-1960665
Changed in hwe-next:
importance: Undecided → Medium
Alex Hung (alexhung)
description: updated
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.14 (Ubuntu):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Focal):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-5.14/5.14.0-1025.27 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Alex Hung (alexhung)
tags: added: verification-done-focal
removed: verification-needed-focal
Changed in hwe-next:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.14 - 5.14.0-1027.30

---------------
linux-oem-5.14 (5.14.0-1027.30) focal; urgency=medium

  * CVE-2022-0001
    - x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
    - SAUCE: x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
    - SAUCE: x86/speculation: Add eIBRS + Retpoline options
    - SAUCE: Documentation/hw-vuln: Update spectre doc

linux-oem-5.14 (5.14.0-1025.27) focal; urgency=medium

  * focal/linux-oem-5.14: 5.14.0-1025.27 -proposed tracker (LP: #1961265)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.02.21)
    - [Config] Update config to match upstream stable release

  * Disable iwlwifi UHB (ultra high band) channels if we don't support wifi 6e
    currently (LP: #1961971)
    - SAUCE: iwlwifi: disable 6-7 GHz channels

  * Fix With 20.04d kernel and WX3200, unit freezes on resume (LP: #1961855)
    - SAUCE: drm/amd: Check if ASPM is enabled from PCIe subsystem

  * CVE-2022-25636
    - netfilter: nf_tables_offload: incorrect flow offload action array size

  * Focal update: upstream stable patchset 2022-02-22 (LP: #1961793)
    - PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
    - selftests: mptcp: fix ipv6 routing setup
    - net: ipa: use a bitmap for endpoint replenish_enabled
    - net: ipa: prevent concurrent replenish
    - drm/vc4: hdmi: Make sure the device is powered with CEC
    - net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
    - net/mlx5: Bridge, take rtnl lock in init error handler
    - net/mlx5: Bridge, ensure dev_name is null-terminated
    - net/mlx5e: Fix handling of wrong devices during bond netevent
    - net/mlx5: Use del_timer_sync in fw reset flow of halting poll
    - net/mlx5e: Fix module EEPROM query
    - net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
    - net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
    - net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion
    - net/mlx5: E-Switch, Fix uninitialized variable modact
    - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
    - i40e: Fix reset bw limit when DCB enabled with 1 TC
    - i40e: Fix reset path while removing the driver
    - net: amd-xgbe: ensure to reset the tx_timer_active flag
    - net: amd-xgbe: Fix skb data length underflow
    - fanotify: Fix stale file descriptor in copy_event_to_user()
    - net: sched: fix use-after-free in tc_new_tfilter()
    - rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
    - cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
    - af_packet: fix data-race in packet_setsockopt / packet_setsockopt
    - tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
    - Revert "drm/vc4: hdmi: Make sure the device is powered with CEC"
    - Revert "drm/vc4: hdmi: Make sure the device is powered with CEC" again
    - drm/i915: Disable DSB usage for now
    - selinux: fix double free of cond_list on error paths
    - audit: improve audit queue handling when "audit=1" on cmdline
    - ipc/sem: do not sleep with a spin lock held
    - spi: stm32-qs...

Changed in linux-oem-5.14 (Ubuntu Focal):
status: Fix Committed → Fix Released
You-Sheng Yang (vicamo)
Changed in hwe-next:
status: Fix Committed → Fix Released