ath11k: Freezing kernel when doing s2idle [17cb:1103]

Bug #1991036 reported by Bin Li
This bug report is a duplicate of: Bug #1995041: Fix ath11k deadlock on WCN6855.
This bug affects 1 person
Affects               Status   Importance  Assigned to  Milestone
OEM Priority Project  New      Undecided   Unassigned
linux (Ubuntu)        Triaged  Undecided   Unassigned

Bug Description

On Ubuntu 22.04, I installed the v6.0-rc3 mainline kernel.
linux-firmware 20220329.git681281e4-0ubuntu3.4

Sep 01 13:25:13 u-ThinkPad-P16-Gen-1 kernel: PM: suspend entry (s2idle)
Sep 01 13:25:13 u-ThinkPad-P16-Gen-1 kernel: Filesystems sync: 0.004 seconds
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: Freezing user space processes ...
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: Freezing of tasks failed after 20.007 seconds (9 tasks refusing to freeze, wq_busy=0):
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: task:NetworkManager state:D stack: 0 pid: 988 ppid: 1 flags:0x00004006
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: Call Trace:
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: <TASK>
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: __schedule+0x221/0x5c0
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: schedule+0x5f/0x100
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: schedule_timeout+0x111/0x150
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: wait_for_completion+0x88/0x140
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: __flush_work.isra.0+0x1b9/0x340
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ? flush_workqueue_prep_pwqs+0x140/0x140
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: __cancel_work_timer+0x10d/0x190
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ? ath11k_mac_config_mon_status_default+0x9c/0x170 [ath11k]
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: cancel_work_sync+0x10/0x20
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ath11k_mac_op_stop+0x9f/0x1e0 [ath11k]
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: drv_stop+0x45/0x120 [mac80211]
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ieee80211_stop_device+0x43/0x50 [mac80211]
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ieee80211_do_stop+0x6b1/0x980 [mac80211]
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ? cond_synchronize_rcu_expedited+0x40/0x40
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ? qdisc_reset+0x27/0x150
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: ieee80211_stop+0x43/0x170 [mac80211]
Sep 01 13:25:33 u-ThinkPad-P16-Gen-1 kernel: __dev_close_many+0x9f/0x120
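For triage, a log like the one above can be reduced to "which task is stuck, and in which driver function". The following is only a rough sketch: the regular expressions are written against the lines quoted in this report, and the helper names (blocked_tasks, first_module_frame) are made up for illustration. Frames prefixed with "? " are skipped because the kernel marks them as unreliable.

```python
import re

# Patterns are assumptions based on the journal lines quoted in this report.
TASK_RE = re.compile(
    r"task:(?P<comm>\S+)\s+state:(?P<state>\S)\s.*?\bpid:\s*(?P<pid>\d+)"
)
FRAME_RE = re.compile(
    r"kernel: (?P<maybe>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def blocked_tasks(lines):
    """Group stack frames under the 'task:' header line that precedes them."""
    tasks = []
    current = None
    for line in lines:
        header = TASK_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "state": header.group("state"),
                "pid": int(header.group("pid")),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Skip '? ' frames: the unwinder is not sure they are on the call path.
        if frame and current is not None and not frame.group("maybe"):
            current["frames"].append((frame.group("sym"), frame.group("mod")))
    return tasks


def first_module_frame(task):
    """Return the innermost reliable frame that belongs to a loadable module."""
    for sym, mod in task["frames"]:
        if mod is not None:
            return sym, mod
    return None
```

Fed the trace above, this should report NetworkManager (pid 988, state D) with ath11k_mac_op_stop in ath11k as the first module frame, i.e. the task is blocked in cancel_work_sync() called from the driver's stop path.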

Bin Li (binli) wrote :

I built a 5.17-oem kernel [1] with patch [2], and the hang on suspend is gone. The rtnl acquire now fails instead of blocking, so there is no deadlock any more.

Sep 02 13:43:06 u-ThinkPad-P16-Gen-1 kernel: ath11k_pci 0000:09:00.0: rtnl acquire failed
Sep 02 13:43:06 u-ThinkPad-P16-Gen-1 kernel: ath11k_pci 0000:09:00.0: failed to perform regd update : -16

However, the reviewer rejected the patch, commenting that the previous patch is not a security fix. A Qualcomm engineer has therefore submitted another patch for internal review.

[1] https://people.canonical.com/~binli/5.17.0-oem/
[2] https://patchwork.kernel.org<email address hidden>/
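To illustrate why a failed lock attempt (-16, i.e. -EBUSY) is preferable to a hang, here is a small user-space analogy in Python. It is not the driver code. It assumes, which this report does not confirm in detail, that the stop path waits for a work item while holding a lock and that the work item needs the same lock. All names (rtnl, regd_update_trylock, stop_while_holding_lock) are hypothetical.

```python
import errno
import threading

rtnl = threading.Lock()  # hypothetical stand-in for the kernel's RTNL mutex


def regd_update_blocking() -> int:
    """Worker that sleeps until the lock is free (the deadlock-prone variant)."""
    with rtnl:
        return 0


def regd_update_trylock() -> int:
    """Worker that gives up with -EBUSY instead of sleeping on the lock."""
    if not rtnl.acquire(blocking=False):
        return -errno.EBUSY
    try:
        return 0  # the real update would happen here
    finally:
        rtnl.release()


def stop_while_holding_lock(worker_fn, timeout: float = 1.0):
    """Hold the lock and wait for the worker, like a stop path flushing work.

    Returns the worker's result, or None if it was still blocked when the
    timeout expired (the kernel has no such timeout and would wait forever).
    """
    results = []
    worker = threading.Thread(
        target=lambda: results.append(worker_fn()), daemon=True
    )
    with rtnl:
        worker.start()
        worker.join(timeout)
        finished = not worker.is_alive()
    return results[0] if finished else None
```

With the trylock worker the waiter gets -EBUSY back immediately, which matches the "rtnl acquire failed" / "-16" messages above. With the blocking worker the waiter only returns here because of the artificial timeout.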

tags: added: oem-priority originate-from-1983094 sutton
Bin Li (binli) wrote :

I also reported the bug upstream:
https://bugzilla.kernel.org/show_bug.cgi?id=216434

tags: added: originate-from-1982536
Bin Li (binli)
tags: added: originate-from-1981178
tags: added: originate-from-1981174
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1991036

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Julian Andres Klode (juliank) wrote :

Dear kernel bot, this bug has enough info, and there's a patch upstream that's part of 6.1 that fixes the issue.

Though I can say I now hit a failure to restart the device / load firmware after a 2nd resume.

This is likely another race; if you just suspend in a loop with echo mem > /sys/power/state, it works multiple times pretty reliably.
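The loop test mentioned above could be scripted roughly as follows. This is a sketch under assumptions: the function name and parameters are made up, writing to /sys/power/state needs root, and something else (a key press, or an RTC alarm set up separately) has to wake the machine each time, since the write only returns after resume.

```python
import time
from pathlib import Path


def suspend_loop(cycles, state_path="/sys/power/state", state="mem", pause=5.0):
    """Write `state` to `state_path` `cycles` times, pausing between cycles.

    On a real system the write blocks until the machine has resumed.
    Returns the number of completed cycles.
    """
    done = 0
    for _ in range(cycles):
        # Same effect as: echo mem > /sys/power/state
        Path(state_path).write_text(state + "\n")
        done += 1
        # Give the Wi-Fi device time to come back before the next cycle.
        time.sleep(pause)
    return done
```

Calling suspend_loop(10) as root would attempt ten suspend/resume cycles; checking the journal for ath11k errors after each resume is left to the reader.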

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Julian Andres Klode (juliank) wrote :

I think the kernel team needs the patch sent to the mailing list, but I don't do a lot of kernel work. FWIW, here's the log from 6.1-rc5, which includes the fix:

Nov 29 20:03:28 jak-t14-g3 kernel: PM: suspend exit
Nov 29 20:03:28 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: BAR 0: assigned [mem 0x80000000-0x801fffff 64bit]
Nov 29 20:03:28 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: MSI vectors: 32
Nov 29 20:03:28 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: wcn6855 hw2.1
Nov 29 20:03:29 jak-t14-g3 kernel: mhi mhi0: Requested to power ON
Nov 29 20:03:29 jak-t14-g3 kernel: mhi mhi0: Power on setup success
Nov 29 20:03:29 jak-t14-g3 kernel: mhi mhi0: Wait for device to enter SBL or Mission mode
Nov 29 20:03:29 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211
Nov 29 20:03:29 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: fw_version 0x110c0c35 fw_build_timestamp 2022-06-24 10:50 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
Nov 29 20:03:29 jak-t14-g3 kernel: Generic FE-GE Realtek PHY r8169-0-100:00: attached PHY driver (mii_bus:phy_addr=r8169-0-100:00, irq=MAC)
Nov 29 20:03:30 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: ignore reset dev flags 0x8000
Nov 29 20:03:40 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: failed to wait wlan mode request (mode 0): -110
Nov 29 20:03:40 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: qmi failed to send wlan fw mode: -110
Nov 29 20:03:40 jak-t14-g3 kernel: ath11k_pci 0000:02:00.0: failed to send firmware start: -110

(You can see I'm trying the unpublished firmware from the ath11k-firmware repo right now, and it doesn't help either.)

(but anything is better than hanging netlink and requiring you to reboot via sysrq ;))
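The trailing numbers in these driver messages are negated Linux errno values. A tiny helper to decode them, as a sketch: the table is hard-coded with the two values that appear in this report (16 is EBUSY and 110 is ETIMEDOUT on Linux), and the function name is made up.

```python
import re

# Linux errno values seen in this report; hard-coded so the result does not
# depend on the platform the script runs on.
LINUX_ERRNO = {16: "EBUSY", 110: "ETIMEDOUT"}

# Matches a negative number after the last colon at the end of a log line.
TRAILING_CODE_RE = re.compile(r":\s*-(\d+)\s*$")


def decode_error(line):
    """Return the errno name for a line ending in ': -N', or None."""
    match = TRAILING_CODE_RE.search(line)
    if not match:
        return None
    code = int(match.group(1))
    return LINUX_ERRNO.get(code, f"errno {code}")
```

So the -110 lines above are timeouts waiting for the firmware, while the earlier -16 was the lock being busy.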

Bin Li (binli) wrote :

This patch will be available in the next release of the Ubuntu OEM kernel.

https://patchwork.ozlabs<email address hidden>/
