ath9k: ath9k_txq_has_key regularly produces soft lockups

Bug #1979571 reported by Markus Grimm
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

We first saw this bug with kernel 5.4.0-89 as shipped by Ubuntu; 5.4.0-88 still works fine. I'm not a kernel expert, but the commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/wireless/ath/ath9k/main.c?id=ca2848022c12789685d3fab3227df02b863f9696 (new in Ubuntu's 5.4.0-89) introduces the function ath9k_txq_has_key, which is where the kernel appears to hang, and that function has not been touched since. In particular, the while loop in ath9k_txq_has_key looks fishy to me.
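To illustrate the kind of failure suspected here, below is a minimal userspace sketch, not the kernel code. It assumes the loop walks a circular buffer of queue slots and stops only when it finds a matching key or an empty slot; the names `struct slot`, `FIFO_DEPTH`, `scan_unbounded` and `scan_bounded` are hypothetical. Under that assumption, a ring in which every slot is occupied and none matches would never terminate the scan, whereas a scan limited to one pass over the ring would. Whether this is what actually happens in ath9k_txq_has_key is not established by this report.

```c
#include <stdbool.h>

#define FIFO_DEPTH 8 /* assumed ring size, for this model only */

/* Hypothetical stand-in for one FIFO slot: how many frames are queued
 * in it and which key index those frames reference. */
struct slot {
    int nframes;
    unsigned int keyix;
};

/* Advance an index around the ring. */
static int ring_next(int idx)
{
    return (idx + 1) % FIFO_DEPTH;
}

/*
 * Unbounded scan: stops only on a key match or on an empty slot.
 * max_steps exists purely so this demo terminates; a return value of -1
 * means a loop without that guard would still be spinning.
 * Returns 1 if the key was found, 0 if an empty slot ended the scan.
 */
static int scan_unbounded(const struct slot *fifo, int tail,
                          unsigned int keyix, int max_steps)
{
    int idx = tail;
    int steps = 0;

    while (fifo[idx].nframes > 0) {
        if (fifo[idx].keyix == keyix)
            return 1;
        if (++steps >= max_steps)
            return -1;
        idx = ring_next(idx);
    }
    return 0;
}

/* Bounded scan: visits each slot at most once, so it always terminates. */
static bool scan_bounded(const struct slot *fifo, int tail,
                         unsigned int keyix)
{
    int idx = tail;

    for (int n = 0; n < FIFO_DEPTH && fifo[idx].nframes > 0; n++) {
        if (fifo[idx].keyix == keyix)
            return true;
        idx = ring_next(idx);
    }
    return false;
}
```

With all `FIFO_DEPTH` slots occupied by frames that use a different key, `scan_unbounded` runs into its step guard while `scan_bounded` returns false after one pass.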

The bug is hard to reproduce under lab conditions but happens regularly on our mobile robots. We suspect that it's related to roaming. The log below is one of the first occurrences of this bug from October last year:

Oct 21 09:51:53 toru-0071 kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [wpa_supplicant:484]
Oct 21 09:51:53 toru-0071 kernel: Modules linked in: can_raw can xsk_diag af_packet_diag netlink_diag tcp_diag udp_diag raw_diag inet_diag unix_diag nft_ct nf_tables_set ccm ath9k ath9k_common ath9k_hw ath inte>
Oct 21 09:51:53 toru-0071 kernel: glue_helper igb e1000e drm ahci dca i2c_i801 i2c_algo_bit libahci video
Oct 21 09:51:53 toru-0071 kernel: CPU: 7 PID: 484 Comm: wpa_supplicant Tainted: G OEL 5.4.0-89-generic #100-Ubuntu
Oct 21 09:51:53 toru-0071 kernel: Hardware name: Default string Default string/SKYBAY, BIOS 5.12 03/28/2017
Oct 21 09:51:53 toru-0071 kernel: RIP: 0010:ath9k_txq_has_key+0x1b4/0x200 [ath9k]
Oct 21 09:51:53 toru-0071 kernel: Code: 8d 84 10 22 01 00 00 48 c1 e0 04 49 8b 44 05 10 48 39 c6 74 26 0f b6 58 53 84 db 75 16 48 8b 48 20 48 85 c9 74 0d 0f b6 49 4b <41> 39 c9 0f 84 6e ff ff ff 48 8b 00 48 39 >
Oct 21 09:51:53 toru-0071 kernel: RSP: 0018:ffffb1dec0693628 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
Oct 21 09:51:53 toru-0071 kernel: RAX: ffff96524e14ac98 RBX: 0000000000000000 RCX: 00000000000000ff
Oct 21 09:51:53 toru-0071 kernel: RDX: 0000000000000001 RSI: ffff96524e7e3320 RDI: ffff96524e7e3310
Oct 21 09:51:53 toru-0071 kernel: RBP: ffffb1dec0693670 R08: ffff965241f15630 R09: 0000000000000004
Oct 21 09:51:53 toru-0071 kernel: R10: 0000000000000027 R11: 0000000000000000 R12: 0000000000000003
Oct 21 09:51:53 toru-0071 kernel: R13: ffff96524e7e1e80 R14: 000000000000014a R15: ffff96524e7e3300
Oct 21 09:51:53 toru-0071 kernel: FS: 00007f5be4a76140(0000) GS:ffff96525dbc0000(0000) knlGS:0000000000000000
Oct 21 09:51:53 toru-0071 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 21 09:51:53 toru-0071 kernel: CR2: 00007f814ca21028 CR3: 000000080ce6a006 CR4: 00000000003606e0
Oct 21 09:51:53 toru-0071 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 21 09:51:53 toru-0071 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 21 09:51:53 toru-0071 kernel: Call Trace:
Oct 21 09:51:53 toru-0071 kernel: ath9k_set_key+0xf5/0x290 [ath9k]
Oct 21 09:51:53 toru-0071 kernel: ieee80211_key_replace+0x370/0x870 [mac80211]
Oct 21 09:51:53 toru-0071 kernel: ieee80211_free_sta_keys+0xb3/0xf0 [mac80211]
Oct 21 09:51:53 toru-0071 kernel: __sta_info_destroy_part2+0x3a/0x190 [mac80211]
Oct 21 09:51:53 toru-0071 kernel: __sta_info_flush+0x128/0x180 [mac80211]
Oct 21 09:51:53 toru-0071 kernel: ieee80211_set_disassoc+0xc0/0x5f0 [mac80211]
Oct 21 09:51:53 toru-0071 kernel: ieee80211_mgd_auth+0x15b/0x3d0 [mac80211]
Oct 21 09:51:53 toru-0071 kernel: ieee80211_auth+0x18/0x20 [mac80211]
Oct 21 09:51:53 toru-0071 kernel: cfg80211_mlme_auth+0x104/0x210 [cfg80211]
Oct 21 09:51:53 toru-0071 kernel: nl80211_authenticate+0x284/0x2e0 [cfg80211]
Oct 21 09:51:53 toru-0071 kernel: genl_family_rcv_msg+0x1b9/0x470
Oct 21 09:51:53 toru-0071 kernel: ? __netlink_sendskb+0x42/0x50
Oct 21 09:51:53 toru-0071 kernel: genl_rcv_msg+0x4c/0xa0
Oct 21 09:51:53 toru-0071 kernel: ? _cond_resched+0x19/0x30
Oct 21 09:51:53 toru-0071 kernel: ? genl_family_rcv_msg+0x470/0x470
Oct 21 09:51:53 toru-0071 kernel: netlink_rcv_skb+0x50/0x120
Oct 21 09:51:53 toru-0071 kernel: genl_rcv+0x29/0x40
Oct 21 09:51:53 toru-0071 kernel: netlink_unicast+0x187/0x220
Oct 21 09:51:53 toru-0071 kernel: netlink_sendmsg+0x222/0x3e0
Oct 21 09:51:53 toru-0071 kernel: sock_sendmsg+0x65/0x70
Oct 21 09:51:53 toru-0071 kernel: ____sys_sendmsg+0x212/0x280
Oct 21 09:51:53 toru-0071 kernel: ___sys_sendmsg+0x88/0xd0
Oct 21 09:51:53 toru-0071 kernel: ? sock_sendmsg+0x65/0x70
Oct 21 09:51:53 toru-0071 kernel: ? sock_write_iter+0x93/0xf0
Oct 21 09:51:53 toru-0071 kernel: ? new_sync_write+0x125/0x1c0
Oct 21 09:51:53 toru-0071 kernel: ? __cgroup_bpf_run_filter_setsockopt+0xae/0x2d0
Oct 21 09:51:53 toru-0071 kernel: ? _cond_resched+0x19/0x30
Oct 21 09:51:53 toru-0071 kernel: ? aa_sk_perm+0x43/0x170
Oct 21 09:51:53 toru-0071 kernel: __sys_sendmsg+0x5c/0xa0
Oct 21 09:51:53 toru-0071 kernel: __x64_sys_sendmsg+0x1f/0x30
Oct 21 09:51:53 toru-0071 kernel: do_syscall_64+0x57/0x190
Oct 21 09:51:53 toru-0071 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 21 09:51:53 toru-0071 kernel: RIP: 0033:0x7f5be4e06747
Oct 21 09:51:53 toru-0071 kernel: Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 >
Oct 21 09:51:53 toru-0071 kernel: RSP: 002b:00007fff90564598 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
Oct 21 09:51:53 toru-0071 kernel: RAX: ffffffffffffffda RBX: 0000558020f4b440 RCX: 00007f5be4e06747
Oct 21 09:51:53 toru-0071 kernel: RDX: 0000000000000000 RSI: 00007fff905645d0 RDI: 0000000000000004
Oct 21 09:51:53 toru-0071 kernel: RBP: 0000558020f52830 R08: 0000000000000004 R09: 00007f5be4eceb80
Oct 21 09:51:53 toru-0071 kernel: R10: 00007fff905646a4 R11: 0000000000000246 R12: 0000558020f4b350
Oct 21 09:51:53 toru-0071 kernel: R13: 00007fff905645d0 R14: 00007fff905646a4 R15: 0000558020f53440
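
Since the lockup recurs across several robots, counting occurrences in collected logs may help. The following is a rough sketch of a helper that picks the CPU number, stall duration, process name and PID out of a watchdog line shaped like the first line of the log above. The `struct lockup` type and the `parse_soft_lockup` function are hypothetical names, and the format string is taken only from the line in this report, so other kernel versions may print it differently.

```c
#include <stdio.h>
#include <string.h>

/* Fields extracted from one "soft lockup" watchdog line. */
struct lockup {
    int cpu;       /* CPU number after "CPU#" */
    int seconds;   /* reported stall duration */
    char comm[32]; /* process name inside the brackets */
    int pid;       /* process ID inside the brackets */
};

/*
 * Returns 1 and fills *out if the line contains a soft lockup message
 * in the format seen in this report, otherwise returns 0.
 */
static int parse_soft_lockup(const char *line, struct lockup *out)
{
    const char *p = strstr(line, "BUG: soft lockup - CPU#");

    if (p == NULL)
        return 0;

    /* %31[^:] reads the process name up to the colon, leaving room
     * for the terminating NUL in comm[32]. */
    if (sscanf(p, "BUG: soft lockup - CPU#%d stuck for %ds! [%31[^:]:%d]",
               &out->cpu, &out->seconds, out->comm, &out->pid) != 4)
        return 0;

    return 1;
}
```

Feeding each line of a saved journal through `parse_soft_lockup` and counting the hits per host would give a simple occurrence tally.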

Description: Ubuntu 20.04.4 LTS
Release: 20.04

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1979571

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Markus Grimm (grimm-5) wrote :

Unfortunately, I can't install apport on the robots. Because of its dependencies, installing it would break our system:
Recommended packages:
  apport-symptoms
The following packages will be REMOVED:
  systemd-coredump toru-5.3 toru-system # <---- this would uninstall everything that we need
The following NEW packages will be installed:
  apport python3-apport python3-gi python3-httplib2 python3-keyring
  python3-launchpadlib python3-lazr.restfulclient python3-lazr.uri
  python3-problem-report python3-requests-unixsocket python3-secretstorage
  python3-simplejson python3-wadllib
WARNING: The following essential packages will be removed.

I'm happy to provide any additional log files that are required.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed