watchdog: BUG: soft lockup on Threadripper 2950X

Bug #1938722 reported by John Stultz
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

Description: Ubuntu 20.04.2 LTS
Release: 20.04

I've suddenly been seeing a number of crashes today on my Threadripper 2950X box, after the system had been off over the weekend.

I suspect it may be tied to the Ubuntu 5.4.0-80.90-generic 5.4.124 kernel, as I wasn't seeing it last week or previously.

Aug 2 16:52:14 threadripper kernel: [ 600.168436] watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [kworker/19:0:11301]
Aug 2 16:52:14 threadripper kernel: [ 600.168490] Modules linked in: veth xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi eeepc_wmi snd_hda_intel edac_mce_amd snd_intel_dspcfg asus_wmi ftdi_sio snd_hda_codec kvm_amd usbserial sparse_keymap snd_hda_core kvm video wmi_bmof snd_hwdep snd_pcm snd_timer snd ccp soundcore k10temp mac_hid nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype sch_fq_codel xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid uas usb_storage amdgpu
Aug 2 16:52:14 threadripper kernel: [ 600.168542] amd_iommu_v2 gpu_sched crct10dif_pclmul ttm crc32_pclmul ghash_clmulni_intel drm_kms_helper syscopyarea aesni_intel crypto_simd mxm_wmi sysfillrect cryptd sysimgblt glue_helper fb_sys_fops igb drm dca i2c_piix4 ahci i2c_algo_bit libahci gpio_amdpt wmi gpio_generic
Aug 2 16:52:14 threadripper kernel: [ 600.168558] CPU: 19 PID: 11301 Comm: kworker/19:0 Tainted: G L 5.4.0-80-generic #90-Ubuntu
Aug 2 16:52:14 threadripper kernel: [ 600.168559] Hardware name: System manufacturer System Product Name/ROG STRIX X399-E GAMING, BIOS 1203 10/09/2019
Aug 2 16:52:14 threadripper kernel: [ 600.168569] Workqueue: events free_work
Aug 2 16:52:14 threadripper kernel: [ 600.168574] RIP: 0010:smp_call_function_many+0x205/0x270
Aug 2 16:52:14 threadripper kernel: [ 600.168576] Code: e8 50 10 92 00 3b 05 ae cf 70 01 89 c7 0f 83 9b fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 80 99 64 a1 8b 41 18 a8 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 89 cf 48 c7 c2 a0 b8 a4 a1 4c 89 fe
Aug 2 16:52:14 threadripper kernel: [ 600.168577] RSP: 0018:ffffb66b0aa17d00 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Aug 2 16:52:14 threadripper kernel: [ 600.168579] RAX: 0000000000000003 RBX: ffff8de1fd4ebd40 RCX: ffff8de1fd0b2540
Aug 2 16:52:14 threadripper kernel: [ 600.168580] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000002
Aug 2 16:52:14 threadripper kernel: [ 600.168580] RBP: ffffb66b0aa17d40 R08: ffff8de1f6da7190 R09: 0000000000000003
Aug 2 16:52:14 threadripper kernel: [ 600.168581] R10: ffff8de1f6da7190 R11: 0000000000000002 R12: ffffffffa0281930
Aug 2 16:52:14 threadripper kernel: [ 600.168581] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000080
Aug 2 16:52:14 threadripper kernel: [ 600.168583] FS: 0000000000000000(0000) GS:ffff8de1fd4c0000(0000) knlGS:0000000000000000
Aug 2 16:52:14 threadripper kernel: [ 600.168583] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 2 16:52:14 threadripper kernel: [ 600.168584] CR2: 000055ea29edefd0 CR3: 00000009c500a000 CR4: 00000000003406e0
Aug 2 16:52:14 threadripper kernel: [ 600.168585] Call Trace:
Aug 2 16:52:14 threadripper kernel: [ 600.168592] ? load_new_mm_cr3+0xf0/0xf0
Aug 2 16:52:14 threadripper kernel: [ 600.168594] on_each_cpu+0x2d/0x60
Aug 2 16:52:14 threadripper kernel: [ 600.168596] flush_tlb_kernel_range+0x38/0x90
Aug 2 16:52:14 threadripper kernel: [ 600.168597] __purge_vmap_area_lazy+0x70/0x6d0
Aug 2 16:52:14 threadripper kernel: [ 600.168598] free_vmap_area_noflush+0xe1/0xf0
Aug 2 16:52:14 threadripper kernel: [ 600.168600] remove_vm_area+0x9a/0xb0
Aug 2 16:52:14 threadripper kernel: [ 600.168602] __vunmap+0x5f/0x210
Aug 2 16:52:14 threadripper kernel: [ 600.168603] free_work+0x25/0x30
Aug 2 16:52:14 threadripper kernel: [ 600.168607] process_one_work+0x1eb/0x3b0
Aug 2 16:52:14 threadripper kernel: [ 600.168609] worker_thread+0x4d/0x400
Aug 2 16:52:14 threadripper kernel: [ 600.168611] kthread+0x104/0x140
Aug 2 16:52:14 threadripper kernel: [ 600.168612] ? process_one_work+0x3b0/0x3b0
Aug 2 16:52:14 threadripper kernel: [ 600.168613] ? kthread_park+0x90/0x90
Aug 2 16:52:14 threadripper kernel: [ 600.168617] ret_from_fork+0x22/0x40
Aug 2 16:52:40 threadripper kernel: [ 606.280524] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 2 16:52:40 threadripper kernel: [ 606.280567] rcu: 2-...0: (1 GPs behind) idle=ae6/1/0x4000000000000000 softirq=26910/26911 fqs=7179
Aug 2 16:52:40 threadripper kernel: [ 606.280609] rcu: 18-...0: (1 GPs behind) idle=c8e/1/0x4000000000000000 softirq=28056/28057 fqs=7179
Aug 2 16:52:40 threadripper kernel: [ 606.280659] (detected by 24, t=15002 jiffies, g=39017, q=5149545)
Aug 2 16:52:40 threadripper kernel: [ 606.280661] Sending NMI from CPU 24 to CPUs 2:
Aug 2 16:52:40 threadripper kernel: [ 616.204803] Sending NMI from CPU 24 to CPUs 18:
Aug 2 16:52:40 threadripper kernel: [ 626.131497] rcu: rcu_sched kthread starved for 4960 jiffies! g39017 f0x2 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=7
Aug 2 16:52:40 threadripper kernel: [ 626.131554] rcu: RCU grace-period kthread stack dump:
Aug 2 16:52:40 threadripper kernel: [ 626.131577] rcu_sched R running task 0 11 2 0x80004000
Aug 2 16:52:40 threadripper kernel: [ 626.131580] Call Trace:
Aug 2 16:52:40 threadripper kernel: [ 626.131589] __schedule+0x2e3/0x740
Aug 2 16:52:40 threadripper kernel: [ 626.131592] preempt_schedule_common+0x18/0x30
Aug 2 16:52:40 threadripper kernel: [ 626.131594] _cond_resched+0x22/0x30
Aug 2 16:52:40 threadripper kernel: [ 626.131597] force_qs_rnp+0xa8/0x170
Aug 2 16:52:40 threadripper kernel: [ 626.131598] ? synchronize_sched_expedited_wait+0x180/0x180
Aug 2 16:52:40 threadripper kernel: [ 626.131600] rcu_gp_kthread+0x5e8/0x990
Aug 2 16:52:40 threadripper kernel: [ 626.131604] kthread+0x104/0x140
Aug 2 16:52:40 threadripper kernel: [ 626.131605] ? kfree_call_rcu+0x20/0x20
Aug 2 16:52:40 threadripper kernel: [ 626.131607] ? kthread_park+0x90/0x90
Aug 2 16:52:40 threadripper kernel: [ 626.131608] ret_from_fork+0x22/0x40
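
For anyone triaging similar logs, here is a minimal Python sketch that pulls the key fields out of the watchdog and RCU stall lines above. The function names (`parse_soft_lockups`, `rcu_stall_seconds`) are made up for illustration, and the tick rate of 250 Hz is an assumption that is not stated anywhere in this report; check `CONFIG_HZ` in the config of the running kernel before trusting the jiffies conversion.

```python
import re

# Watchdog header, e.g.
# "watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [kworker/19:0:11301]"
# The task name itself may contain colons, so the greedy ".+" is allowed to
# backtrack to the last colon before the PID.
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)

# RCU stall summary, e.g. "(detected by 24, t=15002 jiffies, g=39017, ...)"
RCU_STALL = re.compile(r"detected by (?P<cpu>\d+), t=(?P<jiffies>\d+) jiffies")

# Assumption: kernel tick rate (CONFIG_HZ). Not taken from the bug report.
ASSUMED_HZ = 250


def parse_soft_lockups(log_text):
    """Return one dict per soft-lockup header line found in log_text."""
    events = []
    for match in SOFT_LOCKUP.finditer(log_text):
        events.append({
            "cpu": int(match.group("cpu")),
            "stuck_seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events


def rcu_stall_seconds(log_text, hz=ASSUMED_HZ):
    """Return (detecting_cpu, stall_seconds) for each RCU stall summary line."""
    stalls = []
    for match in RCU_STALL.finditer(log_text):
        seconds = int(match.group("jiffies")) / hz
        stalls.append((int(match.group("cpu")), seconds))
    return stalls
```

Under the 250 Hz assumption, the `t=15002 jiffies` value in the stall report above would work out to roughly 60 seconds.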

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-80-generic 5.4.0-80.90
ProcVersionSignature: Ubuntu 5.4.0-80.90-generic 5.4.124
Uname: Linux 5.4.0-80-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k5.4.0-80-generic.
ApportVersion: 2.20.11-0ubuntu27.18
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D7p', '/dev/snd/pcmC1D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Card0.Amixer.info:
 Card hw:0 'Generic'/'HD-Audio Generic at 0xba600000 irq 96'
   Mixer name : 'Realtek ALC1220'
   Components : 'HDA:10ec1168,10438723,00100003'
   Controls : 46
   Simple ctrls : 20
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0x9f860000 irq 98'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100700'
   Controls : 14
   Simple ctrls : 2
CasperMD5CheckResult: skip
Date: Mon Aug 2 19:09:24 2021
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: System manufacturer System Product Name
ProcEnviron:
 TERM=screen.xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=04417339-7685-11e9-bdb0-049226da3a81 ro pci=nommconf consoleblank=60
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-80-generic N/A
 linux-backports-modules-5.4.0-80-generic N/A
 linux-firmware 1.187.15
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2021-01-23 (191 days ago)
dmi.bios.date: 10/09/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1203
dmi.board.asset.tag: Default string
dmi.board.name: ROG STRIX X399-E GAMING
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1203:bd10/09/2019:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX399-EGAMING:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

John Stultz (jstultz) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
John Stultz (jstultz) wrote :

Tripped this again today w/ 5.4.0-86-generic:

[179417.505068] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:691464]
[179417.505110] Modules linked in: xt_multiport cpuid veth xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua edac_mce_amd ftdi_sio eeepc_wmi kvm_amd usbserial asus_wmi sparse_keymap snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi video snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec snd_hda_core snd_hwdep snd_pcm k10temp snd_timer snd soundcore ccp mac_hid nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype sch_fq_codel xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter msr ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic uas usbhid hid
[179417.505142] usb_storage amdgpu crct10dif_pclmul crc32_pclmul ghash_clmulni_intel amd_iommu_v2 gpu_sched aesni_intel ttm crypto_simd drm_kms_helper cryptd syscopyarea glue_helper sysfillrect sysimgblt fb_sys_fops igb mxm_wmi drm dca i2c_algo_bit ahci libahci i2c_piix4 wmi gpio_amdpt gpio_generic
[179417.505153] CPU: 2 PID: 691464 Comm: kworker/2:1 Not tainted 5.4.0-86-generic #97-Ubuntu
[179417.505154] Hardware name: System manufacturer System Product Name/ROG STRIX X399-E GAMING, BIOS 1203 10/09/2019
[179417.505160] Workqueue: events free_work
[179417.505164] RIP: 0010:smp_call_function_many+0x208/0x270
[179417.505166] Code: 92 00 3b 05 8e c9 70 01 89 c7 0f 83 9b fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 80 99 84 af 8b 41 18 a8 01 74 0a f3 90 8b 51 18 <83> e2 01 75 f6 eb c8 89 cf 48 c7 c2 e0 b8 c4 af 4c 89 fe e8 00 1d
[179417.505166] RSP: 0018:ffffa88c84877d00 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[179417.505167] RAX: 0000000000000003 RBX: ffff97e83d0abd40 RCX: ffff97e83d072080
[179417.505168] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000001
[179417.505168] RBP: ffffa88c84877d40 R08: ffff97e836da7c40 R09: 0000000000000003
[179417.505169] R10: ffff97e836da7c40 R11: 0000000000000002 R12: ffffffffae481930
[179417.505169] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000080
[179417.505170] FS: 0000000000000000(0000) GS:ffff97e83d080000(0000) knlGS:0000000000000000
[179417.505170] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[179417.505171] CR2: 00007fde360abfb0 CR3: 00000003f880a000 CR4: 00000000003406e0
[179417.505171] Call Trace:
[179417.505176] ? load_new_mm_cr3+0xf0/0xf0
[179417.505177] on_each_cpu+0x2d/0x60
[179417.505178] flush_tlb_kernel_range+0x38/0x90
[179417.505179] __purge_vmap_area_lazy+0x70/0x6d0
[179417.505180] free_vmap_area_noflush+0xe1/0xf0
[179417.505180] remove_vm_area+0x9a/0xb0
[179417.505181] __vunmap+0x5f/0x210
[179417.505182] free_work+0x25/0x30
[179417.505184] process_one_work+0x1eb/0x3b0
[179417.505185] worker_thread+0x4d/0x400
[179417.505186] kthrea...
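
To check whether two lockup reports like the ones in this bug went through the same code path, the call traces can be compared frame by frame. The following is a rough Python sketch under that idea; `trace_functions` and `common_prefix` are hypothetical helper names, and the pattern only handles the `func+0xOFF/0xLEN` frame format seen in these logs. Truncated frames (such as the cut-off last line above) simply do not match and are skipped.

```python
import re

# A stack frame line ends with "func+0xOFFSET/0xLENGTH", optionally preceded
# by "? " when the unwinder considers the frame unreliable. Anchoring on the
# closing "]" of the timestamp keeps the RIP and Modules lines from matching.
FRAME = re.compile(
    r"\]\s+(?P<unreliable>\?\s+)?(?P<func>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def trace_functions(lines, include_unreliable=False):
    """Return the function names of the stack frames found in lines, in order."""
    funcs = []
    for line in lines:
        match = FRAME.search(line)
        if match is None:
            continue  # not a (complete) stack frame line
        if match.group("unreliable") and not include_unreliable:
            continue  # skip "? func+0x.../0x..." frames by default
        funcs.append(match.group("func"))
    return funcs


def common_prefix(first, second):
    """Return the leading frames that both traces share."""
    shared = []
    for a, b in zip(first, second):
        if a != b:
            break
        shared.append(a)
    return shared
```

Feeding in the lines after each `Call Trace:` marker and calling `common_prefix(trace_functions(a), trace_functions(b))` gives the frames the two traces have in common, starting from the innermost one.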
