Ubuntu 22.04.1 CPU soft lockup occurs repeatedly

Bug #1989521 reported by Camille Rodriguez
This bug affects 5 people

Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi all,

My Ubuntu Server 22.04.1 machine keeps freezing with CPU soft lockups. The issue seems to have started within the last week; all packages are up to date. I've updated to the HWE kernel and rebooted several times, and it still happens. Hardware: 32 GB RAM, AMD 3600X CPU, Quadro RTX 4000 GPU.

I caught the following in syslog:

Sep 13 04:17:55 marcus-server kernel: [33687.436241] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u64:17:154214]
Sep 13 04:17:55 marcus-server kernel: [33687.436243] Modules linked in: tls xt_nat veth nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink overlay bridge stp llc nvidia_drm(PO) snd_hda_codec_realtek intel_rapl_msr intel_rapl_common nvidia_modeset(PO) snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event zfs(PO) edac_mce_amd nls_iso8859_1 snd_rawmidi kvm_amd zunicode(PO) nvidia(PO) snd_seq kvm zzstd(O) zlua(O) zavl(PO) snd_seq_device icp(PO) rapl wmi_bmof snd_timer zcommon(PO) k10temp ccp ucsi_ccg znvpair(PO) snd typec_ucsi typec spl(O) soundcore apex(OE) gasket(OE) mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua nct6775 hwmon_vid ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp
Sep 13 04:17:55 marcus-server kernel: [33687.436279] parport ramoops reed_solomon pstore_blk pstore_zone mtd efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nouveau mxm_wmi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm aesni_intel video crypto_simd igb cryptd xhci_pci ahci dca i2c_piix4 i2c_nvidia_gpu arcmsr libahci xhci_pci_renesas i2c_algo_bit wmi
Sep 13 04:17:55 marcus-server kernel: [33687.436302] CPU: 2 PID: 154214 Comm: kworker/u64:17 Tainted: P OE 5.15.0-47-generic #51-Ubuntu
Sep 13 04:17:55 marcus-server kernel: [33687.436303] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Phantom Gaming 4, BIOS P4.20 08/02/2021
Sep 13 04:17:55 marcus-server kernel: [33687.436305] Workqueue: events_unbound async_run_entry_fn
Sep 13 04:17:55 marcus-server kernel: [33687.436308] RIP: 0010:arcmsr_wait_firmware_ready+0xc1/0x140 [arcmsr]
Sep 13 04:17:55 marcus-server kernel: [33687.436312] Code: e3 49 8b 94 24 48 08 00 00 b8 10 00 00 00 89 02 5b 41 5c 5d e9 b0 7b db e8 48 8b 47 50 4c 8d a0 bc 00 00 00 eb 0c 41 8b 04 24 <85> c0 0f 88 64 ff ff ff f6 83 81 00 00 00 01 75 eb bf 14 00 00 00
Sep 13 04:17:55 marcus-server kernel: [33687.436313] RSP: 0018:ffffade8d136fd10 EFLAGS: 00000202
Sep 13 04:17:55 marcus-server kernel: [33687.436314] RAX: 0000000000000000 RBX: ffff96720a460870 RCX: ffffade8c12b0034
Sep 13 04:17:55 marcus-server kernel: [33687.436315] RDX: 000000000000000d RSI: ffff96721b53ef80 RDI: ffff96720a460870
Sep 13 04:17:55 marcus-server kernel: [33687.436315] RBP: ffffade8d136fd20 R08: ffffffffffffffff R09: 0000000000000000
Sep 13 04:17:55 marcus-server kernel: [33687.436316] R10: 0000000000000284 R11: ffffffffffffffff R12: ffffade8c12b00bc
Sep 13 04:17:55 marcus-server kernel: [33687.436317] R13: 000000000000000d R14: ffff96720a460000 R15: ffff96720a460870
Sep 13 04:17:55 marcus-server kernel: [33687.436318] FS: 0000000000000000(0000) GS:ffff96791ea80000(0000) knlGS:0000000000000000
Sep 13 04:17:55 marcus-server kernel: [33687.436319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 13 04:17:55 marcus-server kernel: [33687.436320] CR2: 0000000000000000 CR3: 00000007c8c10000 CR4: 0000000000350ee0
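
The line that usually matters most in a trace like this is the RIP line, which names the function the CPU was executing when the watchdog fired (here arcmsr_wait_firmware_ready in the arcmsr module). Below is a minimal sketch for tallying those symbols across a whole log, assuming a syslog-style kernel log such as /var/log/kern.log. The helper name rip_symbols is hypothetical.

```shell
# rip_symbols: read kernel log text on stdin and count how often each
# symbol appears on a kernel-mode "RIP: 0010:" line.
# Hypothetical helper; the name and output layout are illustrative only.
rip_symbols() {
    awk '
        /RIP: 0010:/ {
            sub(/.*RIP: 0010:/, "")                # keep text after the marker
            sub(/\+0x[0-9a-f]+\/0x[0-9a-f]+/, "")  # drop the +offset/length part
            count[$0]++
        }
        END { for (s in count) print count[s], s }
    ' | sort -rn
}

# Example (log path is an assumption):
#   rip_symbols < /var/log/kern.log
```

A trace taken in interrupt context can contain a second RIP line for the interrupted code. This sketch counts both.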

It happens pretty often too, but the system isn't overloaded, so I'm not sure what is causing it.

Message from syslogd@marcus-server at Sep 14 02:16:11 ...
 kernel:[ 1276.914096] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [kworker/u64:28:252938]

Message from syslogd@marcus-server at Sep 14 02:16:11 ...
 kernel:[ 1304.913956] watchdog: BUG: soft lockup - CPU#8 stuck for 52s! [kworker/u64:28:252938]

Message from syslogd@marcus-server at Sep 14 02:37:47 ...
 kernel:[ 2569.382397] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [kworker/u64:27:743931]

Message from syslogd@marcus-server at Sep 14 02:37:47 ...
 kernel:[ 2597.382461] watchdog: BUG: soft lockup - CPU#3 stuck for 53s! [kworker/u64:27:743931]

I've also uploaded the apport file to this bug. Please let me know if anything else is needed to troubleshoot this issue.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: lightdm 5077 F.... pulseaudio
 /dev/snd/controlC0: lightdm 5077 F.... pulseaudio
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2022-05-14 (127 days ago)
InstallationMedia: Ubuntu-Server 20.04.4 LTS "Focal Fossa" - Release amd64 (20220223.1)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
NonfreeKernelModules: nvidia_modeset zfs zunicode nvidia zavl icp zcommon znvpair
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-47-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro nomodeset
ProcVersionSignature: Ubuntu 5.15.0-47.51-generic 5.15.46
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-47-generic N/A
 linux-backports-modules-5.15.0-47-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.5
RfKill:

Tags: jammy uec-images
Uname: Linux 5.15.0-47-generic x86_64
UpgradeStatus: Upgraded to jammy on 2022-05-15 (126 days ago)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 08/02/2021
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P4.20
dmi.board.name: X570 Phantom Gaming 4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP4.20:bd08/02/2021:br5.17:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnX570PhantomGaming4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:skuToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Camille Rodriguez (camille.rodriguez) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1989521/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Brian Murray (brian-murray) wrote :

If you run 'apport-collect 1989521' from the affected system the log files will be uploaded in a fashion which will be much more usable. Thanks!

affects: ubuntu → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1989521

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Camille Rodriguez (camille.rodriguez) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected jammy uec-images
description: updated
Camille Rodriguez (camille.rodriguez) wrote : CRDA.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : CurrentDmesg.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : IwConfig.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : Lspci.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : Lspci-vt.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : Lsusb.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : Lsusb-t.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : Lsusb-v.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : PaInfo.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : ProcCpuinfo.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : ProcCpuinfoMinimal.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : ProcInterrupts.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : ProcModules.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : UdevDb.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : WifiSyslog.txt

apport information

Camille Rodriguez (camille.rodriguez) wrote : acpidump.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Camille Rodriguez (camille.rodriguez) wrote :

Taking a closer look at the timing of the CPU soft lockups, it turns out they recur roughly every 21 minutes; the interval varied between 21 min 33 s and 21 min 42 s across the occurrences I checked.

root@marcus-server:/var/log# cat kern.log | grep "watchdog: BUG: soft lockup -"
(standard input):Sep 21 08:15:52 marcus-server kernel: [25828.061389] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [kworker/u64:47:2349337]
(standard input):Sep 21 08:15:52 marcus-server kernel: [25856.060650] watchdog: BUG: soft lockup - CPU#8 stuck for 52s! [kworker/u64:47:2349337]
(standard input):Sep 21 08:37:25 marcus-server kernel: [27118.374205] watchdog: BUG: soft lockup - CPU#10 stuck for 26s! [kworker/u64:110:2400870]
(standard input):Sep 21 08:37:25 marcus-server kernel: [27146.372183] watchdog: BUG: soft lockup - CPU#10 stuck for 53s! [kworker/u64:110:2400870]
(standard input):Sep 21 08:58:58 marcus-server kernel: [28408.518001] watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [kworker/u64:15:2451705]
(standard input):Sep 21 08:58:58 marcus-server kernel: [28436.518146] watchdog: BUG: soft lockup - CPU#7 stuck for 52s! [kworker/u64:15:2451705]
(standard input):Sep 21 09:20:32 marcus-server kernel: [29698.735150] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u64:42:2504150]
(standard input):Sep 21 09:20:32 marcus-server kernel: [29726.735261] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/u64:42:2504150]
(standard input):Sep 21 09:42:05 marcus-server kernel: [30988.922555] watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [kworker/u64:17:2571505]
(standard input):Sep 21 09:42:05 marcus-server kernel: [31016.924522] watchdog: BUG: soft lockup - CPU#6 stuck for 52s! [kworker/u64:17:2571505]
(standard input):Sep 21 10:03:38 marcus-server kernel: [32278.634545] watchdog: BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u64:107:2624962]
(standard input):Sep 21 10:03:38 marcus-server kernel: [32306.631958] watchdog: BUG: soft lockup - CPU#4 stuck for 52s! [kworker/u64:107:2624962]
(standard input):Sep 21 10:25:11 marcus-server kernel: [33568.843698] watchdog: BUG: soft lockup - CPU#11 stuck for 26s! [kworker/u64:50:2679696]
(standard input):Sep 21 10:25:11 marcus-server kernel: [33596.841576] watchdog: BUG: soft lockup - CPU#11 stuck for 52s! [kworker/u64:50:2679696]
(standard input):Sep 21 10:46:44 marcus-server kernel: [34859.443424] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u64:38:2735876]
(standard input):Sep 21 10:46:44 marcus-server kernel: [34887.443661] watchdog: BUG: soft lockup - CPU#2 stuck for 52s! [kworker/u64:38:2735876]
(standard input):Sep 21 11:08:17 marcus-server kernel: [36149.446509] watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [kworker/u64:102:2791856]
(standard input):Sep 21 11:08:17 marcus-server kernel: [36177.444332] watchdog: BUG: soft lockup - CPU#7 stuck for 52s! [kworker/u64:102:2791856]
(standard input):Sep 21 11:29:52 marcus-server kernel: [37438.207072] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u64:99:2846490]
(standard input):Sep 21 11:29:52 marcus-server kernel: [37466.204826] watchdog: BUG: soft lockup - CPU#0 stuck for 53s! [kworker/u64:99:2846490]
(standard input):Sep 21 11:29:52 mar...
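
The interval can be checked mechanically rather than by eye. Below is a rough sketch, assuming syslog-style lines (month, day, HH:MM:SS) that all fall within the same month. The helper name lockup_intervals and the 60-second grouping threshold are illustrative assumptions: reports closer together than the threshold (the 26s/52s pairs) are treated as one episode, and the time between episode starts is printed in seconds.

```shell
# lockup_intervals [min_gap]: read kernel log text on stdin and print the
# number of seconds between the starts of consecutive soft-lockup episodes.
# Hypothetical helper; assumes every entry falls within the same month.
lockup_intervals() {
    awk -v min_gap="${1:-60}" '
        /watchdog: BUG: soft lockup/ {
            for (i = 2; i <= NF; i++) {
                # Find the HH:MM:SS field; the day of month precedes it.
                if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {
                    split($i, t, ":")
                    now = $(i - 1) * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
                    # A long enough quiet period marks a new episode.
                    if (seen && now - prev >= min_gap) {
                        print now - start
                        start = now
                    }
                    if (!seen) { start = now; seen = 1 }
                    prev = now
                    break
                }
            }
        }
    '
}

# Example (log path is an assumption):
#   lockup_intervals < /var/log/kern.log
```

Fed the first three episodes listed above, it prints 1293 twice, i.e. 21 min 33 s.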


Marcus Yanello (marcusyan) wrote :

Following this: https://askubuntu.com/questions/1264859/watchdog-bug-soft-lockup-cpu6-stuck-for-23s
I attempted a BIOS update and verified that my swap was working properly. I also reseated my GPU, but still no luck.

Camille Rodriguez (camille.rodriguez) wrote :

I did some more troubleshooting, and it looks like I may have resolved the issue.
For an unknown reason, the suspend/hibernation targets were enabled on this server. The symptoms looked similar to this issue: https://bbs.archlinux.org/viewtopic.php?id=269203, where a race condition causes CPUs to freeze upon resuming from a suspend state. I masked the sleep targets and have not seen another occurrence of the issue yet.

$ sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
Created symlink /etc/systemd/system/sleep.target → /dev/null.
Created symlink /etc/systemd/system/suspend.target → /dev/null.
Created symlink /etc/systemd/system/hibernate.target → /dev/null.
Created symlink /etc/systemd/system/hybrid-sleep.target → /dev/null.
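
To confirm that the mask took effect, each target's state can be queried with systemctl is-enabled, which prints "masked" for a masked unit. Below is a minimal sketch. The helper name unmasked_sleep_targets is hypothetical, and it prints nothing when all four targets are masked.

```shell
# unmasked_sleep_targets: print "<unit> <state>" for every sleep-related
# target that is NOT masked. No output means all four are masked.
# Hypothetical helper; the unit list mirrors the mask command above.
unmasked_sleep_targets() {
    for unit in sleep.target suspend.target hibernate.target hybrid-sleep.target; do
        # is-enabled exits non-zero for masked units, so only the text is used.
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        [ "$state" = "masked" ] || printf '%s %s\n' "$unit" "${state:-unknown}"
    done
}

# Example:
#   unmasked_sleep_targets
```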

Andrea Florio (andrea-opensuse-org) wrote :

I have the exact same issue, and I can confirm it started only a few days ago, likely after a kernel update.

Here are some logs:

```
[ 1000.963980] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [swapper/1:0]
[ 1000.964023] Modules linked in: xt_conntrack xt_MASQUERADE nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common vmw_balloon input_leds joydev btusb serio_raw btrtl btbcm btintel bluetooth ecdh_generic ecc vmw_vmci mac_hid dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon overlay iptable_filter ip6table_filter ip6_tables efi_pstore br_netfilter pstore_blk pstore_zone bridge stp llc arp_tables ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd hid_generic cryptd vmwgfx psmouse usbhid hid ttm drm_kms_helper mptspi syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core ahci
[ 1000.964104] drm libahci i2c_piix4 mptscsih e1000 mptbase scsi_transport_spi pata_acpi
[ 1000.964113] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-48-generic #54-Ubuntu
[ 1000.964116] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 1000.964117] RIP: 0010:__do_softirq+0x6f/0x2e7
[ 1000.964127] Code: ff f7 ff ff 89 75 bc 65 81 05 29 fb 61 63 00 01 00 00 c7 45 d0 0a 00 00 00 65 66 c7 05 d8 02 63 63 00 00 fb 66 0f 1f 44 00 00 <b8> ff ff ff ff 49 c7 c2 c0 60 80 9d 41 0f bc c4 41 89 c5 4c 89 d3
[ 1000.964129] RSP: 0018:ffffaef1c003cf88 EFLAGS: 00000206
[ 1000.964131] RAX: ffff8ddf80373e80 RBX: 0000000000000000 RCX: 00000000000006e0
[ 1000.964133] RDX: 0000000000000281 RSI: 0000000004200042 RDI: ffff8ddf80b2ddc0
[ 1000.964134] RBP: ffffaef1c003cfd8 R08: ffff8de0b5e62d40 R09: 7fffffffffffffff
[ 1000.964135] R10: 000000e3088b80a8 R11: 00000000000017f5 R12: 0000000000000080
[ 1000.964136] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 1000.964138] FS: 0000000000000000(0000) GS:ffff8de0b5e40000(0000) knlGS:0000000000000000
[ 1000.964139] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1000.964140] CR2: 00005593135041e8 CR3: 000000010c1c2004 CR4: 0000000000370ee0
[ 1000.964143] Call Trace:
[ 1000.964145] <IRQ>
[ 1000.964150] irq_exit_rcu+0x94/0xc0
[ 1000.964155] sysvec_apic_timer_interrupt+0x80/0x90
[ 1000.964159] </IRQ>
[ 1000.964160] <TASK>
[ 1000.964161] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 1000.964164] RIP: 0010:tick_nohz_idle_exit+0x6c/0x150
[ 1000.964168] Code: 0f b6 5c 24 4c 83 e3 fe 41 89 dd 41 88 5c 24 4c d0 eb 41 c0 ed 02 83 e3 01 41 83 e5 01 75 18 84 db 75 14 fb 66 0f 1f 44 00 00 <5b> 41 5c 41 5d 41 5e 5d c3 cc cc cc cc e8 42 de fe ff 49 89 c6 45
[ 1000.964170] RSP: 0018:ffffaef1c00c7ec0 EFLAGS: 00000282
[ 1000.964171] RAX: 7ffffffffffffffd RBX: 0000000000000001 RCX: 00000000000006e0
[ 1000.964172] RDX: ffffffffffffffff RSI: 0000000000000083 RDI: 0000000000...
```

Marcus Yanello (marcusyan) wrote :

For me it started after installing GNOME on my Ubuntu server. My suspicion is that the desktop environment enables suspend; since the machine is used as a server, that causes problems for any server programs that must not be suspended.
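
One way to test this suspicion is to check whether the machine ever actually entered suspend. Below is a small sketch, assuming the kernel logs a line containing "PM: suspend entry" on each suspend; the exact message text is an assumption and may differ between kernel versions. The helper name suspend_vs_lockups is hypothetical. It only counts both kinds of line, so a result of zero suspend entries alongside many lockup reports would argue against suspend being the trigger.

```shell
# suspend_vs_lockups: read kernel log text on stdin and report how many
# suspend entries and soft-lockup reports it contains.
# Hypothetical helper; "PM: suspend entry" is an assumed message format.
suspend_vs_lockups() {
    awk '
        /PM: suspend entry/          { suspends++ }
        /watchdog: BUG: soft lockup/ { lockups++ }
        END {
            printf "suspend_entries=%d soft_lockup_reports=%d\n", suspends, lockups
        }
    '
}

# Example (log path is an assumption):
#   suspend_vs_lockups < /var/log/kern.log
```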

Ryan King (ryhking) wrote :

I am having the same issue as Andrea and Camille above. I have tried the workarounds above without any luck. I see there haven't been any updates to this bug, so I wanted to note that I am affected as well.

Ben (raffish) wrote :

I can confirm this is happening on my reasonably new Ubuntu Server 22.04.3 installation.
The server isn't doing much: some Docker containers and a TP-Link Omada installation. I have nulled out (masked) the suspend options.
