system freeze (cpu fan goes high)

Bug #2020656 reported by __JEAN_FRANCOIS__
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
linux-signed-hwe-5.19 (Ubuntu) | Confirmed | Undecided | Unassigned |

Bug Description

The system freezes completely more than once a day.

The CPU fan then spins faster and faster.

It does not seem to be related to system load: yesterday it happened during my lunch break.

The last message in today's journalctl -b -1 is:
mai 24 11:50:06 ThinkPad-P14s kernel: watchdog: BUG: soft lockup - CPU#9 stuck for 167s! [opt ax7ixzwwoey:158586]

There are many more messages of the same kind just before it.
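
To see which CPUs are affected and for how long, the soft lockup messages can be summarized with a short script. The following is only a minimal sketch: the regular expression is derived from the single log line quoted above, and the names `parse_soft_lockup` and `summarize` are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional

# Matches kernel messages shaped like the one quoted in this report:
#   watchdog: BUG: soft lockup - CPU#9 stuck for 167s! [opt ax7ixzwwoey:158586]
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.*):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    cpu: int       # CPU number reported by the watchdog
    seconds: int   # how long the CPU was reported stuck
    comm: str      # name of the task that was running
    pid: int       # PID of that task


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft lockup message."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))


def summarize(lines):
    """Return {cpu: longest reported stuck duration in seconds}."""
    worst = {}
    for line in lines:
        hit = parse_soft_lockup(line)
        if hit is not None:
            worst[hit.cpu] = max(worst.get(hit.cpu, 0), hit.seconds)
    return worst
```

One possible use is to feed it the kernel messages of the previous boot, for example by piping the output of journalctl -k -b -1 into a small wrapper that calls `summarize(sys.stdin)`.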

$ lsb_release -rd
Description: Ubuntu 22.04.2 LTS
Release: 22.04

$ apt-cache policy linux-image-5.19.0-42-generic
linux-image-5.19.0-42-generic:
  Installed: 5.19.0-42.43~22.04.1
  Candidate: 5.19.0-42.43~22.04.1
  Version table:
 *** 5.19.0-42.43~22.04.1 500
        500 http://fr.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status

Note: this problem was already present before the latest kernel updates.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.19.0-42-generic 5.19.0-42.43~22.04.1
ProcVersionSignature: Ubuntu 5.19.0-42.43~22.04.1-generic 5.19.17
Uname: Linux 5.19.0-42-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.4
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Wed May 24 12:00:08 2023
InstallationDate: Installed on 2023-03-25 (59 days ago)
InstallationMedia: Ubuntu 22.04.2 LTS "Jammy Jellyfish" - Release amd64 (20230223)
ProcEnviron:
 TERM=alacritty
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=fr_FR.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-hwe-5.19
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
__JEAN_FRANCOIS__ (jean-francois--) wrote :

Searching the Internet, I found some posts and bug reports about a problem with some AMD CPUs that have a hard time getting out of the C6 power state.

So I applied one of the suggested workarounds: prevent the CPU from entering the C6 state, so that it never has trouble exiting it ^^

For this I used cpupower, more precisely this command: cpupower idle-set --disable-by-latency 350

Before I ran this command, cpupower monitor showed:
    | Mperf              || Idle_Stats
 CPU| C0   | Cx   | Freq || POLL | C1   | C2   | C3
   0|  0,07| 99,93|  1429||  0,00|  0,00|  0,00| 99,86
The other cores showed nearly the same values.

I guess this means that all cores can sleep as deeply as possible…

After the command, cpupower monitor showed:
    | Mperf              || Idle_Stats
 CPU| C0   | Cx   | Freq || POLL | C1   | C2   | C3
   0|  0,07| 99,93|  1429||  0,00|  0,00| 99,91|  0,00
The other cores showed nearly the same values.

It seems the cores can no longer sleep as deeply as before.

I have not really understood what the C6 state is, or how it maps to the Cx states shown by cpupower (the maximum for cpupower is C4).
So far, cpupower idle-set --disable-by-latency 350 seems to do a good job of preventing the CPU from going too far into sleep states.
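
To check which idle states a given latency threshold touches, the kernel's cpuidle entries under /sys/devices/system/cpu/cpu0/cpuidle can be read directly (cpupower idle-info prints similar information). The sketch below assumes the usual stateN/name, stateN/latency and stateN/disable files, and relies on the documented behavior that --disable-by-latency disables every state whose latency is equal to or higher than the given value. The function names are hypothetical. Note that the names shown there are the states the OS sees; how they correspond to a hardware state such as C6 depends on the platform firmware, so this does not by itself answer the mapping question.

```python
from pathlib import Path


def read_idle_states(cpuidle_dir):
    """Read name, exit latency (microseconds) and disable flag of each stateN directory."""
    prefix = "state"
    dirs = sorted(
        Path(cpuidle_dir).glob("state[0-9]*"),
        key=lambda p: int(p.name[len(prefix):]),  # numeric order: state2 before state10
    )
    states = []
    for d in dirs:
        states.append({
            "index": int(d.name[len(prefix):]),
            "name": (d / "name").read_text().strip(),
            "latency_us": int((d / "latency").read_text()),
            "disabled": (d / "disable").read_text().strip() != "0",
        })
    return states


def affected_by_latency(states, threshold_us):
    """Names of the states whose exit latency is >= threshold_us."""
    return [s["name"] for s in states if s["latency_us"] >= threshold_us]
```

For example, `affected_by_latency(read_idle_states("/sys/devices/system/cpu/cpu0/cpuidle"), 350)` would list the states that the command above is expected to disable on that core.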

Nearly one full day without a freeze makes my computer usable.

Some questions, please:
- Can anyone help me understand the C6/Cx mapping?
- Is it a software problem (a kernel bug, as I'm guessing, so I can hope for a fix) or a hardware problem, in which case I should send the computer back to the manufacturer?
- I'm on a laptop: how can I evaluate the impact on battery usage?
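
On the battery question, one rough approach is to sample the battery's instantaneous power draw while the machine idles on battery, once with the workaround and once without, and compare the averages. The sketch below assumes the standard power_supply sysfs files (power_now in microwatts, or current_now in microamps and voltage_now in microvolts); the battery directory name, for example /sys/class/power_supply/BAT0, is an assumption and varies between machines, and `battery_power_watts` is a hypothetical helper.

```python
from pathlib import Path


def battery_power_watts(bat_dir):
    """Instantaneous battery power in watts, read from sysfs power_supply files."""
    bat = Path(bat_dir)
    power_now = bat / "power_now"
    if power_now.exists():
        # power_now is reported in microwatts
        return int(power_now.read_text()) / 1_000_000
    # Fallback: microamps * microvolts = 1e-12 watts
    current_ua = int((bat / "current_now").read_text())
    voltage_uv = int((bat / "voltage_now").read_text())
    return current_ua * voltage_uv / 1e12
```

Calling it every few seconds over several minutes and averaging the results gives a figure that can be compared between the two configurations; a single reading is too noisy to be meaningful.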

Many thanks

__JEAN_FRANCOIS__ (jean-francois--) wrote :

Sadly, I managed to freeze it again.
I was testing some ssh parameters:
I just ran ssh -C -X to the machine, then launched VS Code and moved the mouse quickly over the folder list.
No "watchdog: BUG: soft lockup" message appeared in journalctl.

A Reddit post mentions that on a Lenovo T14s (very similar to my P14s), disabling Wayland did the trick.
I'm going to try this.

__JEAN_FRANCOIS__ (jean-francois--) wrote :

Got the message below. My VS Code window (through ssh) is frozen, but the system is still running.

# uptime -p
up 6 hours, 23 minutes

Extract from journalctl -f:

```
mai 25 18:51:32 jf-ThinkPad-P14s kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
mai 25 18:51:32 jf-ThinkPad-P14s kernel: #PF: supervisor read access in kernel mode
mai 25 18:51:32 jf-ThinkPad-P14s kernel: #PF: error_code(0x0000) - not-present page
mai 25 18:51:32 jf-ThinkPad-P14s kernel: PGD 0 P4D 0
mai 25 18:51:32 jf-ThinkPad-P14s kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
mai 25 18:51:32 jf-ThinkPad-P14s kernel: CPU: 5 PID: 24230 Comm: ThreadPoolForeg Not tainted 5.19.0-42-generic #43~22.04.1-Ubuntu
mai 25 18:51:32 jf-ThinkPad-P14s kernel: Hardware name: LENOVO 21J5CTO1WW/21J5CTO1WW, BIOS R23ET65W (1.35 ) 03/21/2023
mai 25 18:51:32 jf-ThinkPad-P14s kernel: RIP: 0010:syscall_exit_to_user_mode+0x0/0x50
mai 25 18:51:32 jf-ThinkPad-P14s kernel: Code: 1f 44 00 00 5d 31 c0 89 c2 89 c1 89 c6 89 c7 c3 cc cc cc cc 66 0f 1f 44 00 00 eb 07 0f 00 2d c7 06 51 00 c3 cc cc cc cc 66 90 <55> 65 48 8b 04 25 c0 fb 01 00 48 89 e5 41 54 48 8b 70 08 49 89 fc
mai 25 18:51:32 jf-ThinkPad-P14s kernel: RSP: 0018:ffffa8e40b0a3e78 EFLAGS: 00010246
mai 25 18:51:32 jf-ThinkPad-P14s kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
mai 25 18:51:32 jf-ThinkPad-P14s kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa8e40b0a3f58
mai 25 18:51:32 jf-ThinkPad-P14s kernel: RBP: ffffa8e40b0a3f48 R08: 0000000000000000 R09: 0000000000000000
mai 25 18:51:32 jf-ThinkPad-P14s kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa8e40b0a3f58
mai 25 18:51:32 jf-ThinkPad-P14s kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
mai 25 18:51:32 jf-ThinkPad-P14s kernel: FS: 00007ff53f1bc640(0000) GS:ffff92c761f40000(0000) knlGS:0000000000000000
mai 25 18:51:32 jf-ThinkPad-P14s kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
mai 25 18:51:32 jf-ThinkPad-P14s kernel: CR2: 0000000000000008 CR3: 000000011c2b8000 CR4: 0000000000750ee0
mai 25 18:51:32 jf-ThinkPad-P14s kernel: PKRU: 55555558
mai 25 18:51:32 jf-ThinkPad-P14s kernel: Call Trace:
mai 25 18:51:32 jf-ThinkPad-P14s kernel: <TASK>
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? do_syscall_64+0x69/0x90
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? __x64_sys_futex+0x78/0x1f0
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? __secure_computing+0x9b/0x110
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? syscall_trace_enter.constprop.0+0xb5/0x1c0
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? exit_to_user_mode_prepare+0x3b/0xd0
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? syscall_exit_to_user_mode+0x2a/0x50
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? do_syscall_64+0x69/0x90
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? irqentry_exit+0x43/0x50
mai 25 18:51:32 jf-ThinkPad-P14s kernel: ? exc_page_fault+0x92/0x1b0
mai 25 18:51:32 jf-ThinkPad-P14s kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
mai 25 18:51:32 jf-ThinkPad-P14s kernel: RIP: 0033:0x7ff58b6913b7
mai 25 18:51:32 jf-ThinkPad-P14s kernel: Code: 81 00 00 00 b8 ca 00 00 00 0f 05 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa 40 80 f6 81 45 31 d2 ba 01 00 00...

```

__JEAN_FRANCOIS__ (jean-francois--) wrote :

New freeze after about 18 hours of uptime.

Last messages from journalctl -f (about 2 to 3 minutes before the freeze):

mai 26 10:38:46 jf-ThinkPad-P14s kernel: perf: interrupt took too long (3199 > 3165), lowering kernel.perf_event_max_sample_rate to 62500
mai 26 10:39:30 jf-ThinkPad-P14s gnome-keyring-daemon[15315]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered
mai 26 10:39:30 jf-ThinkPad-P14s gnome-keyring-d[15315]: asked to register item /org/freedesktop/secrets/collection/login/2, but it's already registered

I ran apt dist-upgrade this morning.

System load was low; I was reading a file in VS Code through ssh…

I don't really know what to do. I wish I could be sure it's not a hardware problem.

__JEAN_FRANCOIS__ (jean-francois--) wrote :

I decided to upgrade to Ubuntu 23.04 to see whether a newer kernel version changes anything.

__JEAN_FRANCOIS__ (jean-francois--) wrote :

New freeze.
No particular message in journalctl.
The fan did not spin up this time.

__JEAN_FRANCOIS__ (jean-francois--) wrote :

Modifying the kernel command line this way (adding amdgpu.dcdebugmask=0x10):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.dcdebugmask=0x10"
seems to give a huge improvement.
$> uptime -p
up 1 day, 14 hours, 31 minutes

NEW RECORD
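
For anyone reproducing this workaround: on Ubuntu, a change to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub only takes effect after running sudo update-grub and rebooting, so it is worth confirming that the running kernel actually received the option. As far as I know, the 0x10 bit of amdgpu.dcdebugmask disables Panel Self Refresh (PSR), but treat that as an assumption to verify against the amdgpu documentation. The sketch below checks a kernel command line string for a parameter; `kernel_param` is a hypothetical helper, and it ignores quoted values containing spaces.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag such as "quiet", and None if the
    parameter is absent. Quoted values with spaces are not handled.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None
```

On a running system the string to pass in is the content of /proc/cmdline, for example `kernel_param(open("/proc/cmdline").read(), "amdgpu.dcdebugmask")`.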

Matias Alvarez Sabate (matialvarezs1) wrote :

Hi Jean Francois, I'm having the same screen freeze problem. On Thursday 07-06-2023 I upgraded from 21.10 to 22.04 with kernel 5.19 and it started to happen. I have downgraded to the kernel I used before, 5.15, and everything seems to be fine.

With 5.15 I didn't have that bug.

Tell me, have you been able to solve it?

Regards
Matias

__JEAN_FRANCOIS__ (jean-francois--) wrote :

Hello

I've created an issue on the Lenovo side; you can track it here: https://forums.lenovo.com/t5/Ubuntu/System-freeze-with-Ubuntu-22-04-to-23-04-on-P14s-gen3-AMD-version/m-p/5227346

To summarize, the only stable configuration I can get is the Lenovo OEM kernel on Ubuntu 20.04, the latest being 5.14.0-1059-oem.
I've tried many kernels, even 6.4.2 from mainline, and none of them is stable.

Hope this helps (and I hope this bug will receive more attention; maybe I have not filed it in the correct place?)

JF

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed-hwe-5.19 (Ubuntu):
status: New → Confirmed