4.15.0-151 is freezing various CPUs

Bug #1938013 reported by Juerg Haefliger
This bug affects 25 people
Affects: linux (Ubuntu) | Importance: Undecided | Assigned to: Unassigned
Affects: Bionic | Importance: High | Assigned to: Stefan Bader

Bug Description

From: https://askubuntu.com/questions/1353859/ubuntu-18-04-05-lts-desktop-hangs-with-since-kernel-4-15-0-151-and-systemd-237-3

Several crashes in /var/crash, here's the last one:-

ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Fri Jul 23 18:10:54 2021
Failure: oops
OopsText:
 BUG: Bad rss-counter state mm:00000000c098a229 idx:2 val:-1
 usblp0: removed
 usblp 1-5:1.0: usblp0: USB Bidirectional printer dev 3 if 0 alt 0 proto 2 vid 0x04F9 pid 0x02EC
 <44>[ 18.329026] systemd-journald[358]: File /var/log/journal/b022dca21fd4480baeeb84f47ab439d3/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
 vboxdrv: loading out-of-tree module taints kernel.
 vboxdrv: module verification failed: signature and/or required key missing - tainting kernel
 vboxdrv: Found 8 processor cores
 vboxdrv: TSC mode is Invariant, tentative frequency 2303999142 Hz
 vboxdrv: Successfully loaded version 6.1.24 r145767 (interface 0x00300000)
 VBoxNetFlt: Successfully started.
 VBoxNetAdp: Successfully started.
 Bluetooth: RFCOMM TTY layer initialized
 Bluetooth: RFCOMM socket layer initialized
 Bluetooth: RFCOMM ver 1.11
 rfkill: input handler disabled
 [UFW BLOCK] IN=enp3s0f1 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
 [UFW BLOCK] IN=wlp2s0 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
 [UFW BLOCK] IN=enp3s0f1 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
 [UFW BLOCK] IN=wlp2s0 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
 [UFW BLOCK] IN=enp3s0f1 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
 [UFW BLOCK] IN=wlp2s0 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2

Package: linux-image-4.15.0-151-generic 4.15.0-151.157
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 4.15.0-151-generic x86_64
---------------------------------------------------------------------------------------
The system is a laptop from Entroware based on Clevo and has 8 logical CPUs:-
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
Stepping: 10
CPU MHz: 2000.295
CPU max MHz: 4000.0000
CPU min MHz: 800.0000
BogoMIPS: 4599.93
Virtualisation: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

USB Config:-
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 5986:2110 Acer, Inc
Bus 001 Device 003: ID 04f9:02ec Brother Industries, Ltd MFC-J870DW
Bus 001 Device 005: ID 8087:07dc Intel Corp. Bluetooth wireless interface
Bus 001 Device 002: ID 0d8c:0104 C-Media Electronics, Inc. CM103+ Audio Controller
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

PCI Config:-
00:00.0 Host bridge: Intel Corporation Device 3e10 (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Device 3e9b
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Device a353 (rev 10)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port 9 (rev f0)
00:1d.5 PCI bridge: Intel Corporation Device a335 (rev f0)
00:1d.6 PCI bridge: Intel Corporation Device a336 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a30d (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
02:00.0 Network controller: Intel Corporation Wireless 3160 (rev 93)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader (rev 01)
03:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)

This has only started happening since using 4.15.0-151. Reverting to 4.15.0-147 makes the system stable.

Andreas Dworsky (hcldrd) wrote :

I am also having issues since last week, when kernel 4.15.0-151 got installed. I am now using 4.15.0-147, but not for long enough to tell whether this changes the picture.

What it does help with: switching back and forth between the GUI and a virtual console works again. With 151, that alone was enough to trigger a freeze or crash.

My system is a Lenovo Thinkpad P50 with integrated graphics. I tried to disable HT, switched from nvidia to nouveau graphics driver and then to the intel one (to get the graphic adapter out of the picture). Even when booting into run level 3 I faced issues when switching from one virtual console to the other.

Please let me know if you need more information.

I understand that with -147 I am missing important security fixes, but I cannot afford an unstable system either, as this is used for work.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
Daniel Ruf (druf) wrote :

Probably same here.

Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
Ubuntu 18 on T540p
FDE enabled
previous kernel was -147, issues started happening with -151

cat /var/log/apt/history.log | grep -B 3 -A 3 "linux-image-4.15.0-151"

Start-Date: 2021-07-22 08:30:19
Commandline: /usr/bin/unattended-upgrade
Install: linux-headers-4.15.0-151-generic:amd64 (4.15.0-151.157, automatic), linux-modules-4.15.0-151-generic:amd64 (4.15.0-151.157, automatic), linux-image-4.15.0-151-generic:amd64 (4.15.0-151.157, automatic), linux-modules-extra-4.15.0-151-generic:amd64 (4.15.0-151.157, automatic), linux-headers-4.15.0-151:amd64 (4.15.0-151.157, automatic)
Upgrade: linux-headers-generic:amd64 (4.15.0.147.134, 4.15.0.151.139), linux-image-generic:amd64 (4.15.0.147.134, 4.15.0.151.139), linux-generic:amd64 (4.15.0.147.134, 4.15.0.151.139)
End-Date: 2021-07-22 08:31:39
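
The same lookup can be scripted. The following is only a sketch: it assumes the usual apt history layout of blank-line separated stanzas made of "Key: value" lines, as in the excerpt above, and the function name find_install_stanzas is made up for illustration.

```python
def find_install_stanzas(history_text, package):
    """Return apt history stanzas (as dicts) whose Install or Upgrade
    line mentions `package`.

    Assumes stanzas are separated by blank lines and consist of
    "Key: value" lines, as in /var/log/apt/history.log.
    """
    matches = []
    for block in history_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value
        touched = fields.get("Install", "") + fields.get("Upgrade", "")
        if package in touched:
            matches.append(fields)
    return matches


if __name__ == "__main__":
    # Path as used in the comment above; rotated logs are not handled here.
    with open("/var/log/apt/history.log") as fh:
        for stanza in find_install_stanzas(fh.read(), "linux-image-4.15.0-151"):
            print(stanza.get("Start-Date"), stanza.get("Commandline"))
```

This prints when the kernel package was pulled in and by which command, which makes it easy to line the upgrade up with the first freeze.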

Since then wifi (ipv6) also produced freezes. I had to disable the ipv6 module in grub and then I had no more freezes when the wifi reconnected or if I manually connected to it. But still there were freezes / crashes and in syslog there were only a few of these rss error entries.

/var/crash contains dumps from the timestamps, when the latest freezes happened today so it seems to be definitely caused by the kernel

the freezes and crashes can also lead to other symptoms like a broken filesystem in some cases

ps: adding special characters in the comment textarea causes a server error here (UnicodeEncodeError) for example the symbol from the fish shell

Jan (mail-ubuntu-x) wrote :

I've also had issues with this kernel on two machines. On my Thinkpad T460p (Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz), I had two cases where the ext4 driver suddenly detected an error in the journal and aborted it. Unfortunately, the system mounted the whole file system read only, so it couldn't save a crash report.
I was able to take a picture of some logs, here is my transcription:
EXT4-fs error (device dm-1): ext4_lookup:1590 inode #14680166: comm Okular::PixmapG: iget: checksum invalid
Aborting journal on device dm-1-8
EXT4-fs (dm-1): Remounting filesystem read-only

After that, lots of "checksum invalid" and "Read-only file system" messages. Rebooting, fsck had so many issues that automatic recovery didn't work and I was dropped to a terminal and had to manually confirm the changes.

The system has full disk encryption, so dm-1 is the encrypted root volume.

Reverting back to 4.15.0-147-generic has produced no errors since Friday.

ScaryTom (t-denley) wrote :

Though it doesn't sound exactly like the issue that @Juerg Haefliger originally describes, I've had the same issue as @Jan (above) on 3 separate laptops (all Lenovo T480s machines). In each case, the upgrade to 4.15.0-151 caused file system corruption that could only be corrected by a manual fsck. Following the fix, these laptops appear to be running fine now, still on 4.15.0-151.

Steven Maude (stevenmaude) wrote :

There are more reports of issues with this kernel in this Linux Mint thread:

https://forums.linuxmint.com/viewtopic.php?t=353553&p=2044822

and other reports added to the Ask Ubuntu question linked in the first post here.

Juerg Haefliger (juergh) wrote :

From askubuntu.com:

Sample from linux-image-4.15.0-151-generic.253271.crash:

ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Sun Jul 25 11:32:27 2021
Failure: oops
OopsText:
 general protection fault: 0000 [#1] SMP PTI
 Modules linked in: xfs libcrc32c uas usb_storage rfcomm ccm ip6table_filter ip6_tables iptable_filter v4l2loopback(OE) snd_hrtimer cmac bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi nvidia_drm(POE) intel_rapl x86_pkg_temp_thermal nvidia_modeset(POE) intel_powerclamp coretemp arc4 kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel snd_hda_codec_generic nvidia(POE) pcbc iwlmvm mac80211 snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 crypto_simd glue_helper asus_nb_wmi cryptd asus_wmi snd_hda_core intel_cstate snd_hwdep intel_rapl_perf serio_raw sparse_keymap intel_wmi_thunderbolt iwlwifi snd_pcm snd_seq_midi snd_seq_midi_event cfg80211 uvcvideo btusb btrtl videobuf2_vmalloc btbcm snd_rawmidi videobuf2_memops btintel videobuf2_v4l2 drm_kms_helper
  bluetooth snd_seq xpad videobuf2_core ff_memless ecdh_generic drm videodev snd_seq_device snd_timer media fb_sys_fops snd syscopyarea sysfillrect sysimgblt mei_me idma64 soundcore virt_dma input_leds joydev mei processor_thermal_device intel_lpss_pci int340x_thermal_zone shpchp intel_pch_thermal intel_lpss intel_soc_dts_iosf elan_i2c mac_hid asus_wireless int3400_thermal acpi_pad acpi_thermal_rel sch_fq_codel ppa parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_asus hid_generic usbhid nvme r8169 ahci nvme_core mii libahci wmi i2c_hid hid video pinctrl_sunrisepoint
 CPU: 4 PID: 81 Comm: kswapd0 Tainted: P OE 4.15.0-151-generic #157-Ubuntu
 Hardware name: ASUSTeK COMPUTER INC. G752VT/G752VT, BIOS G752VT.213 01/06/2016
 RIP: 0010:find_get_entries+0x68/0x200
 RSP: 0018:ffffb54cc384f9d0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000006
 RDX: 1800000000000000 RSI: 0000000000001000 RDI: ffff9730446816d0
 RBP: ffffb54cc384fa30 R08: 0000000000000800 R09: 0000000000000006
 R10: ffff9730446817f8 R11: 0000000000000000 R12: ffffb54cc384faf8
 R13: ffffb54cc384fa78 R14: 000000000000000c R15: ffff9730446817f8
 FS: 0000000000000000(0000) GS:ffff973606500000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000a520680c000 CR3: 00000005c260a005 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  pagevec_lookup_entries+0x1e/0x30
  truncate_inode_pages_range+0x127/0x960
  ? xfs_mount_validate_sb+0x440/0x500 [xfs]
  ? __inode_wait_for_writeback+0x7e/0xf0
  ? bit_waitqueue+0x40/0x40
  truncate_inode_pages_final+0x4c/0x60
  evict+0x188/0x1a0
  dispose_list+0x39/0x50
  prune_icache_sb+0x5a/0x80
  super_cache_scan+0x137/0x1b0
  shrink_slab.part.49+0x1e7/0x440
  shrink_node+0x2e1/0x2f0
  kswapd+0x2b1/0x710
  kthread+0x121/0x140
  ? mem_cgroup_shrink_node+0x190/0x190
  ? kthread_create_worker_on_cpu+0x70/0x70
  ret_from_fork+0x35/0x40
 Code: c7 45 a8 00 00 00 00 48 89 75 b0 45 31 ff 4d ...

Andreas Dworsky (hcldrd) wrote :

My stacks are inconclusive, I would think. Sometimes there was a hang without a stack. Some examples are:

Jul 22 10:38:37 everholt kernel: [ 2317.070485] Hardware name: LENOVO 20EQS3B30L/20EQS3B30L, BIOS N1EET62W (1.35 ) 11/10/2016
Jul 22 10:38:37 everholt kernel: [ 2317.070486] Call Trace:
Jul 22 10:38:37 everholt kernel: [ 2317.070492] dump_stack+0x6d/0x8b
Jul 22 10:38:37 everholt kernel: [ 2317.070495] print_bad_pte+0x222/0x2e0
Jul 22 10:38:37 everholt kernel: [ 2317.070498] ? alloc_pages_current+0x6a/0xe0
Jul 22 10:38:37 everholt kernel: [ 2317.070501] _vm_normal_page+0x9c/0x100
Jul 22 10:38:37 everholt kernel: [ 2317.070503] unmap_page_range+0x53f/0xd00
Jul 22 10:38:37 everholt kernel: [ 2317.070506] unmap_single_vma+0x7d/0xf0
Jul 22 10:38:37 everholt kernel: [ 2317.070508] unmap_vmas+0x51/0xb0
Jul 22 10:38:37 everholt kernel: [ 2317.070511] exit_mmap+0xb5/0x1d0
Jul 22 10:38:37 everholt kernel: [ 2317.070514] mmput+0x57/0x140
Jul 22 10:38:37 everholt kernel: [ 2317.070516] do_exit+0x352/0xb90
Jul 22 10:38:37 everholt kernel: [ 2317.070518] do_group_exit+0x43/0xb0
Jul 22 10:38:37 everholt kernel: [ 2317.070520] get_signal+0x142/0x7a0
Jul 22 10:38:37 everholt kernel: [ 2317.070523] do_signal+0x37/0x720
Jul 22 10:38:37 everholt kernel: [ 2317.070527] ? do_futex+0x370/0x4e0
Jul 22 10:38:37 everholt kernel: [ 2317.070529] ? __switch_to+0x123/0x4e0
Jul 22 10:38:37 everholt kernel: [ 2317.070531] ? __switch_to_asm+0x35/0x70
Jul 22 10:38:37 everholt kernel: [ 2317.070533] ? __switch_to_asm+0x35/0x70
Jul 22 10:38:37 everholt kernel: [ 2317.070535] ? SyS_futex+0x13b/0x180
Jul 22 10:38:37 everholt kernel: [ 2317.070538] exit_to_usermode_loop+0x73/0xd0
Jul 22 10:38:37 everholt kernel: [ 2317.070540] do_syscall_64+0x121/0x130
Jul 22 10:38:37 everholt kernel: [ 2317.070541] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Jul 22 10:38:37 everholt kernel: [ 2317.070544] RIP: 0033:0x7efc92868ad3
Jul 22 10:38:37 everholt kernel: [ 2317.070545] RSP: 002b:00007efc5effc9a0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Jul 22 10:38:37 everholt kernel: [ 2317.070547] RAX: fffffffffffffe00 RBX: 000056451b009460 RCX: 00007efc92868ad3
Jul 22 10:38:37 everholt kernel: [ 2317.070548] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000056451b00948c
Jul 22 10:38:37 everholt kernel: [ 2317.070549] RBP: 000056451b009484 R08: 0000000000000001 R09: 0000000000000000
Jul 22 10:38:37 everholt kernel: [ 2317.070550] R10: 0000000000000000 R11: 0000000000000246 R12: 000056451b00948c
Jul 22 10:38:37 everholt kernel: [ 2317.070552] R13: 0000000000000000 R14: 000056451b016470 R15: 00000000000017b9

Jul 23 09:33:14 everholt kernel: [ 1770.530621] ? _nv029462rm+0x2c1/0x520 [nvidia]
Jul 23 09:33:14 everholt kernel: [ 1770.530835] ? _nv029436rm+0x69/0x140 [nvidia]
Jul 23 09:33:14 everholt kernel: [ 1770.531035] ? _nv002278rm+0x9/0x20 [nvidia]
Jul 23 09:33:14 everholt kernel: [ 1770.531234] ? _nv003733rm+0x1b/0x80 [nvidia]
Jul 23 09:33:14 everholt kernel: [ 1770.531435] ? _nv014655rm+0x706/0x770 [nvidia]
Jul 23 09:33:14 everholt kernel: [ 1770.531636] ? _nv037695rm+0xb3/0x150 [nvidia]
Jul 23 09:33:14 everholt kernel: [ 1770.531836] ? _nv037694rm+0x388/0x4e0 [nvidia...

Stefan Bader (smb) wrote :

I think all data is a little inconclusive. Things seem to go wrong across many places in seemingly random patterns, which makes me a little suspicious about locking (or alternatively memory management). Looking over the changes, there was a mutex change which seems subtly different from the change made to upstream kernels. It is not obviously different enough to be a clear suspect, but maybe worth trying. For this I would need someone volunteering to try the kernel I prepared at https://launchpad.net/~smb/+archive/ubuntu/bionic (still needs to finish building some arches). This is not an officially signed kernel, so it should only be tried if secure boot is not used or can be disabled.

The PPA can be added with "sudo apt-add-repository ppa:smb/bionic" (same command can also be used with --remove to get rid of it later).

Juerg Haefliger (juergh) wrote :

[ 2196.470134] do_trap: 20 callbacks suppressed
[ 2196.470137] traps: gdbus[1711] trap invalid opcode ip:7f4dcea47cbd sp:7f4dcbca4c20 error:0 in libc-2.27.so[7f4dce933000+1e7000]
[ 2197.267081] general protection fault: 0000 [#1] SMP PTI
[ 2197.267389] Modules linked in: rfcomm ccm cmac bnep nls_iso8859_1 snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_hda_codec_hdmi snd_hda_codec_realtek snd_soc_acpi snd_hda_codec_generic snd_soc_core arc4 snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_hda_codec intel_rapl iwlmvm x86_pkg_temp_thermal intel_powerclamp snd_hda_core coretemp mac80211 snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event kvm_intel snd_rawmidi snd_seq iwlwifi kvm snd_seq_device snd_timer irqbypass intel_cstate intel_rapl_perf hp_wmi snd joydev sparse_keymap input_leds wmi_bmof serio_raw intel_wmi_thunderbolt uvcvideo btusb videobuf2_vmalloc btrtl cfg80211 btbcm videobuf2_memops btintel soundcore bluetooth videobuf2_v4l2 videobuf2_core mei_me videodev shpchp media ecdh_generic processor_thermal_device
[ 2197.271405] mei int340x_thermal_zone intel_pch_thermal intel_soc_dts_iosf int3400_thermal mac_hid hp_wireless acpi_thermal_rel acpi_pad sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 pcbc i2c_algo_bit aesni_intel drm_kms_helper syscopyarea sysfillrect aes_x86_64 crypto_simd glue_helper cryptd sysimgblt fb_sys_fops psmouse r8169 ahci drm mii libahci wmi video
[ 2197.274805] CPU: 0 PID: 1710 Comm: gmain Not tainted 4.15.0-151-generic #157-Ubuntu
[ 2197.275250] Hardware name: HP HP Notebook/81EB, BIOS F.45 10/16/2018
[ 2197.275627] RIP: 0010:perf_event_release_kernel+0x2f/0x2f0
[ 2197.275947] RSP: 0018:ffff9c440265bc68 EFLAGS: 00010246
[ 2197.276250] RAX: 0000000000000000 RBX: 6800000000000000 RCX: 0000000000000000
[ 2197.276665] RDX: 0000000000000040 RSI: ffff9c440265bc78 RDI: 6800000000000000
[ 2197.277082] RBP: ffff9c440265bcb8 R08: ffff8ab257656a88 R09: 0000000000000000
[ 2197.277493] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ab257652f00
[ 2197.277906] R13: ffff8ab257654218 R14: ffff8ab257652f00 R15: ffff8ab2598debb0
[ 2197.278323] FS: 00007f4dcc4a6700(0000) GS:ffff8ab26ec00000(0000) knlGS:0000000000000000
[ 2197.278796] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2197.279133] CR2: 00007f420c008000 CR3: 0000000249a0a005 CR4: 00000000003606f0
[ 2197.279546] Call Trace:
[ 2197.279701] unregister_hw_breakpoint+0x13/0x20
[ 2197.279973] flush_ptrace_hw_breakpoint+0x2b/0x60
[ 2197.280251] do_exit+0x3d1/0xb90
[ 2197.280450] do_group_exit+0x43/0xb0
[ 2197.280670] get_signal+0x142/0x7a0
[ 2197.280883] ? futex_wake+0x8f/0x180
[ 2197.281100] do_signal+0x37/0x720
[ 2197.281303] ? recalc_sigpending+0x1b/0x50
[ 2197.281551] ? _copy_from_user+0x3e/0x60
[ 2197.281790] exit_to_usermode_loop+0x73/0xd0
[ 2197.282047] do_syscall_64+0x121/0x130
[ 2197.282276...

Nick White (nickwh) wrote :

For Stefan Bader(smb)

>For this I would need someone volunteering to try the kernel I prepared at https://launchpad.net/~smb/+archive/ubuntu/bionic (still needs to finish building some arches). This is not an officially signed kernel, so it only should be tried if secure-boot is not used or can be disabled.

Tried this kernel on my Entroware system, the one described by the initial reporter in this thread (Nick White). Bad news: the system hung with just the mouse pointer showing, before the login screen was displayed. kern.log has lots of "^@" after dealing with the usblp. On the 147 kernel, the next thing after usblp is "usbcore: registered new interface driver usblp" and then Bluetooth. I appreciate that start-up sequences may vary due to systemd. Nothing new in /var/crash.

Stefan Bader (smb) wrote :

For Nick White: would you be able to supply the output you saw when your system hung? If the hang happened after the real rootfs was mounted, the journal may still hold entries from that boot: journalctl --list-boots shows which boots are recorded, and journalctl -b<number> shows the logs from a previous boot.
Unfortunately we cannot see any hangs with VMs, and reproducing the various symptoms seems to be possible only with luck and on laptop hardware. This could be something related to USB... Will try to follow some of the bread crumbs we got.

Martin Malec (martin-malec) wrote :

I have seen similar behaviour (hangs) recently, once with a corrupted filesystem, and had to force a shutdown of the laptop several times a day. Today I was in the console and not the GUI (the hang had already happened while logging in to Xfce, and I was lucky to be able to switch to the console via Ctrl-Alt-F1), so I finally saw the kernel Oops.

Using Mint 19.3 on HP ProBook 6470b/179B BIOS 68ICF Ver F.73 08/07/2018

CPU Intel(R) Core(TM) i5-3340M CPU @ 2.70GHz

Message on the console when system crashed:

BUG: unable to handle kernel paging request at 000000d4b2d9323e
IP: __handle_mm_fault+0x5b/0xff0
PGD 0 P4D 0
Oops: 0000 [#2] SMP PTI

... entire trace I have in a photo of the screen if necessary.

The laptop was rock stable until recently, and the previous 4.15.0-147-generic #151-Ubuntu SMP kernel appears to be OK. I am currently running that kernel and have decided to remove the probably problematic kernel for now and block auto-updates to it.

Stéphane Lesimple (speed47) wrote :

Same thing since I upgraded to 4.15.0-151-generic on my laptop. Previously had 150+ days of uptime, now it crashes every few days, and up to 3 times a day.

Very similar behaviour to what Martin reported just above: it can easily crash when logging in to XFCE, or when simply using the desktop. When it happens, it is a hard freeze: the watchdog doesn't trigger (I waited for 1 hour), and even the sysrq keys no longer work (yes, they are enabled).

Hardware: DELL Latitude 5300.

To add some datapoints, here are my 2 last crash reports:

==> /var/crash/linux-image-4.15.0-151-generic.31533.crash <==
ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Fri Jul 23 15:41:43 2021
Failure: oops
OopsText:
 BUG: Bad page map in process Renderer pte:8000009fb753c258 pmd:3993e9067
 addr:00000000b8067c8b vm_flags:080000f9 anon_vma: (null) mapping:0000000073ae2b58 index:648
 file:memfd:mozilla-ipc fault:shmem_fault mmap:shmem_mmap readpage: (null)
 CPU: 0 PID: 5797 Comm: Renderer Not tainted 4.15.0-151-generic #157-Ubuntu
 Hardware name: Dell Inc. Latitude 5300/0932VT, BIOS 1.8.1 12/16/2019
 Call Trace:

Package: linux-image-4.15.0-151-generic 4.15.0-151.157
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 4.15.0-151-generic x86_64

==> /var/crash/linux-image-4.15.0-151-generic.45463.crash <==
ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Fri Jul 23 15:41:42 2021
Failure: oops
OopsText:
 BUG: Bad page cache in process Renderer pfn:2d5fe5
 page:ffffed164b57f940 count:3 mapcount:1 mapping:ffff8fdc5041b488 index:0x648
 flags: 0x17ffffc004002d(locked|referenced|uptodate|lru|swapbacked)
 raw: 0017ffffc004002d ffff8fdc5041b488 0000000000000648 0000000300000000
 raw: ffffed164d96dbe0 ffffed1650092820 0000000000000000 ffff8fdda9018000
 page dumped because: still mapped when deleted
 page->mem_cgroup:ffff8fdda9018000
 CPU: 0 PID: 5797 Comm: Renderer Tainted: G B 4.15.0-151-generic #157-Ubuntu
 Hardware name: Dell Inc. Latitude 5300/0932VT, BIOS 1.8.1 12/16/2019
 Call Trace:

Package: linux-image-4.15.0-151-generic 4.15.0-151.157
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 4.15.0-151-generic x86_64
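
For anyone collecting several of these reports, the fields can be pulled out with a short script. This is a rough sketch, not an official apport tool: it assumes the layout visible in the excerpts above ("Key: value" lines, with multi-line values continued on lines that start with a single space), and parse_crash_report is a name invented here.

```python
def parse_crash_report(text):
    """Parse "Key: value" fields from a crash report excerpt.

    Lines that start with a space are treated as continuation lines of
    the most recent key (this is how OopsText appears above).
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith(" "):
            if current is not None:
                fields[current].append(line[1:])
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key
            value = value.strip()
            fields[current] = [value] if value else []
    return {key: "\n".join(parts) for key, parts in fields.items()}


def oops_headline(text):
    """Return the first line of OopsText, or None if there is none."""
    oops = parse_crash_report(text).get("OopsText", "")
    return oops.splitlines()[0] if oops else None
```

Calling oops_headline() on each file in /var/crash gives a one-line summary per report, which helps when comparing symptoms across machines.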

Martin Malec (martin-malec) wrote :
summary: - 4.15.0-151 is freezing intel 5th gen ThinkPad (T450)
+ 4.15.0-151 is freezing various Intel machines (i5-3gen, i5-5gen
+ reported)
summary: - 4.15.0-151 is freezing various Intel machines (i5-3gen, i5-5gen
- reported)
+ 4.15.0-151 is freezing various CPUs
Martin Malec (martin-malec) wrote :

Based on report from https://askubuntu.com/questions/1353859/ubuntu-18-04-05-lts-desktop-hangs-with-since-kernel-4-15-0-151-and-systemd-237-3/1354324#1354324 the issue happens also with AMD processor AMD Ryzen 7 1700 stepping 1 microcode : 0x8001138

So it does not seem to be limited to i5-5gen, or even to Intel CPUs such as my i5-3gen: AMD CPUs are affected too. For this reason I changed the summary.

Callum Fare (cfare) wrote : Re: 4.15.0-151 is freezing various Intel machines (i5-3gen, i5-5gen reported)

I'm seeing similar behavior after updating to 4.15.0-151, with multiple crashes and hangs. My CPU is an i7-6600U.

The message in the dmesg log from the last crash is 'BUG: unable to handle kernel paging request at 0000006a18e6bfe5'

I've attached the full trace in case that helps. I also seemingly have some older (but after updating to 4.15.0-151) crash logs in /var/crash/ that I can also share if these are helpful.

Juerg Haefliger (juergh) wrote :

We believe it's a memory corruption in the wifi stack (mac80211 module). Expect a test kernel soon.

Andreas Dworsky (hcldrd) wrote :

I did encounter the issue on a Lenovo Thinkpad P50 with an Intel wifi adapter. At the time of the freezes and crashes the ethernet cable connection was used - wifi on and configured, though.

I leave it to your good judgement if the sketched scenario will also be covered by the fix. Thanks.

Daniel Ruf (druf) wrote :

In my case it froze every time "wlp4s0: WPA: Group rekeying completed with ..." appeared in the logs or if I manually connected to the wifi.

Disabled ipv6, didn't crash so often anymore and not on each manual connect anymore.

But still had some freezes (especially if the CPU load changed abruptly), until I switched to the 147 kernel.

In logs I saw the kernel page error and also that the CPU throttled shortly before the crashes. Not sure if this helps to pinpoint the exact cause.
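
To check a saved log for this pattern (a kernel error shortly after a wifi event), something like the sketch below could be used. The regular expressions are assumptions built only from messages quoted in this thread (interface names starting with "wl", "Group rekeying completed", "authenticated", "associated", "BUG:", "general protection fault", "Oops:"), and the 20-line window is arbitrary.

```python
import re

# Error markers and wifi events as quoted in this thread; not exhaustive.
ERROR_RE = re.compile(r"BUG:|general protection fault|Oops:")
WIFI_RE = re.compile(
    r"\bwl\w+: .*(Group rekeying completed|authenticated|associated)"
)


def errors_after_wifi_events(lines, window=20):
    """Return (wifi_line, error_line) pairs where a kernel error line
    appears at most `window` lines after the most recent wifi event."""
    hits = []
    last_wifi = None  # (line index, line text) of the latest wifi event
    for index, line in enumerate(lines):
        if WIFI_RE.search(line):
            last_wifi = (index, line)
        elif ERROR_RE.search(line) and last_wifi is not None:
            if index - last_wifi[0] <= window:
                hits.append((last_wifi[1], line))
    return hits
```

Note that a hard freeze may leave nothing in the log after the wifi line, so an empty result does not rule out the wifi stack.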

Juerg Haefliger (juergh) wrote :

A test kernel is available here: https://kernel.ubuntu.com/~juergh/lp1938013/
We'd very much appreciate it if people could do some testing. Download the debs and install them with 'dpkg -i *.deb'. The kernel is unsigned so secure boot needs to be disabled.

Nick White (nickwh) wrote :

For Stefan Bader - Upload of journalctl -b<number> for your test kernel in comment 12, which hung

Nick White (nickwh) wrote :

For Stefan Bader - Upload of journalctl -b<number> for the first hang with the Ubuntu 151 kernel. The last thing in the log is the notification that the Power Key was pressed.

Nick White (nickwh) wrote :

For Juerg Haefliger:

>A test kernel is available here: https://kernel.ubuntu.com/~juergh/lp1938013/
>We'd very much appreciate it if people could do some testing. Download the debs and install them with 'dpkg -i *.deb'. The kernel is unsigned so secure boot needs to be disabled.

Kernel installed and been running for over 30 minutes without any issues. Will update after working it a bit harder.

Marc Deslauriers (mdeslaur) wrote :

> A test kernel is available here: https://kernel.ubuntu.com/~juergh/lp1938013/

I found a laptop running bionic. Updated to the newest archive kernel and rebooted. Hit the regression immediately, dmesg would show kernel errors right after the wlan0 lines.

With the test kernel, I've been running for 20 minutes and haven't seen the issue so far, so I think the issue has been identified.

olin00 (olin00) wrote :

I have had the same freezing issue on my i5-8250U laptop since I upgraded the kernel to 4.15.0-151 via apt upgrade. The freeze happened three times in the morning: once while working on the desktop, then twice on the lightdm login screen. The mouse and keyboard froze completely; all I could do was turn the laptop off via the power button. The temporary fix was to power-cycle and boot kernel 4.15.0-147 via the grub menu.

I installed the test kernel 4.15.0-153 from Juerg, and so far it seems to fix the issue for me. I did 10 restarts, each time logging in to the desktop and opening this web page in Firefox. I also did 2 complete power-offs and some compilation in between. No issue so far, looks promising.
Thanks for checking the issue.

Linux pc 4.15.0-153-generic #160~20210728+gitfbbaa7f1 SMP Wed Jul 28 14:25:41 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

Nick White (nickwh) wrote :

Been running on Juerg's 4.15.0-153 kernel for 5 hours without any known issues. /var/crash has no new files in it. Have printed a very small number of pages over wifi. Majority of use with Firefox and Brave with no issues. Getting much happier with this new kernel. THANKS!!

Richard Gray (grayro) wrote :

I want to add that I too began having problems on my Lenovo T480 after I installed the 4.15.0-151 kernel. I have gone back to kernel 147 and that solved all the problems. My problems were similar to those reported above: the computer sometimes crashed on boot. When it did manage to boot the mouse was often frozen and other things as well did not function, such as volume control from the keyboard. Programs either did not run or malfunctioned when run. When I tried to shutdown, it would often just hang for an extended period of time and it was necessary to use the power button.

Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: Confirmed → Fix Committed
assignee: nobody → Stefan Bader (smb)
Martin Malec (martin-malec) wrote :

I am currently testing Juerg's test kernel and have seen no issues so far. In /var/log/kern.log, wlo1 authenticated and associated about 22 s after boot, and there have been no freezes or oopses since. By comparison, the 151 kernel often failed right after the wlan interface tried to associate.

Wojtek Kazimierczak (w-kazimierczak) wrote :

Tested on Lenovo ThinkPad 13, i7-7500U, Intel Wireless 8265 / 8275. With 4.15.0-151 there was a freeze after 1-15 min. After 6 hours no issues with 4.15.0-153 kernel. Thank you Juerg.

Maikel (maikel12) wrote :

Any ETA on kernel release?

Juerg Haefliger (juergh) wrote :

Any day now. The kernel is in proposed and under testing: 4.15.0-153.160.
You can enable proposed if you want it early.

Peter Maffter (pmaff) wrote :

Can confirm this bug on a
Micro-Star International Co., Ltd. GE63 7RD
with 4.15.0-151.

ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Mon Aug 2 20:22:08 2021
Failure: oops
OopsText:
 BUG: Bad rss-counter state mm:00000000e1ad1a66 idx:3 val:1
 BUG: Bad page cache in process TaskCon~read #2 pfn:1f8793
 page:ffffdb5587e1e4c0 count:3 mapcount:1 mapping:ffff94a44c32bcf8 index:0xf3
 flags: 0x17ffffc004002d(locked|referenced|uptodate|lru|swapbacked)
 raw: 0017ffffc004002d ffff94a44c32bcf8 00000000000000f3 0000000300000000
 raw: ffffdb5587236fe0 ffffdb5585070820 0000000000000000 ffff94a6ae146000
 page dumped because: still mapped when deleted
 page->mem_cgroup:ffff94a6ae146000
 CPU: 4 PID: 14576 Comm: TaskCon~read #2 Tainted: G B OE 4.15.0-151-generic #157-Ubuntu
 Hardware name: Micro-Star International Co., Ltd. GE63 7RD/MS-16P3, BIOS E16P3IMS.10A 09/05/2018
 Call Trace:

Package: linux-image-4.15.0-151-generic 4.15.0-151.157
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 4.15.0-151-generic x86_64
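
To check a kernel log for the same symptoms, one could search for the "BUG:" lines quoted in the oops report above. This is a minimal sketch: the two signatures are copied from reports in this thread and are not an exhaustive list of what this bug can produce, and the function name `find_oops_lines` is hypothetical.

```python
import re

# Signatures taken from the oops text quoted in this thread (illustrative only).
SIGNATURES = re.compile(r"BUG: Bad (rss-counter state|page cache)")


def find_oops_lines(log_text: str) -> list:
    """Return the stripped log lines that contain one of the known signatures."""
    return [line.strip() for line in log_text.splitlines() if SIGNATURES.search(line)]
```

A typical call would read the log first, for example `find_oops_lines(open("/var/log/kern.log", errors="replace").read())`; reading that file may require suitable permissions.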

Dominique Pellé (dominique-pelle) wrote :

I'm also affected by this bug:
- 4.15.0-151-generic freezes
- 4.15.0-147-generic does not freeze (previous available kernel in grub)
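
The kernel versions named in this thread can be checked against the output of `uname -r`. This is a minimal sketch that only encodes what reporters here observed (147 good, 151 freezing, 153 fixed); it makes no claim about other builds, and the names `REPORTED` and `classify_release` are hypothetical.

```python
import re

# ABI numbers and outcomes as reported in this thread only.
REPORTED = {147: "reported good", 151: "reported freezing", 153: "reported fixed"}


def classify_release(release: str) -> str:
    """Classify a `uname -r` string such as '4.15.0-151-generic'."""
    match = re.fullmatch(r"4\.15\.0-(\d+)-\w+", release.strip())
    if not match:
        return "not a 4.15.0 kernel"
    return REPORTED.get(int(match.group(1)), "not reported in this thread")
```

On a running system the input could come from `platform.release()` in the standard library.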

My CPU: Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz

The freeze generally happens at the login screen (sometimes before, sometimes after, but shortly after booting in any case).

I am forced to reboot my computer and select the older kernel in grub to properly boot.
In /var/log/kern.log I see an Oops:
```
Jul 31 08:02:02 pel-cirrus7 kernel: [ 5.760459] usbcore: registered new interface driver snd-usb-audio
Jul 31 08:02:02 pel-cirrus7 kernel: [ 5.782978] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
Jul 31 08:02:02 pel-cirrus7 kernel: [ 5.782979] Bluetooth: BNEP filters: protocol multicast
Jul 31 08:02:02 pel-cirrus7 kernel: [ 5.782981] Bluetooth: BNEP socket layer initialized
Jul 31 08:02:02 pel-cirrus7 kernel: [ 5.994114] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.185403] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.190802] IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is not ready
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.239802] r8169 0000:04:00.0 enp4s0: link down
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.239893] IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is not ready
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.242976] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.248737] ip6_tables: (C) 2000-2006 Netfilter Core Team
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.270569] Ebtables v2.0 registered
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.337871] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
Jul 31 08:02:02 pel-cirrus7 kernel: [ 6.405469] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
Jul 31 08:02:03 pel-cirrus7 kernel: [ 6.774881] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
Jul 31 08:02:03 pel-cirrus7 kernel: [ 6.775624] virbr0: port 1(virbr0-nic) entered blocking state
Jul 31 08:02:03 pel-cirrus7 kernel: [ 6.775625] virbr0: port 1(virbr0-nic) entered disabled state
Jul 31 08:02:03 pel-cirrus7 kernel: [ 6.775673] device virbr0-nic entered promiscuous mode
Jul 31 08:02:03 pel-cirrus7 kernel: [ 6.852658] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
Jul 31 08:02:03 pel-cirrus7 kernel: [ 7.331935] virbr0: port 1(virbr0-nic) entered blocking state
Jul 31 08:02:03 pel-cirrus7 kernel: [ 7.331937] virbr0: port 1(virbr0-nic) entered listening state
Jul 31 08:02:03 pel-cirrus7 kernel: [ 7.372079] virbr0: port 1(virbr0-nic) entered disabled state
Jul 31 08:02:06 pel-cirrus7 kernel: [ 9.928363] wlp2s0: authenticate with ec:8a:4c:94:01:aa
Jul 31 08:02:06 pel-cirrus7 kernel: [ 9.932925] wlp2s0: send auth to ec:8a:4c:94:01:aa (try 1/3)
Jul 31 08:02:06 pel-cirrus7 kernel: [ 9.935089] wlp2s0: authenticated
Jul 31 08:02:06 pel-cirrus7 kernel: [ 9.937297] wlp2s0: associate with ec:8a:4c:94:01:aa (try 1/3)
Jul 31 08:02:06 pel-cirrus7 kernel: [ 9.940968] wlp2s0: RX AssocResp from ec:8a:4c:94:01:aa (capab=0x411 status=...
```

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-153.160

---------------
linux (4.15.0-153.160) bionic; urgency=medium

  * bionic/linux: 4.15.0-153.160 -proposed tracker (LP: #1938319)

  * 4.15.0-151 is freezing various CPUs (LP: #1938013)
    - mac80211: fix memory corruption in EAPOL handling

 -- Stefan Bader <email address hidden> Thu, 29 Jul 2021 08:26:59 +0200
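
The changelog entry above ties the release to Launchpad bugs through its `(LP: #...)` markers. As a minimal sketch, assuming the usual `LP: #N` or `LP: #N, #M` form, the bug numbers could be pulled out like this; the function name `lp_bugs` is hypothetical.

```python
import re


def lp_bugs(changelog: str) -> list:
    """Return the Launchpad bug numbers referenced as 'LP: #N' (or 'LP: #N, #M')."""
    bugs = []
    # Each match is the run of '#N' items that follows one 'LP:' marker.
    for group in re.findall(r"LP:\s*((?:#\d+(?:,\s*)?)+)", changelog):
        bugs.extend(int(number) for number in re.findall(r"#(\d+)", group))
    return bugs
```

Applied to the entry above, this yields the tracker bug and this bug's own number.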

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Stefan Bader (smb) wrote :

It looks like there was an issue with the bots. This fix is released and no longer requires verification.

tags: removed: verification-needed-bionic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
tags: added: verification-done-bionic
removed: verification-needed-bionic
Andreas Dworsky (hcldrd) wrote :

My 18.04 system has been running the new official kernel 4.15.0-153-generic since it was updated via the system update. No hangs or crashes so far.

Thanks a lot for the quick fix!
