System hard locks up intermittently

Bug #1871687 reported by Alan Pope 🍺🐧🐱 🦄
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

My T450 main laptop is generally left on all day and night. I sometimes suspend it when moving around the house, but it's often left on all the time. Sometimes when I go to start work in the morning there's some corruption on screen and the laptop won't respond. I can't REISUB or ssh in.

This has happened a few times since clean installing 20.04. Previously I was on an upgraded-to-20.04 install and don't recall it happening there. The other obvious difference is I'm now running ZFS which I wasn't before. :S

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-21-generic 5.4.0-21.25
ProcVersionSignature: Ubuntu 5.4.0-21.25-generic 5.4.27
Uname: Linux 5.4.0-21-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu24
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC3: alan 8365 F.... pulseaudio
 /dev/snd/controlC2: alan 8365 F.... pulseaudio
 /dev/snd/controlC1: alan 8365 F.... pulseaudio
 /dev/snd/controlC0: alan 8365 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Wed Apr 8 19:30:46 2020
InstallationDate: Installed on 2020-03-01 (38 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Alpha amd64 (20200301)
MachineType: LENOVO 20BV001BUK
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_pyyy5h@/vmlinuz-5.4.0-21-generic root=ZFS=rpool/ROOT/ubuntu_pyyy5h ro quiet splash
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-21-generic N/A
 linux-backports-modules-5.4.0-21-generic N/A
 linux-firmware 1.187
SourcePackage: linux
StagingDrivers: exfat
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/14/2019
dmi.bios.vendor: LENOVO
dmi.bios.version: JBET73WW (1.37 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20BV001BUK
dmi.board.vendor: LENOVO
dmi.board.version: 0B98417 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrJBET73WW(1.37):bd08/14/2019:svnLENOVO:pn20BV001BUK:pvrThinkPadT450:rvnLENOVO:rn20BV001BUK:rvr0B98417WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad T450
dmi.product.name: 20BV001BUK
dmi.product.sku: LENOVO_MT_20BV_BU_Think_FM_ThinkPad T450
dmi.product.version: ThinkPad T450
dmi.sys.vendor: LENOVO

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Seth Forshee (sforshee) wrote :

I'm not finding much in the logs. There's some i915 splat which could possibly be related to the corruption, but nothing to explain the hard lockup.

The current focal-proposed kernel (5.4.0-24) has some fixes for races/deadlocks in i915, so it's worth trying that out to see if it helps. You can use ppa:canonical-kernel-team/proposed to get that kernel.

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

This has happened a few more times. I took some time to rummage in the journal and found this logged just before one of the lockups.

May 01 10:00:54 mcp kernel: ------------[ cut here ]------------
May 01 10:00:54 mcp kernel: pipe state doesn't match!
May 01 10:00:54 mcp kernel: WARNING: CPU: 1 PID: 13809 at drivers/gpu/drm/i915/display/intel_display.c:13148 verify_crtc_state+0x2ad/0x300 [i915]
May 01 10:00:54 mcp kernel: Modules linked in: scsi_transport_iscsi sctp hidp nls_iso8859_1 exfat(C) mmc_block nls_utf8 isofs uas usb_storage ebtable_filter ebtables unix_diag btrfs xor zstd_compress raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid rfcomm dummy veth xt_comment xt_CHECKSUM xt_MA>
May 01 10:00:54 mcp kernel: snd_seq_midi input_leds snd_seq_midi_event iwlmvm joydev serio_raw snd_hda_codec_realtek mac80211 snd_hda_codec_generic snd_hda_codec_hdmi libarc4 snd_hda_intel snd_rawmidi wmi_bmof snd_intel_nhlt snd_hda_codec snd_hda_core iwlwifi snd_hwdep snd_pcm snd_seq cfg80211 thinkpad_ac>
May 01 10:00:54 mcp kernel: CPU: 1 PID: 13809 Comm: Xorg Tainted: P C O 5.4.0-21-generic #25-Ubuntu
May 01 10:00:54 mcp kernel: Hardware name: LENOVO 20BV001BUK/20BV001BUK, BIOS JBET73WW (1.37 ) 08/14/2019
May 01 10:00:54 mcp kernel: RIP: 0010:verify_crtc_state+0x2ad/0x300 [i915]
May 01 10:00:54 mcp kernel: Code: b6 45 bf e9 8d fe ff ff 80 3d 15 79 10 00 00 0f b6 f0 48 c7 c7 78 b7 3c c0 75 32 e8 7d 2a e0 ff e9 1e fe ff ff e8 be 8f 7a d0 <0f> 0b e9 1b ff ff ff e8 b2 8f 7a d0 0f 0b e9 85 fe ff ff e8 a6 8f
May 01 10:00:54 mcp kernel: RSP: 0018:ffffa72a8dbdba80 EFLAGS: 00010282
May 01 10:00:54 mcp kernel: RAX: 0000000000000000 RBX: ffff8b4d66ee02b0 RCX: 0000000000000006
May 01 10:00:54 mcp kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b4d9dc578c0
May 01 10:00:54 mcp kernel: RBP: ffffa72a8dbdbac8 R08: 000000000002ce27 R09: 0000000000000004
May 01 10:00:54 mcp kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8b4d66d56800
May 01 10:00:54 mcp kernel: R13: ffff8b4d66ee02b8 R14: ffff8b4d66ee0000 R15: ffff8b4927284000
May 01 10:00:54 mcp kernel: FS: 00007f48d9e10a80(0000) GS:ffff8b4d9dc40000(0000) knlGS:0000000000000000
May 01 10:00:54 mcp kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 01 10:00:54 mcp kernel: CR2: 00007f6cbbbc6421 CR3: 00000006c6c06002 CR4: 00000000003606e0
May 01 10:00:54 mcp kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 01 10:00:54 mcp kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 01 10:00:54 mcp kernel: Call Trace:
May 01 10:00:54 mcp kernel: intel_atomic_commit_tail+0xa4b/0x12f0 [i915]
May 01 10:00:54 mcp kernel: ? flush_workqueue_prep_pwqs+0x12e/0x140
May 01 10:00:54 mcp kernel: ? flush_workqueue+0x193/0x420
May 01 10:00:54 mcp kernel: ? intel_atomic_commit_ready+0x4d/0x54 [i915]
May 01 10:00:54 mcp kernel: intel_atomic_commit+0x284/0x2b0 [i915]
May 01 10:00:54 mcp kernel: drm_atomic_commit+0x4a/0x50 [drm]
May 01 10:00:54 mcp kernel: drm_atomic_connector_commit_dpms+0xdf/0x100 [drm]
May 01 10:00:54 mcp kernel: drm_mode_obj_set_property_ioctl+0x156/0x2a0 [drm]
May 01 10:00:54 mcp kernel: ? drm_connector_set_obj_prop+0x90/0x90 [drm]
...
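
For anyone trying to spot the same warning without reading the whole journal by hand, here is a rough sketch. It assumes the kernel messages from the boot that locked up have been saved as text first (for example with `journalctl -k -b -1`, which prints kernel messages from the previous boot). The function name `find_i915_warnings` and the returned field names are made up for illustration, and the regular expression is only fitted to the WARNING line format shown above.

```python
import re

# Header line of a kernel WARN() splat as it appears in the journal, e.g.
# "... kernel: WARNING: CPU: 1 PID: 13809 at <file>:<line> <func>+<off> [<module>]"
WARNING_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) \S+ kernel: WARNING: CPU: (?P<cpu>\d+) "
    r"PID: (?P<pid>\d+) at (?P<source>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+\S+(?: \[(?P<module>\w+)\])?"
)


def find_i915_warnings(journal_text):
    """Return one dict per WARNING line that was raised from the i915 module.

    Hypothetical helper: the name and the dict keys are illustrative only.
    """
    hits = []
    for raw in journal_text.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match and match.group("module") == "i915":
            hits.append({
                "timestamp": match.group("stamp"),
                "cpu": int(match.group("cpu")),
                "pid": int(match.group("pid")),
                "source": match.group("source"),
                "line": int(match.group("line")),
                "function": match.group("func"),
            })
    return hits


# Two lines copied from the journal excerpt above.
SAMPLE = """\
May 01 10:00:54 mcp kernel: pipe state doesn't match!
May 01 10:00:54 mcp kernel: WARNING: CPU: 1 PID: 13809 at drivers/gpu/drm/i915/display/intel_display.c:13148 verify_crtc_state+0x2ad/0x300 [i915]
"""

for hit in find_i915_warnings(SAMPLE):
    print(hit["timestamp"], hit["function"], f'{hit["source"]}:{hit["line"]}')
```

Matching on the module name in square brackets rather than on the message text keeps the filter from depending on the exact wording of "pipe state doesn't match!".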

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

This feels very multi-monitor centric.

I have had it numerous times overnight. I have 2 external displays permanently attached, so 3 displays active in total, at most times. This is sometimes triggered overnight, probably when a notification comes in and triggers the displays to wake up.

This morning I noticed one display was off, as if it was receiving no signal, although it was still listed in GNOME display settings. I tried turning it off and on again there, but that caused the display corruption to appear and the machine locked up. I had to hard reboot.
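
When a display looks dead but is still listed in the settings, it may help to record what the kernel itself reports for each connector before touching anything. A minimal sketch, assuming the usual DRM sysfs layout where each connector has a directory such as `/sys/class/drm/card0-eDP-1` containing a `status` file; the function name `connector_states` is hypothetical, and the connector names on this particular machine are not known from the report.

```python
from pathlib import Path


def connector_states(drm_root="/sys/class/drm"):
    """Return {connector_name: status} for each DRM connector under drm_root.

    Hypothetical helper. The status text is whatever the kernel wrote to the
    file, typically "connected" or "disconnected".
    """
    states = {}
    # Connector directories are named "<card>-<connector>", e.g. "card0-eDP-1";
    # the bare "card0" device directory has no dash and is skipped by the pattern.
    for status_file in sorted(Path(drm_root).glob("card*-*/status")):
        states[status_file.parent.name] = status_file.read_text().strip()
    return states
```

Calling `connector_states()` with no argument reads the live system; passing another directory makes it possible to try the function against a copied or fake tree.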

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

Adding your PPA and getting linux-image-5.4.0-29-generic. Will reboot and try this over the weekend. I don't know how to trigger the lockup, so I have no idea whether it will happen again in that time.

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

I'm running linux-image-5.4.0-30-generic and it happened again last night.
I came back to the machine and saw corruption on both external panels, while the internal panel was off. So it does indeed feel i915-related. I would be happy to try to get more debug info, but that's hard given the machine completely locks up.

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

I've had this twice today.
I lost data as a result of walking away for a cup of tea and coming back to a locked-up desktop.

May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in pixel_rate (expected 148500, found 296999)
May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in shared_dpll (expected 000000003724b6ad, found 000000003f64da57)
May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in base.adjusted_mode.crtc_clock (expected 148500, found 296999)
May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in port_clock (expected 270000, found 540000)
May 12 11:18:04 mcp kernel: ------------[ cut here ]------------
May 12 11:18:04 mcp kernel: pipe state doesn't match!
May 12 11:18:04 mcp kernel: WARNING: CPU: 0 PID: 8183 at drivers/gpu/drm/i915/display/intel_display.c:13158 verify_crtc_state+0x2ad/0x300 [i915]
May 12 11:18:04 mcp kernel: Modules linked in: hidp rfcomm xt_comment dummy ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 intel_rapl_msr mei_hdcp intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel snd_usb_audio snd_usbmidi_lib gspca_vc032x gspca_main uvcvideo videobuf2_vmalloc snd_hda_codec_realtek usblp snd_hda_codec_generic videobuf2_memops videobuf2_v4l2 snd_hda_codec_hdmi videobuf2_common videodev snd_hda_intel snd_intel_dspcfg mc snd_hda_codec btusb snd_hda_core btrtl aesni_intel btbcm snd_hwdep crypto_simd btintel cryptd bluetooth glue_helper intel_cstate thinkpad_acpi intel_rapl_perf nvram ledtrig_audio ecdh_generic ecc snd_pcm snd_seq_midi
May 12 11:18:04 mcp kernel: snd_seq_midi_event iwlmvm snd_rawmidi mac80211 libarc4 iwlwifi snd_seq wmi_bmof input_leds joydev snd_seq_device snd_timer serio_raw cfg80211 intel_pch_thermal rtsx_pci_ms memstick mei_me snd mei soundcore mac_hid sch_fq_codel kvm_intel kvm parport_pc ppdev lp parport ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlua(PO) hid_generic usbhid hid rtsx_pci_sdmmc i915 i2c_algo_bit crc32_pclmul drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops psmouse i2c_i801 ahci libahci e1000e drm rtsx_pci lpc_ich wmi video
May 12 11:18:04 mcp kernel: CPU: 0 PID: 8183 Comm: Xorg Tainted: P O 5.4.0-30-generic #34-Ubuntu
May 12 11:18:04 mcp kernel: Hardware name: LENOVO 20BV001BUK/20BV001BUK, BIOS JBET73WW (1.37 ) 08/14/2019
May 12 11:18:04 mcp kernel: RIP: 0010:verify_crtc_state+0x2ad/0x300 [i915]
May 12 11:18:04 mcp kernel: Code: b6 45 bf e9 8d fe ff ff 80 3d 85 76 10 00 00 0f b6 f0 48 c7 c7 40 aa 52 c0 75 32 e8 5d e7 de ff e9 1e fe ff ff e8 ae 9d 24 e5 <0f> 0b e9 1b ff ff ff e8 a2 9d 24 e5 0f 0b e9 85 fe ff ff e8 96 9d
May 12 11:18:04 mcp kernel: RSP: 0018:ffffb1e543f67a80 EFLAGS: 00010282
May 12 11:18:04 mcp kernel: RAX: 0000000000000000 RBX: ffff8db0a7d202b0 R...
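
The four `pipe_config_mismatch` lines at the top of this excerpt are easier to compare side by side. The sketch below pulls out the field name, the expected value, and the found value, and computes found/expected where both are plain decimal numbers. The helper name `parse_mismatches` is made up, and the regular expression only assumes the line format shown in this excerpt. It does not say anything about the cause, only what the log reports.

```python
import re

# Matches e.g. "*ERROR* mismatch in pixel_rate (expected 148500, found 296999)"
MISMATCH_RE = re.compile(
    r"\*ERROR\* mismatch in (?P<field>\S+) "
    r"\(expected (?P<expected>\w+), found (?P<found>\w+)\)"
)


def parse_mismatches(log_text):
    """Return (field, expected, found, ratio) tuples; ratio is None when the
    values are not plain decimal numbers (the shared_dpll values here contain
    hex letters, so no ratio is computed for them)."""
    rows = []
    for match in MISMATCH_RE.finditer(log_text):
        field, expected, found = match.group("field", "expected", "found")
        ratio = None
        if expected.isdigit() and found.isdigit() and int(expected) != 0:
            ratio = int(found) / int(expected)
        rows.append((field, expected, found, ratio))
    return rows


# The four mismatch lines copied from the excerpt above.
SAMPLE = """\
May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in pixel_rate (expected 148500, found 296999)
May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in shared_dpll (expected 000000003724b6ad, found 000000003f64da57)
May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in base.adjusted_mode.crtc_clock (expected 148500, found 296999)
May 12 11:18:04 mcp kernel: [drm:pipe_config_mismatch [i915]] *ERROR* mismatch in port_clock (expected 270000, found 540000)
"""

for field, expected, found, ratio in parse_mismatches(SAMPLE):
    note = f"found/expected = {ratio:.3f}" if ratio is not None else "not a plain number"
    print(f"{field}: expected {expected}, found {found} ({note})")
```

For the three numeric fields in this excerpt, the found value is about twice the expected one (296999 against 148500, and 540000 against 270000).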

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

Possibly related: https://bugzilla.redhat.com/show_bug.cgi?id=1506339
Also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1727662 which suggests 5.4.0-rc7-drm-tip-git-g3ff71899c56c works for them
