[Dell G7 7588] *ERROR* Fault errors on pipe B: 0x00000080

Bug #1815711 reported by Alex Mantaut
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Hi,

Recently, I installed Ubuntu 18.04 on my Dell G7 7588. From day one, the system has had random freezes/crashes (usually a few per day).

The way the crashes manifest is that 2 out of my 3 monitors go black and in some cases sound and inputs get stuck. In some rare cases, I can continue to operate on one of the screens.

I checked syslog and dmesg, and usually I don't see any messages at the time of the crash. The one exception was a crash where input did not freeze and I could run dmesg. In that case I got these messages:

Feb 13 15:34:51 morningstar kernel: [ 1593.615704] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080
Feb 13 15:34:51 morningstar kernel: [ 1593.615999] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
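
To see whether these i915 errors also appear around other freezes, the syslog can be filtered for them. The following is a minimal Python sketch, not part of the original report: the function name `find_drm_errors` and the regular expression are assumptions derived only from the two lines quoted above, so other i915 messages may need a looser pattern.

```python
import re

# Layout inferred from the two log lines in this report (an assumption):
#   "... kernel: [ <uptime>] [drm:<handler> [i915]] *ERROR* <message>"
DRM_ERROR = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"\[drm:(?P<handler>\w+) \[i915\]\] \*ERROR\* (?P<message>.*)"
)


def find_drm_errors(lines):
    """Return (uptime_seconds, handler, message) for each i915 *ERROR* line."""
    results = []
    for line in lines:
        match = DRM_ERROR.search(line)
        if match:
            results.append(
                (float(match["uptime"]), match["handler"], match["message"].strip())
            )
    return results


# The two lines quoted in the report, used here as sample input.
SAMPLE_DRM_LINES = [
    "Feb 13 15:34:51 morningstar kernel: [ 1593.615704] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080",
    "Feb 13 15:34:51 morningstar kernel: [ 1593.615999] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun",
]

for uptime, handler, message in find_drm_errors(SAMPLE_DRM_LINES):
    print(f"{uptime:>12.6f}s  {handler}: {message}")
```

On a live system the same function could be fed the lines of /var/log/syslog instead of the sample list.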

Additionally, this message appears in dmesg fairly often:
...
Feb 13 15:21:17 morningstar kernel: [ 780.324457] CPU6: Package temperature above threshold, cpu clock throttled (total events = 109)
Feb 13 15:21:17 morningstar kernel: [ 780.324457] CPU0: Package temperature above threshold, cpu clock throttled (total events = 109)
...
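
The "total events" counter in these lines makes it possible to see how quickly throttling accumulates. Below is a rough Python sketch, again not from the original report; `latest_throttle_counts` is a hypothetical helper and the pattern assumes the exact wording of the two messages above.

```python
import re

# Wording taken from the throttling messages quoted in this report.
THROTTLE = re.compile(
    r"CPU(?P<cpu>\d+): Package temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<events>\d+)\)"
)


def latest_throttle_counts(lines):
    """Map each CPU number to the highest 'total events' value seen for it."""
    counts = {}
    for line in lines:
        match = THROTTLE.search(line)
        if match:
            cpu = int(match["cpu"])
            events = int(match["events"])
            counts[cpu] = max(counts.get(cpu, 0), events)
    return counts


# The two lines quoted in the report, used here as sample input.
SAMPLE_THROTTLE_LINES = [
    "Feb 13 15:21:17 morningstar kernel: [ 780.324457] CPU6: Package temperature above threshold, cpu clock throttled (total events = 109)",
    "Feb 13 15:21:17 morningstar kernel: [ 780.324457] CPU0: Package temperature above threshold, cpu clock throttled (total events = 109)",
]

print(latest_throttle_counts(SAMPLE_THROTTLE_LINES))
```

Comparing the counts from two points in time gives a throttling rate, which may help judge whether overheating coincides with the freezes.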

I have been trying to solve this issue on my own for a while now, without success. Here are some of the things I tried:

- Updated BIOS to latest version
- Installed nvidia 415 graphics drivers
- Tested using xwayland
- Tried 4.15.0-45-48, 4.19.16 and 4.19.20 kernels (the last one causes problems with the graphics drivers and does not solve the problem either)
- Tried modifying power management profile on the BIOS.
- Added splash acpi_rev_override=1 to /etc/default/grub as found here: https://www.reddit.com/r/Dell/comments/63cavx/fixed_nvidia_1050_freezing_in_ubuntu_linux/
- Ran dell diagnostic tools (no errors there)
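
A change to /etc/default/grub only matters if the parameter actually reaches the running kernel. The ProcKernelCmdLine field further down in this report shows that it did here. As a small Python sketch (not part of the original report; `parse_kernel_cmdline` is a hypothetical helper, and the simple whitespace split does not handle quoted values), the check could look like this:

```python
def parse_kernel_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split only on the first '=' so values like "UUID=..." stay intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Command line copied from the ProcKernelCmdLine field of this report.
SAMPLE_CMDLINE = (
    "BOOT_IMAGE=/boot/vmlinuz-4.19.16-041916-generic "
    "root=UUID=ed06b2b0-7398-48df-9971-9f5f225c0152 ro quiet splash "
    "acpi_rev_override=1 vt.handoff=1"
)

params = parse_kernel_cmdline(SAMPLE_CMDLINE)
print("acpi_rev_override =", params.get("acpi_rev_override"))
```

On a running system the string would come from reading /proc/cmdline rather than from the sample constant.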

So far none of these things solved the issue. These error messages are the only noticeable things I could find to debug.

Can someone help me find a fix or work around the issue? I've spent multiple days trying to solve this and I am not making any real progress.

Please let me know if you need me to submit more information to find the cause of the issue.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: xorg 1:7.7+19ubuntu7.1
Uname: Linux 4.19.16-041916-generic x86_64
NonfreeKernelModules: nvidia_drm nvidia_modeset nvidia
.proc.driver.nvidia.gpus.0000.01.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:01:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 415.27 Thu Dec 20 17:25:03 CST 2018
 GCC version: gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
Date: Wed Feb 13 15:51:17 2019
DistUpgraded: Fresh install
DistroCodename: bionic
DistroVariant: ubuntu
DkmsStatus:
 nvidia, 415.27, 4.15.0-45-generic, x86_64: installed
 virtualbox, 5.2.18, 4.15.0-44-generic, x86_64: installed
 virtualbox, 5.2.18, 4.15.0-45-generic, x86_64: installed
 virtualbox, 5.2.18, 4.19.16-041916-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GpuHangFrequency: Several times a day
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 Intel Corporation Device [8086:3e9b] (prog-if 00 [VGA controller])
   Subsystem: Dell Device [1028:0825]
 NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile] [10de:1c20] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: Dell GP106M [GeForce GTX 1060 Mobile] [1028:0825]
InstallationDate: Installed on 2018-11-30 (75 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
MachineType: Dell Inc. G7 7588
ProcEnviron:
 LANGUAGE=en_AU:en
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_AU.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.19.16-041916-generic root=UUID=ed06b2b0-7398-48df-9971-9f5f225c0152 ro quiet splash acpi_rev_override=1 vt.handoff=1
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/21/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.8.0
dmi.board.name: 0FDMYT
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.8.0:bd12/21/2018:svnDellInc.:pnG77588:pvr:rvnDellInc.:rn0FDMYT:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: GSeries
dmi.product.name: G7 7588
dmi.product.sku: 0825
dmi.sys.vendor: Dell Inc.
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.95-1~18.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 18.2.2-0ubuntu1~18.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx 18.2.2-0ubuntu1~18.04.1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.19.6-1ubuntu4.2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:18.0.1-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20171229-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.15-2

Daniel van Vugt (vanvugt) wrote :

Those error messages are from the kernel so reassigning there.

However they may also be unrelated to the main problem you encounter so please also follow these instructions:
https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment

affects: xorg (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Incomplete
Kai-Heng Feng (kaihengfeng) wrote :

This error can be found in the dmesg:
[ 2.510006] BERT: Error records from previous boot:
[ 2.510007] [Hardware Error]: event severity: fatal
[ 2.510008] [Hardware Error]: Error 0, type: fatal
[ 2.510009] [Hardware Error]: section type: unknown, 81212a96-09ed-4996-9471-8d729c8e69ed
[ 2.510009] [Hardware Error]: section length: 0xc20
[ 2.510011] [Hardware Error]: 00000000: 00000001 00000000 00000000 01003001 .............0..
[ 2.510012] [Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
[ 2.510013] [Hardware Error]: 00000020: 01003001 00000006 e3870663 00000009 .0......c.......
[ 2.510014] [Hardware Error]: 00000030: 00000002 00000032 090f0029 800007ff ....2...).......
[ 2.510015] [Hardware Error]: 00000040: 02002000 7f05be03 04000001 00000000 . ..............
[ 2.510016] [Hardware Error]: 00000050: 22000007 01200000 90002071 00106060 ...".. .q ..``..
[ 2.510016] [Hardware Error]: 00000060: 05000400 c7000200 ff2fff04 ef01f08b ........../.....
[ 2.510017] [Hardware Error]: 00000070: 450e2071 9e002000 00110000 00000000 q .E. ..........
[ 2.510018] [Hardware Error]: 00000080: 04000000 00000000 01500881 000ecf12 ..........P.....
[ 2.510019] [Hardware Error]: 00000090: 00035d00 00000007 5bf3f100 1020003c .].........[<. .
[ 2.510020] [Hardware Error]: 000000a0: 00000086 56000000 063717bc ff477e09 .......V..7..~G.
[ 2.510021] [Hardware Error]: 000000b0: 540000ff 00321000 03417809 29f3f91e ...T..2..xA....)
[ 2.510022] [Hardware Error]: 000000c0: 10000000 00000086 00000000 00020000 ................
[ 2.510023] [Hardware Error]: 000000d0: 00000000 ddf03900 00241cbc 0347768e .....9....$..vG.
[ 2.510024] [Hardware Error]: 000000e0: 000c0600 29c80043 00b88070 00001000 ....C..)p.......
[ 2.510024] [Hardware Error]: 000000f0: 00000000 00000080 00000000 00000000 ................
[ 2.510025] [Hardware Error]: 00000100: 00001080 00004000 00000000 0015b800 .....@..........
[ 2.510026] [Hardware Error]: 00000110: 00000002 00000009 00000080 79ff7f00 ...............y
[ 2.510027] [Hardware Error]: 00000120: c83f47fa 8200801f 37808004 79ff7fe0 .G?........7...y
[ 2.510028] [Hardware Error]: 00000130: c83f47fa 0000001f 00000000 00000000 .G?.............
[ 2.510029] [Hardware Error]: 00000140: 00000000 00000000 00000000 00000000 ................
[ 2.510030] [Hardware Error]: 00000150: 00000000 00000000 00000000 00000000 ................
[ 2.510031] [Hardware Error]: 00000160: 00000000 00000000 00000000 00000000 ................
[ 2.510031] [Hardware Error]: 00000170: 00000000 00000000 00000000 00000000 ................
[ 2.510032] [Hardware Error]: 00000180: 00000000 00000000 00000000 00000000 ................
[ 2.510033] [Hardware Error]: 00000190: 00000000 00000000 00000000 00000000 ................
[ 2.510034] [Hardware Error]: 000001a0: 00000000 00000000 00000000 00000000 ................
[ 2.510035] [Hardware Error]: 000001b0: 00000000 00000000 00000000 00000000 ................
[ 2.510035] [Hardware Error]: 000001c0: 00000000 00000000 00000000 0...
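
The dump above can be turned back into raw bytes for closer inspection or for handing to a vendor. The following Python sketch is an illustration only and does not decode the record, since the section type is reported as unknown. The helper names `dump_rows_to_bytes` and `ascii_column` are hypothetical, and the little-endian word order is an assumption that happens to agree with the ASCII column printed next to each row.

```python
import re

# One dump row: "[Hardware Error]: <offset>: <word> <word> <word> <word> <ascii>"
ROW = re.compile(
    r"\[Hardware Error\]:\s+([0-9a-f]{8}):((?: [0-9a-f]{8}){4})"
)


def dump_rows_to_bytes(lines):
    """Rebuild the raw bytes from complete dump rows; other lines are skipped."""
    data = bytearray()
    for line in lines:
        match = ROW.search(line)
        if not match:
            continue  # header lines and truncated rows do not match
        offset = int(match.group(1), 16)
        if offset != len(data):
            raise ValueError(f"gap in dump at offset {offset:#x}")
        for word in match.group(2).split():
            # Assumption: each 32-bit word was printed from little-endian memory.
            data += int(word, 16).to_bytes(4, "little")
    return bytes(data)


def ascii_column(chunk):
    """Render bytes the way the right-hand column of the dump does."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)


# First three rows of the dump in this comment, used as sample input.
SAMPLE_BERT_LINES = [
    "[ 2.510011] [Hardware Error]: 00000000: 00000001 00000000 00000000 01003001 .............0..",
    "[ 2.510012] [Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................",
    "[ 2.510013] [Hardware Error]: 00000020: 01003001 00000006 e3870663 00000009 .0......c.......",
]

raw = dump_rows_to_bytes(SAMPLE_BERT_LINES)
print(len(raw), "bytes recovered")
print(ascii_column(raw[32:48]))
```

Comparing the output of `ascii_column` with the text at the end of each row is a quick way to confirm that the byte order assumption holds for a given dump.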

Kai-Heng Feng (kaihengfeng) wrote :

This is the header in bert.c:
/*
 * APEI Boot Error Record Table (BERT) support
 *
 * Copyright 2011 Intel Corp.
 * Author: Huang Ying <email address hidden>
 *
 * Under normal circumstances, when a hardware error occurs, the error
 * handler receives control and processes the error. This gives OSPM a
 * chance to process the error condition, report it, and optionally attempt
 * recovery. In some cases, the system is unable to process an error.
 * For example, system firmware or a management controller may choose to
 * reset the system or the system might experience an uncontrolled crash
 * or reset. The boot error source is used to report unhandled errors that
 * occurred in a previous boot. This mechanism is described in the BERT
 * table.
 *
 * For more information about BERT, please refer to ACPI Specification
 * version 4.0, section 17.3.1
 *
 * This file is licensed under GPLv2.
 *
 */

I am not sure how to decipher this, but it seems to be a hardware error. Please contact Dell support.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired