Unexpected system crash

Bug #1687437 reported by miko
This bug affects 1 person
Affects    Status    Importance   Assigned to   Milestone
Ubuntu     Invalid   Undecided    Unassigned
openSUSE   Invalid   Undecided    Unassigned

Bug Description

The system froze with nothing responding. I couldn't switch to another console using Ctrl+Alt+F1, for example. It shut down after a few seconds.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: evince (not installed)
ProcVersionSignature: Ubuntu 4.10.0-20.22-generic 4.10.8
Uname: Linux 4.10.0-20-generic x86_64
ApportVersion: 2.20.4-0ubuntu4
Architecture: amd64
CurrentDesktop: KDE
Date: Mon May 1 19:23:50 2017
EcryptfsInUse: Yes
InstallationDate: Installed on 2017-04-30 (0 days ago)
InstallationMedia: Kubuntu 17.04 "Zesty Zapus" - Release amd64 (20170412)
SourcePackage: evince
UpgradeStatus: No upgrade log present (probably fresh install)

In , miko (mik3z0r) wrote :

KDE Plasma Version: 5.9.4
KDE Frameworks Version: 5.32.0
Qt Version: 5.7.1
Kernel Version: 4.10.9-1 default

Intel Core i5-3337U
Intel HD Graphics 4000

I own a Samsung Ativ Book 9 running openSUSE Tumbleweed.
I get random freezes when I'm watching videos. The screen, keyboard, and mouse freeze, and sometimes the Caps Lock light flashes. The system does not respond to a terminal switch. The last 2 seconds of sound repeat forever.

In , miko (mik3z0r) wrote :
In , Tiwai-r (tiwai-r) wrote :

Could you give hwinfo output, and also the output of dmesg after the fresh boot?

The screenshots show one MCE log, two flaky USB responses, and one harmless error message. But your reported issue sounds rather like a kernel panic, and these might be red herrings. It'd be best if you can catch the crash via kdump...

Adding Boris to Cc for the MCE issue.

In , miko (mik3z0r) wrote :

Created attachment 722365
hwinfo output

In , miko (mik3z0r) wrote :

Created attachment 722366
dmesg

In , miko (mik3z0r) wrote :

(In reply to Takashi Iwai from comment #2)
> Could you give hwinfo output, and also the output of dmesg after the fresh
> boot?
>
> The screenshots show one MCE log, and two flaky USB responses, and one
> harmless error message. But your reported issue sounds rather like a kernel
> panic, and these might be a red herring. It 'd be best if you can catch the
> crash via kdump...
>
> Put Boris to Cc for the MCE issue.

I do not know whether I am doing something wrong or whether kdump simply cannot catch it.
I tried the Intel microcode, but it keeps crashing. Memtest is clean.

In , Bpetkov (bpetkov) wrote :

And it freezes always when you watch videos?

Because the MCE is what causes the system to freeze, AFAICT:

Hardware event. This is not a software error.
CPU 0 BANK 4
TIME 1492965186 Sun Apr 23 18:33:06 2017
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal unclassified error: 402
STATUS b200000000100402 MCGSTATUS 5
CPUID Vendor Intel Family 6 Model 58
SOCKET 0 APIC 0 microcode 1c

Unfortunately, the decoder doesn't tell us a whole lot:

MCA: Internal unclassified error: 402
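
The flag lines in the record above follow directly from the top bits of STATUS. As a rough sketch that was not part of the original exchange, the architectural bits of IA32_MCi_STATUS and MCG_STATUS can be unpacked in plain sh. The function names decode_mci_status and decode_mcg_status are made up for illustration, and only the architectural fields are covered; what code 0x0402 means beyond "internal unclassified" is model-specific and is not decoded here.

```sh
#!/bin/sh
# Sketch: unpack the architectural bits of an MCE record by hand.
# Function names are illustrative, not part of any existing tool.

# $1: STATUS field from the MCE record, hex digits without "0x".
decode_mci_status() (
    # Left-pad to 16 digits and split into 32-bit halves, so the result
    # does not depend on how the shell handles 64-bit overflow.
    padded=$(printf '%s' "0000000000000000$1" | tail -c 16)
    hi_hex=$(printf '%s' "$padded" | cut -c1-8)
    lo_hex=$(printf '%s' "$padded" | cut -c9-16)
    hi=$((0x$hi_hex))
    lo=$((0x$lo_hex))

    flags=""
    # Register bit n lives at bit n-32 of the high half.
    for entry in 63:VAL 62:OVER 61:UC 60:EN 59:MISCV 58:ADDRV 57:PCC; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( (hi >> (bit - 32)) & 1 )) -eq 1 ]; then
            flags="$flags $name"
        fi
    done

    printf 'flags:%s\n' "$flags"
    # Bits 15:0 hold the MCA error code, bits 31:16 the model-specific code.
    printf 'mca_error_code: 0x%04x\n' $(( lo & 0xffff ))
    printf 'model_specific_code: 0x%04x\n' $(( (lo >> 16) & 0xffff ))
)

# $1: MCGSTATUS field, hex digits without "0x".
decode_mcg_status() (
    v=$((0x$1))
    out=""
    if [ $(( v & 1 )) -ne 0 ]; then out="$out RIPV"; fi
    if [ $(( v & 2 )) -ne 0 ]; then out="$out EIPV"; fi
    if [ $(( v & 4 )) -ne 0 ]; then out="$out MCIP"; fi
    printf 'mcg:%s\n' "$out"
)

decode_mci_status b200000000100402
decode_mcg_status 5
```

For the values in this record, the sketch reports VAL, UC, EN and PCC for STATUS and RIPV plus MCIP for MCGSTATUS, which lines up with the decoder's "Uncorrected error", "Error enabled" and "Processor context corrupt" lines. MISCV and ADDRV are both clear, so the record carries no address or extra information to narrow things down.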

In , miko (mik3z0r) wrote :

(In reply to Borislav Petkov from comment #6)
> And it freezes always when you watch videos?
>
> Because the MCE is what causes the system to freeze, AFAICT:
>
> Hardware event. This is not a software error.
> CPU 0 BANK 4
> TIME 1492965186 Sun Apr 23 18:33:06 2017
> MCG status:RIPV MCIP
> MCi status:
> Uncorrected error
> Error enabled
> Processor context corrupt
> MCA: Internal unclassified error: 402
> STATUS b200000000100402 MCGSTATUS 5
> CPUID Vendor Intel Family 6 Model 58
> SOCKET 0 APIC 0 microcode 1c
>
> Unfortunately, the decoder doesn't tell us a whole lot:
>
> MCA: Internal unclassified error: 402

Since I installed iucode-tool, it crashes even without watching videos, and much more often.

In , Bpetkov (bpetkov) wrote :

(In reply to miko sabor from comment #7)
> By the time i installed iucode-tool it crashes without watching videos.Much
> more often

Can you make a picture of the screen right after it crashes and upload
it?

In , miko (mik3z0r) wrote :

(In reply to Borislav Petkov from comment #8)
> (In reply to miko sabor from comment #7)
> > By the time i installed iucode-tool it crashes without watching videos.Much
> > more often
>
> Can you make a picture of the screen right after it crashes and upload
> it?

https://www.dropbox.com/sh/eprgx1jl71bmw4a/AAD-MgCYMMzjK18iD4nF-Ur7a?dl=0

Nothing special on the screen immediately after the freeze. Caps Lock is blinking and nothing reacts. Sometimes it reboots, sometimes not.

In , Bpetkov (bpetkov) wrote :

(In reply to miko sabor from comment #9)
> Nothing special with the screen immediately after the freeze.Caps lock
> blinking and nothing reacting.Sometimes it reboots sometimes no.

Ok, without changing anything else, start the machine, enter grub and
add "dis_ucode_ldr" at the kernel command line. Then boot and see if it
generates that same MCE. Do a photo again if you can't log into the box,
otherwise send dmesg again.

Thanks.

In , miko (mik3z0r) wrote :

Created attachment 722391
dmesg

In , Bpetkov (bpetkov) wrote :

Ok, can you watch videos without the freezing now? Is the box stable?

Also, please do this as root:

# modprobe -a msr cpuid
# zypper install msr-tools cpuid

and do

# wrmsr -a 0x8b 0
# for i in $(seq 0 $(cut -d- -f2 /sys/devices/system/cpu/online)); do cpuid -1r | grep 0x00000001; done
# rdmsr -a 0x8b

and send me that whole output.

Ask if something's not clear.

Thanks.

In , miko (mik3z0r) wrote :

linux-qf8a:/home/miko # wrmsr -a 0x8b 0
linux-qf8a:/home/miko # for i in $(seq 0 $(cut -d- -f2 /sys/devices/system/cpu/online)); do cpuid -1r | grep 0x00000001; done
   0x00000001 0x00: eax=0x000306a9 ebx=0x03100800 ecx=0x7dbae3bf edx=0xbfebfbff
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000003
   0x0000000d 0x01: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001 edx=0x28100800
   0x00000001 0x00: eax=0x000306a9 ebx=0x03100800 ecx=0x7dbae3bf edx=0xbfebfbff
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000003
   0x0000000d 0x01: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001 edx=0x28100800
   0x00000001 0x00: eax=0x000306a9 ebx=0x01100800 ecx=0x7dbae3bf edx=0xbfebfbff
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000001
   0x0000000b 0x01: eax=0x00000004 ebx=0x00000004 ecx=0x00000201 edx=0x00000001
   0x0000000d 0x01: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001 edx=0x28100800
   0x00000001 0x00: eax=0x000306a9 ebx=0x03100800 ecx=0x7dbae3bf edx=0xbfebfbff
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000003
   0x0000000d 0x01: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001 edx=0x28100800
linux-qf8a:/home/miko # rdmsr -a 0x8b
1700000000
1700000000
1700000000
1700000000
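
For readers following along, the two interesting values in this output can be read off by hand. The sketch below is an illustration only, with made-up function names: it takes the microcode revision from bits 63:32 of MSR 0x8b and the family/model/stepping from EAX of CPUID leaf 1. Note also that the grep pattern matches any line containing 0x00000001, which is why leaves 0x0000000b, 0x0000000d and 0x80000001 show up next to leaf 1.

```sh
#!/bin/sh
# Sketch: interpret the register dumps above. Function names are illustrative.

# $1: one line of "rdmsr 0x8b" output (hex digits, no "0x").
# The microcode revision sits in bits 63:32 of that MSR.
ucode_revision() (
    padded=$(printf '%s' "0000000000000000$1" | tail -c 16)
    hi_hex=$(printf '%s' "$padded" | cut -c1-8)
    printf '0x%x\n' $((0x$hi_hex))
)

# $1: EAX of CPUID leaf 1, e.g. 0x000306a9.
cpu_signature() (
    eax=$(($1))
    stepping=$(( eax & 0xf ))
    model=$(( (eax >> 4) & 0xf ))
    family=$(( (eax >> 8) & 0xf ))
    ext_model=$(( (eax >> 16) & 0xf ))
    ext_family=$(( (eax >> 20) & 0xff ))
    # The extended model applies to families 6 and 15,
    # the extended family only to family 15.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) + model ))
    fi
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi
    printf 'family %d model %d stepping %d\n' "$family" "$model" "$stepping"
)

ucode_revision 1700000000
cpu_signature 0x000306a9
```

With the values posted here this gives revision 0x17 and family 6, model 58, stepping 9. The family and model agree with the "Family 6 Model 58" line of the MCE record. That record listed microcode 1c, so a revision of 0x17 would be consistent with the loader being disabled, assuming this output was taken under the dis_ucode_ldr boot.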

In , miko (mik3z0r) wrote :

(In reply to Borislav Petkov from comment #12)
> Ok, can you watch videos without the freezing now? Is the box stable?
>
> Also, please do this as root:
>
> # modprobe msr cpuid
> # zypper install msr-tools cpuid
>
> and do
>
> # wrmsr -a 0x8b 0
> # for i in $(seq 0 $(cut -d- -f2 /sys/devices/system/cpu/online)); do cpuid
> -1r | grep 0x00000001; done
> # rdmsr -a 0x8b
>
> and send me that whole output.
>
> Ask if something's not clear.
>
> Thanks.

I forgot to mention: it froze again.

In , Bpetkov (bpetkov) wrote :

(In reply to miko sabor from comment #14)
> Forgot to mention it freezed again

even with 'dis_ucode_ldr' on the command line?

If so, then the microcode update has only influence on the frequency of
the failure.

Ok, let's try this: boot with "mce=3" on the kernel command line. Then
try to reproduce the freeze. This time the machine should hopefully not
panic and freeze but print the MCE in dmesg and we can see it at the
time it happens. And it would hopefully contain more info then.

Then, if you're still able, catch dmesg and upload it. If not, try to do
a photo.

Thanks.
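
Since several of these steps depend on a parameter typed at the grub prompt, it can help to confirm after boot that it actually reached the kernel. A minimal sketch, assuming a Linux system with /proc/cmdline; has_cmdline_param is a hypothetical helper name, not an existing tool:

```sh
#!/bin/sh
# Sketch: confirm a boot parameter actually reached the kernel.
# has_cmdline_param is an illustrative helper, not an existing tool.

# $1: exact parameter to look for (e.g. mce=3)
# $2: optional command line string; defaults to the contents of /proc/cmdline
has_cmdline_param() (
    want=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    set -f                      # no globbing while splitting into words
    for token in $cmdline; do
        [ "$token" = "$want" ] && exit 0
    done
    exit 1
)

for p in mce=3 dis_ucode_ldr; do
    if has_cmdline_param "$p"; then
        echo "$p: present"
    else
        echo "$p: not present"
    fi
done
```

The comparison is on whole words, so "mce=3" does not match something like "mce=30". A parameter added at the grub prompt only applies to that one boot, which is worth keeping in mind when a later test run appears to behave "the same as before".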

In , miko (mik3z0r) wrote :

(In reply to Borislav Petkov from comment #15)
> (In reply to miko sabor from comment #14)
> > Forgot to mention it freezed again
>
> even with 'dis_ucode_ldr' on the command line?
>
> If so, then the microcode update has only influence on the frequency of
> the failure.
>
> Ok, let's try this: boot with "mce=3" on the kernel command line. Then
> try to reproduce the freeze. This time the machine should hopefully not
> panic and freeze but print the MCE in dmesg and we can see it at the
> time it happens. And it would hopefully contain more info then.
>
> Then, if you're still able, catch dmesg and upload it. If not, try to do
> a photo.
>
> Thanks.

https://www.dropbox.com/s/wlqjkvtt1xzpavv/DSC_0015.JPG?dl=0
That is the error I got this time with "dis_ucode_ldr".
I will now proceed to mce=3.

In , miko (mik3z0r) wrote :

Again an MCE error. What drives me crazy is that it works perfectly with Windows.

In , Bpetkov (bpetkov) wrote :

(In reply to miko sabor from comment #17)
> Again MCE error.

with "mce=3"?

Can you upload full dmesg with it?

> What drives me crazy is that it works perfect with Windows OS

Well, it could be that the GPU driver is causing those and the windoze
GPU driver doesn't. But we still can't pinpoint with certainty which hw
part causes it so the GPU driver is just a conjecture. But since you
said it happens while watching videos then the GPU looks likely...

In , miko (mik3z0r) wrote :

(In reply to Borislav Petkov from comment #18)
> (In reply to miko sabor from comment #17)
> > Again MCE error.
>
> with "mce=3"?
>
> Can you upload full dmesg with it?
>
> > What drives me crazy is that it works perfect with Windows OS
>
> Well, it could be that the GPU driver is causing those and the windoze
> GPU driver doesn't. But we still can't pinpoint with certainty which hw
> part causes it so the GPU driver is just a conjecture. But since you
> said it happens while watching videos then the GPU looks likely...

I will try to catch the error in dmesg the next time it freezes.

One thought: what if the microcode is not loaded at startup? Maybe I need to add a kernel command line parameter to load intel-ucode. You previously mentioned "dis_ucode_ldr". Is there an Intel-specific one?

In , Bpetkov (bpetkov) wrote :

(In reply to miko sabor from comment #19)
> One of my thoughts is..what if the microcode is not "booted" in the
> startup.

We've seen that the microcode doesn't have any influence on this as you
see the error regardless whether microcode was updated or not.

Make sure to boot with "mce=3" and catch dmesg after the error happens.

Thanks.

In , Tiwai-r (tiwai-r) wrote :

Just to be sure: try to trigger the crash manually to see whether your kdump setup really works, e.g. "echo c > /proc/sysrq-trigger"

In , Bpetkov (bpetkov) wrote :

(In reply to Takashi Iwai from comment #21)
> Just to be sure: try to trigger the crash manually to see whether your kdump
> setup really works, e.g. "echo c > /proc/sysrq-trigger"

No, please don't confuse him. The mce=3 setting should not panic the machine and I want to see the full MCE right after it got logged, not after a reboot.

Thanks.

In , miko (mik3z0r) wrote :

Created attachment 723218
dmesg

My dmesg is clean. The system freezes completely and I have to restart it using the power button.

In , Bpetkov (bpetkov) wrote :

(In reply to miko sabor from comment #23)
> The system freezes completely and i have to restart it by
> using the power button.

Even with mce=3 ?

In , miko (mik3z0r) wrote :

(In reply to Borislav Petkov from comment #24)
> (In reply to miko sabor from comment #23)
> > The system freezes completely and i have to restart it by
> > using the power button.
>
> Even with mce=3 ?

Yes.

In , Bpetkov (bpetkov) wrote :

Hmmkay. Looks like the severity grading says this error is fatal and we
panic. Ok, please try this:

1. Boot with "mce=3"

2. As root do:

# mount -t debugfs none /sys/kernel/debug
# echo 1 > /sys/kernel/debug/mce/fake_panic
# cat /sys/kernel/debug/mce/fake_panic
1

That last command verifies it accepted the setting.

3. Now, try to reproduce the MCE and keep watching dmesg in another
window:

# while true; do dmesg | tail -n 20 ; sleep 2s; done

The moment you see the MCE, you do dmesg > log and upload that "log"
file.

The idea is to fake-panic the machine to be able to catch dmesg and send
it out.

Thanks.
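
Step 3 relies on spotting the MCE by eye and saving the log quickly. One way to automate that part might look like the sketch below. It is not what was asked for in the thread, and the match strings are assumptions about how the kernel words its machine-check messages; is_mce_line and save_on_mce are made-up names.

```sh
#!/bin/sh
# Sketch: save dmesg automatically when an MCE line shows up.
# The match strings are assumptions about the kernel's wording.

# $1: one kernel log line. Succeeds if it looks like a machine-check message.
is_mce_line() {
    case $1 in
        *"[Hardware Error]"*|*"Machine check"*|*"Machine Check"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads kernel log lines on stdin; on the first match, writes a full
# dmesg snapshot to the file named in $1 and stops reading.
save_on_mce() {
    while IFS= read -r line; do
        if is_mce_line "$line"; then
            dmesg > "$1"
            return 0
        fi
    done
    return 1
}

# Intended use, as root:
#   dmesg --follow | save_on_mce log
```

The snapshot is taken inside the loop rather than after the pipeline ends, because dmesg --follow keeps running until its next write fails. One caveat: --follow replays the existing buffer first, so a machine-check record that was already logged at boot would trigger the save immediately.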

In , miko (mik3z0r) wrote :

Created attachment 723247
dmesg

I did a completely fresh install of Tumbleweed again, booted with mce=3, and that's what I got.

In , Bpetkov (bpetkov) wrote :

That's no different from before. Please do the steps in comment #26
instead. It is important to try to reproduce it and catch it in dmesg
*before* rebooting.

My hope is that the MCE reported at that time would have more
information as to pinpoint which device on your system is causing it.

In , miko (mik3z0r) wrote :

Forgive me, but I cannot continue with this. It is getting too complicated for me! Thank you for the help so far!

miko (mik3z0r) wrote :
In , Bpetkov (bpetkov) wrote :

Sorry to hear that. But ok, let me know if you change your mind. Closing for now.

miko (mik3z0r) wrote :
miko (mik3z0r) wrote :
miko (mik3z0r) wrote :
miko (mik3z0r)
affects: evince (Ubuntu) → ubuntu
affects: evince (openSUSE) → opensuse
Changed in opensuse:
importance: Unknown → Critical
In , miko (mik3z0r) wrote :

Created attachment 723558
dmesg from kubuntu

In , miko (mik3z0r) wrote :

https://www.dropbox.com/s/jyx31cdif0v7e3a/dump.201705030323?dl=0
That's the kdump file generated together with the dmesg right after the kernel panic on Kubuntu. I hope it can provide you with some useful information.

In , Bpetkov (bpetkov) wrote :

Hmm, so I might be doing something wrong but the crash I have in my
ubuntu guest here doesn't like your dump:

# crash --kaslr 0x2ec00000 kubuntu-dbg/usr/lib/debug/boot/vmlinux-4.10.0-20-generic dump.201705030323

...

crash: cannot determine thread return address
please wait... (gathering task table data)
crash: invalid kernel virtual address: 200008 type: "fill_thread_info"

crash: invalid structure member offset: thread_info_cpu
       FILE: task.c LINE: 2357 FUNCTION: store_context()

[/usr/bin/crash] error trace: 4c7cde => 4c26b5 => 52e3e7 => 52e35c

  52e35c: (undetermined)
  52e3e7: OFFSET_verify+55
  4c26b5: (undetermined)
  4c7cde: (undetermined)

Can you open that dump?

It fails with and without the --kaslr offset.

Takashi, anything I'm missing?

Thanks.

In , Tiwai-r (tiwai-r) wrote :

(In reply to Borislav Petkov from comment #33)
> Takashi, anything I'm missing?

I don't know the Ubuntu kernel well either, sorry.

miko (mik3z0r) wrote :

maybe that?

In , miko (mik3z0r) wrote :

Created attachment 723667
maybe that?

maybe that?

In , Bpetkov (bpetkov) wrote :

No, that's some ubuntu apport-<something> file which can regenerate the
stack trace but I'd need the full environment for that and so on and so
on.

So instead please consider doing comment #26.

Changed in opensuse:
status: Unknown → Confirmed
In , Jslaby-h (jslaby-h) wrote :

Closing due to lack of response.

Changed in opensuse:
status: Confirmed → Unknown
dino99 (9d9) wrote :

Closed due to lack of response.

Changed in opensuse:
importance: Critical → Undecided
status: Unknown → New
Changed in ubuntu:
status: New → Invalid
Changed in opensuse:
status: New → Invalid
dino99 (9d9) wrote :