Total system freeze after 3.16.0-38.52

Bug #1561902 reported by Isak Frants
This bug affects 1 person
Affects: linux-lts-utopic (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

This should have been reported already last year, but here it goes.

Problem:
My laptop is an ASUS N71JA and works perfectly in Windows. Everything was fine with Xubuntu 14.04 through kernels 3.13 and 3.16 until 3.16.0-38.52. From that kernel onwards the desktop freezes every now and then: it dies completely, and a hard power reset is the only option. The 3.19 and 4.2.0 kernels released after that date are affected as well. Magic SysRq does not work, and I can't see anything unusual in syslog or kern.log. The freeze can come at any time, from immediately after boot to a few hours later; it is completely random and hard to debug.

Workaround:
3.16.0-38.52 was released in May 2015 (https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1452882).
3.16.0-37.51 and 3.16.0-36.48 have NEVER frozen.

The only non-upstream change that was made to 3.16.0-38.52 was "vesafb: Set mtrr:3 (write-combining) as default" (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1434581). This is probably not implemented in the mainline kernel tree, right? I've been using mainline 3.16.7-ckt24-trusty without any problems now for over a month (http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.16.7-ckt24-trusty/) i.e. it seems like 1434581 is causing the problem.

Can this be verified by some other method e.g. by disabling/reverting 1434581 in a "normal" non-mainline kernel?

Solution:
?
---
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC2: isak 1669 F.... pulseaudio
 /dev/snd/controlC0: isak 1669 F.... pulseaudio
 /dev/snd/controlC1: isak 1669 F.... pulseaudio
CurrentDesktop: XFCE
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=188824e0-028f-4004-92b9-9e627eff4d5c
MachineType: ASUSTeK Computer Inc. N71Ja
Package: linux (not installed)
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.16.0-38-generic root=UUID=1d243640-e961-4ee3-b6d9-8e4ee5816090 ro quiet splash
ProcVersionSignature: Ubuntu 3.16.0-38.52~14.04.1-generic 3.16.7-ckt10
RelatedPackageVersions:
 linux-restricted-modules-3.16.0-38-generic N/A
 linux-backports-modules-3.16.0-38-generic N/A
 linux-firmware 1.127.20
Tags: trusty
Uname: Linux 3.16.0-38-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo vboxusers
_MarkForUpload: True
dmi.bios.date: 05/14/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: N71Ja.206
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: N71Ja
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrN71Ja.206:bd05/14/2010:svnASUSTeKComputerInc.:pnN71Ja:pvr1.0:rvnASUSTeKComputerInc.:rnN71Ja:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr1.0:
dmi.product.name: N71Ja
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.

Revision history for this message
Isak Frants (isakfrants) wrote :

I'm unable to run 'apport-collect 1561902', as it says "Package linux-lts-utopic not installed and no hook available, ignoring". Any help?

Revision history for this message
Isak Frants (isakfrants) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected trusty
description: updated
Revision history for this message
Isak Frants (isakfrants) wrote : BootDmesg.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : CRDA.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : IwConfig.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : Lspci.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : Lsusb.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : ProcEnviron.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : ProcModules.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : PulseList.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : RfKill.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : UdevDb.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : UdevLog.txt

apport information

Revision history for this message
Isak Frants (isakfrants) wrote : WifiSyslog.txt

apport information

Revision history for this message
Stefan Bader (smb) wrote :

Odd, in theory that change should only affect the MTRR cache mode of the video memory for the generic VESA framebuffer driver. And while that is used initially it gets replaced later on by the special driver for radeon.
Maybe two things: First, after completing boot into an affected kernel, could you do a "cat /proc/mtrr" and add the output of that to this bug report?
Second, looking at the documentation, it should be possible to disable that caching mode for vesafb with a kernel command-line option. In /etc/default/grub there should be a GRUB_CMDLINE_LINUX_DEFAULT line (usually it has "quiet splash" there). Add "video=vesafb:mtrr:0" to it, then run "sudo update-grub" and reboot. After the reboot the option should show up in the output of "cat /proc/cmdline" and, in theory, any "write-combining" that may have been in "cat /proc/mtrr" before should go away.
Just as an additional note, looking at the dmesg output in this report, there seems to be something odd in the way the BIOS initializes MTRR. This might be the reason this specific laptop suffers from the change.
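
The /etc/default/grub edit described above can also be scripted. The following is only a minimal sketch, assuming the file contains a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; the function name add_kernel_param is made up for illustration and is not part of any Ubuntu tool. It only transforms text, so writing the result back and running "sudo update-grub" remain manual steps.

```python
import re

# Matches a line such as: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# (assumption: the value is enclosed in double quotes, as in the Ubuntu default file)
_CMDLINE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$', re.MULTILINE
)


def add_kernel_param(grub_text: str, param: str) -> str:
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper: the parameter is added only if it is not already
    present, so running it twice gives the same result.
    """

    def _append(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = _CMDLINE.subn(_append, grub_text)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX_DEFAULT line found")
    return new_text


if __name__ == "__main__":
    # Print the modified file for review; nothing is written back.
    with open("/etc/default/grub") as fh:
        print(add_kernel_param(fh.read(), "video=vesafb:mtrr:0"))
```

After a reboot, "cat /proc/cmdline" should still be used to confirm that the option actually reached the kernel.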

Revision history for this message
Isak Frants (isakfrants) wrote :

Hi! Thanks for your time. I really appreciate it. Attached cat_proc_mtrr shows 'cat /proc/mtrr' directly after boot for
- kernel 3.16.0-37 (not affected kernel)
- kernel 3.16.0-38 (affected kernel)
- kernel 3.16.0-38 + video=vesafb:mtrr:0 (affected + possible workaround)

You were right, the write-combining disappeared with vesafb:mtrr:0. Sadly, 3.16.0-38 still freezes within minutes.
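
For anyone who wants to repeat this comparison on another machine, here is a small sketch that lists the write-combining entries in /proc/mtrr. It assumes the usual line layout of that file ("regNN: base=0x... (...MB), size=...MB, count=N: type"); the names MtrrEntry, parse_mtrr and write_combining are made up for this example.

```python
import re
from typing import List, NamedTuple


class MtrrEntry(NamedTuple):
    reg: int          # register number (the NN in regNN)
    base: int         # base address in bytes
    size_kb: int      # region size in KB
    cache_type: str   # e.g. "write-back", "write-combining", "uncachable"


# Assumed layout of one /proc/mtrr line:
# reg01: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-combining
_LINE = re.compile(
    r"^reg(\d+): base=(0x[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)([KM])B, count=\d+: (.+)$"
)


def parse_mtrr(text: str) -> List[MtrrEntry]:
    """Parse the text of /proc/mtrr; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue
        size = int(match.group(3))
        if match.group(4) == "M":
            size *= 1024  # normalise MB to KB
        entries.append(
            MtrrEntry(int(match.group(1)), int(match.group(2), 16),
                      size, match.group(5).strip())
        )
    return entries


def write_combining(entries: List[MtrrEntry]) -> List[MtrrEntry]:
    """Return only the entries whose type is write-combining."""
    return [e for e in entries if e.cache_type == "write-combining"]


if __name__ == "__main__":
    with open("/proc/mtrr") as fh:
        for entry in write_combining(parse_mtrr(fh.read())):
            print(entry)
```

Running it once per kernel and diffing the output gives the same comparison as the attached file, without reading the registers by eye.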

What else can be tested / what more can I provide?

Revision history for this message
Stefan Bader (smb) wrote :

So the change of cache mode for the VESA framebuffer driver is at least not the only offender. It might even have no impact at all, but I would not be 100% sure yet. Normally this sounds like something that needs a bisect (building various kernels at stages between 3.16.0-37 and 38, which you would have to try and report whether they work or not). Though that is quite tedious.

Some things that you might try before:
- try the latest 3.16 (*-87*) with the mtrr option
- try to switch to a text console (ctrl-alt-f1) and try to cause some load (scp or wget files, ssh,...)
- if you got another Linux machine, try to install openssh-server on the laptop and attempt to
  ssh into the laptop when it appears frozen (sometimes it is just the graphical frontend and
  keyboard/mouse events that lock up).

The other question would be whether it also happens if you just log in and let it sit idle (provided the screen saver does not kick in), or whether it only happens if you do something (and if so, is it things that cause network activity or rather local activity)?
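
To make the bisect idea mentioned above more concrete, the search itself is a plain binary search over an ordered list of builds. This is only a sketch of the principle, not the actual kernel bisect procedure: the function first_bad and the is_bad callback are made up, and in practice "is_bad" means booting that build and waiting long enough to trust the verdict, which is the hard part with a freeze this random.

```python
from typing import Callable, Sequence


def first_bad(candidates: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad entry in candidates.

    Assumptions: candidates are ordered oldest to newest, the first one is
    known good, the last one is known bad, and once a build is bad all
    later builds are bad as well.
    """
    if len(candidates) < 2:
        raise ValueError("need at least one known-good and one known-bad build")
    lo, hi = 0, len(candidates) - 1   # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid                  # the culprit is at mid or earlier
        else:
            lo = mid                  # the culprit is after mid
    return candidates[hi]


if __name__ == "__main__":
    # Hypothetical build labels; the verdict is entered by hand after testing.
    builds = ["build-%d" % i for i in range(8)]
    culprit = first_bad(
        builds,
        lambda b: input("Did %s freeze? [y/n] " % b).strip().lower() == "y",
    )
    print("first bad build:", culprit)
```

Each round halves the remaining range, so even a long list of intermediate builds needs only a handful of test boots, but a single wrong "good" verdict sends the search in the wrong direction.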

Revision history for this message
Isak Frants (isakfrants) wrote :

Thanks for the ideas! Did you mean 3.16.0-67 and not -87? I tried -67 now with and without vesafb:mtrr:0. System freezes within minutes in both cases.

I also tried mainline 3.16.7-ckt26-trusty and added vesafb:mtrr:3 to this and the system does NOT seem to freeze. This bug is weird. 3.16.0-38 was the first kernel with problems and mtrr was the only change that is not included in mainline. Why are mainline kernels working and normal not...? I want to believe that setting vesafb:mtrr:3 as boot parameter is exactly the same as setting mtrr = 3 in source code. (as said here https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1434581/comments/48)

No ctrl-alt-f1 or alt-ctrl-sysrq is possible when the system has frozen. Nothing works. Not even the WiFi LED from the hardware switch. I started a "find *.* /" after 3.16.0-67 boot and noticed that all disc activity stopped when the system froze. That probably indicates that this is not just a keyboard/mouse/graphics issue?

Revision history for this message
Isak Frants (isakfrants) wrote :

Oh, yes. We freeze completely at random. It can freeze at idle after simply booting, when clicking a link in firefox or when just moving windows around. Seems very random :/

Revision history for this message
Stefan Bader (smb) wrote :

Ok, it sounds more like there could be a crash behind the scene and everything stops working. On the text console bit I was a bit unclear. What I meant was to switch to text right from the start and do the login on the text console and also play around while being in text mode. If it crashes then, there is a better chance that you see what is crashing. If it does not crash or lock up it might be a hint that the bug might be in the graphics driver stack.

Revision history for this message
Isak Frants (isakfrants) wrote :

Hi again. I'm continuously trying different scenarios, with and without text mode, different kernels, wifi on or off, but I can't say for sure what's going on. A daily image (from 15.4) of Xubuntu Xenial froze after about an hour of usage, and after just a couple of minutes on the following boot. This weekend I've been using the 3.16.0-67 kernel and have not seen a single freeze. That is so weird. Two weeks ago it froze within minutes with that kernel and the exact same hardware and everything.

Revision history for this message
Stefan Bader (smb) wrote :

Sounds more and more odd. Maybe things are related on how long the system is powered on. It has been a while, but I had that once. Random freezes but rather shortly after cold boot. Though that was a desktop and back then it was the connectors of the power supply not all fitted firmly. So depending on heating up the connection it would handle higher currents better or worse.

On the other hand you wrote that on Xenial this happened after one hour and immediately after. And by that time things should be warmed up enough... The problem of course is that without a repeatable method to trigger the problem or by luck get some error message, this will be near impossible to track down. The only thing I can think of is to try to remember whether there could be a pattern in what is being done primarily when lockups happen.

Revision history for this message
Isak Frants (isakfrants) wrote :

Windows 10 is freezing in the exact same way. I guess something is broken in this laptop and I'll keep using 3.13.x now and Windows 7 until 2019. If the laptop survives that far :) thanks for the help Stefan!

Changed in linux-lts-utopic (Ubuntu):
status: New → Invalid