frequent lockups asus g750js/GeForce GTX 870M

Bug #1438708 reported by Geoff Williams on 2015-03-31
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | | Medium | Unassigned |
Trusty | | Medium | Unassigned |
Utopic | | Medium | Unassigned |

Bug Description

I get hard lockups with Linux kernel 3.16 every 5-15 minutes. I attempted upgrading to (x)ubuntu 14.10 to see if this improved stability, but the laptop crashed so badly during the upgrade that it broke grub and had to be recovered with ubuntu boot-repair. This allowed the system to boot, and I was able to complete configuration with dpkg --configure -a --pending.

The system still crashes every few minutes when running (x)ubuntu 14.10, and about 50% of the time it crashes before even reaching the login screen. If I'm lucky, attempting to switch VCs will resolve things, but the system always crashes eventually and has to be powered off by holding the power button for 10 seconds or via the magic SysRq key. No other key combinations work.

WORKAROUND: Boot into kernel 3.13.0-44-generic.

WORKAROUND: Remove nouveau.

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: linux-image-3.16.0-33-generic 3.16.0-33.44
ProcVersionSignature: Ubuntu 3.16.0-33.44-generic 3.16.7-ckt7
Uname: Linux 3.16.0-33-generic x86_64
NonfreeKernelModules: wl
ApportVersion: 2.14.7-0ubuntu8.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: geoff 3257 F.... pulseaudio
CurrentDesktop: XFCE
Date: Wed Apr 1 01:05:31 2015
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-09-23 (189 days ago)
InstallationMedia: Xubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
MachineType: ASUSTeK COMPUTER INC. G750JS
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.16.0-33-generic.efi.signed root=UUID=3fc94323-d40a-42ca-87c7-28a9e52ebcf4 ro recovery nomodeset
RelatedPackageVersions:
 linux-restricted-modules-3.16.0-33-generic N/A
 linux-backports-modules-3.16.0-33-generic N/A
 linux-firmware 1.138.1
SourcePackage: linux
UpgradeStatus: Upgraded to utopic on 2015-03-29 (2 days ago)
dmi.bios.date: 07/17/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: G750JS.208
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: G750JS
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK COMPUTER INC.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrG750JS.208:bd07/17/2014:svnASUSTeKCOMPUTERINC.:pnG750JS:pvr1.0:rvnASUSTeKCOMPUTERINC.:rnG750JS:rvr1.0:cvnASUSTeKCOMPUTERINC.:ct10:cvr1.0:
dmi.product.name: G750JS
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK COMPUTER INC.

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: trusty
Changed in linux (Ubuntu):
importance: Undecided → Critical
tags: added: kernel-graphics

Using the recovery mode (https://wiki.ubuntu.com/RecoveryMode), please try to install the kernel mainline build (https://wiki.ubuntu.com/Kernel/MainlineBuilds) and test if it works.

Then report the result here, and set this bug status back to "confirmed". Thank you.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete

Hi Alberto,
I skipped recovery mode and installed the kernels directly from my functioning 3.13 system:

ii linux-image-3.17.1-031701-generic 3.17.1-031701.201410150735
ii linux-image-3.18.0-031800rc2-generic 3.18.0-031800rc2.201410262035

I was unable to boot 3.18 because it could not find the root device.

3.17 looked like it worked, so I immediately shut down from the login screen, and it then crashed so hard that magic SysRq didn't even work - I had to hold the power button.

Interestingly, last night after running linux 3.13 for a while I rebooted into linux 3.16 via recovery mode, where I waited for approximately 1 minute before booting normally, and the system DID seem stable overnight. This morning, I powered the laptop off and attempted to boot 3.16, and it crashed during the boot process, so the error is still reproducible. I'll leave 3.17 running and see if it crashes once I'm logged in.

Geoff Williams, could you please test the latest upstream kernel available from the very top line at the top of the page (the release names are irrelevant for testing, and please do not test the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue.

If the test did not allow you to test to the issue (ex. you couldn't boot into the OS) please make a comment in your report about this, and continue to test the next most recent kernel version until you can test to the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the report description:
kernel-fixed-upstream
kernel-fixed-upstream-3.XY-rcZ

Where XY and Z are numbers corresponding to the kernel version.

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-3.XY-rcZ

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results.

Thank you for your understanding.

tags: added: latest-bios-208
tags: added: regression-release
removed: trusty
Changed in linux (Ubuntu):
importance: Critical → High

@ Christopher M. Penalver:

Geoff Williams has already tested the upstream kernel.

tags: added: kernel-bug-exists-upstream-3.18-rc2

@ Geoff Williams

Please:
  1. Report this bug to <https://bugzilla.kernel.org/>.
  2. Paste the new report URL here.
  3. Set this bug status back to "confirmed".

Thank you.

tags: added: asked-to-upstream

Alberto Salvia Novella, thanks for your help.

Unfortunately, 3.18.x is painfully old, and not the latest mainline kernel.

As well, reporting to bugzilla is legacy (as well as premature), as it is not the preferred method to report bugs to upstream, and the latest mainline kernel has not been tested yet (as requested by upstream).

tags: added: needs-upstream-testing
removed: asked-to-upstream

@ Christopher M. Penalver

If I visit <http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D>, the latest kernel I can see published for Utopic is 3.18-rc2.

Alberto Salvia Novella, the latest mainline kernel at the top of the page is 4.0-rc6 (not 3.18.x).


Alberto,

Apologies for the delay in responding. Good news - I've been able to boot linux 4.0-rc6 and the system is now stable.

I used the package linux-image-4.0.0-040000rc6-generic for Vivid from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc6-vivid/

There are still a bunch of nouveau errors (below) visible in dmesg, but the system does NOT crash anymore.

[ 0.695552] nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x0e4190a2
[ 0.695556] nouveau [ DEVICE][0000:01:00.0] Chipset: GK104 (NVE4)
[ 0.695558] nouveau [ DEVICE][0000:01:00.0] Family : NVE0
[ 0.762806] nouveau [ VBIOS][0000:01:00.0] using image from PROM
[ 0.762895] nouveau [ VBIOS][0000:01:00.0] BIT signature found
[ 0.762899] nouveau [ VBIOS][0000:01:00.0] version 80.04.f4.00.01
[ 0.763651] nouveau [ DEVINIT][0000:01:00.0] adaptor not initialised
[ 0.763681] nouveau [ VBIOS][0000:01:00.0] running init tables
[ 1.025922] nouveau [ PMC][0000:01:00.0] MSI interrupts enabled
[ 1.025982] nouveau [ PFB][0000:01:00.0] RAM type: GDDR5
[ 1.025984] nouveau [ PFB][0000:01:00.0] RAM size: 3072 MiB
[ 1.025987] nouveau [ PFB][0000:01:00.0] ZCOMP: 0 tags
[ 1.028427] nouveau [ VOLT][0000:01:00.0] GPU voltage: 600000uv
[ 1.076762] nouveau [ PTHERM][0000:01:00.0] FAN control: none / external
[ 1.076773] nouveau [ PTHERM][0000:01:00.0] fan management: automatic
[ 1.076789] nouveau [ PTHERM][0000:01:00.0] internal sensor: yes
[ 1.076829] nouveau [ CLK][0000:01:00.0] 07: core 324-405 MHz memory 648 MHz
[ 1.076865] nouveau [ CLK][0000:01:00.0] 0a: core 405-967 MHz memory 1620 MHz
[ 1.076910] nouveau [ CLK][0000:01:00.0] 0e: core 405-967 MHz memory 4000 MHz
[ 1.076963] nouveau [ CLK][0000:01:00.0] 0f: core 405-967 MHz memory 5000 MHz
[ 1.077034] nouveau [ CLK][0000:01:00.0] --: core 324 MHz memory 648 MHz
[ 1.110733] nouveau E[ PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x500c30 [ IBUS ]
[ 1.110882] nouveau [ DRM] VRAM: 3072 MiB
[ 1.110884] nouveau [ DRM] GART: 1048576 MiB
[ 1.110888] nouveau [ DRM] TMDS table version 2.0
[ 1.110891] nouveau [ DRM] DCB version 4.0
[ 1.110893] nouveau [ DRM] DCB conn 08: 00020846
[ 1.110896] nouveau [ DRM] DCB conn 09: 00000900
[ 1.117043] nouveau [ DRM] MM: using COPY for buffer copies
[ 1.117049] [drm] Initialized nouveau 1.2.1 20120801 for 0000:01:00.0 on minor 0
[ 11.474194] nouveau E[ PGR][0000:01:00.0] HUB_INIT timed out
[ 11.474201] nouveau E[ PGR][0000:01:00.0] 409000 - done 0x00000240
[ 11.474206] nouveau E[ PGR][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
[ 11.474210] nouveau E[ PGR][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000006 0x00000000
[ 11.474213] nouveau E[ PGR][0000:01:00.0] 502000 - done 0x00000340
[ 11.474218] nouveau E[ PGR][0000:01:00.0] 502000 - stat 0x80000000 0x00006500 0x00000000 0x00000000
[ 11.474224] nouveau E[ PGR][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000002 0x00000000
[ 11.474227] nouveau E[ PGR][0000:01:00....


Relevant portion of syslog from a crash on Linux 3.17:

Apr 1 09:49:28 monster kernel: [ 10.722076] nouveau E[ PGRAPH][0000:01:00.0] HUB_INIT timed out
Apr 1 09:49:28 monster kernel: [ 10.722082] nouveau E[ PGRAPH][0000:01:00.0] 409000 - done 0x00000240
Apr 1 09:49:28 monster kernel: [ 10.722087] nouveau E[ PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722091] nouveau E[ PGRAPH][0000:01:00.0] 409000 - stat 0x00000000 0x00000000 0x00000006 0x00000001
Apr 1 09:49:28 monster kernel: [ 10.722094] nouveau E[ PGRAPH][0000:01:00.0] 502000 - done 0x00000300
Apr 1 09:49:28 monster kernel: [ 10.722100] nouveau E[ PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00008c00 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722106] nouveau E[ PGRAPH][0000:01:00.0] 502000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722108] nouveau E[ PGRAPH][0000:01:00.0] 50a000 - done 0x00000300
Apr 1 09:49:28 monster kernel: [ 10.722114] nouveau E[ PGRAPH][0000:01:00.0] 50a000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722120] nouveau E[ PGRAPH][0000:01:00.0] 50a000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722122] nouveau E[ PGRAPH][0000:01:00.0] 512000 - done 0x00000300
Apr 1 09:49:28 monster kernel: [ 10.722128] nouveau E[ PGRAPH][0000:01:00.0] 512000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722134] nouveau E[ PGRAPH][0000:01:00.0] 512000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722136] nouveau E[ PGRAPH][0000:01:00.0] 51a000 - done 0x00000300
Apr 1 09:49:28 monster kernel: [ 10.722142] nouveau E[ PGRAPH][0000:01:00.0] 51a000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722148] nouveau E[ PGRAPH][0000:01:00.0] 51a000 - stat 0x00000000 0x00000000 0x00000000 0x00000000
Apr 1 09:49:28 monster kernel: [ 10.722149] nouveau E[ PGRAPH][0000:01:00.0] init failed, -16
Apr 1 09:49:32 monster NetworkManager[930]: <info> WiFi hardware radio set enabled
Apr 1 09:49:35 monster kernel: [ 17.847450] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140724/nsarguments-95)
Apr 1 09:49:35 monster kernel: [ 17.847791] ACPI: \_SB_.PCI0.PEG0.PEGP: failed to evaluate _DSM
Apr 1 09:49:35 monster kernel: [ 17.847795] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20140724/nsarguments-95)
Apr 1 09:49:35 monster crontab[2536]: (root) LIST (root)
Apr 1 09:49:36 monster ntpd[2608]: ntpd 4.2.6p5@1.2349-o Fri Feb 6 15:24:12 UTC 2015 (1)
Apr 1 09:49:36 monster ntpd[2609]: proto: precision = 0.131 usec
Apr 1 09:49:36 monster ntpd[2609]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Apr 1 09:49:36 monster ntpd[2609]: unable to bind to wildcard address 0.0.0.0 - another process may be running - EXITING
Apr 1 09:49:36 monster puppet-agent[...
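
In the excerpts in this report, nouveau error lines carry an `E` immediately before the bracketed unit name (for example `nouveau E[ PGRAPH]`), while informational lines do not. The following is a minimal sketch, not part of the original report, of how such lines could be pulled out of a syslog and counted per unit. The function names and the regular expression are illustrative assumptions based only on the line format visible here.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... nouveau E[ PGRAPH][0000:01:00.0] HUB_INIT timed out"
# The PCI address bracket is optional because some lines
# (e.g. "nouveau E[ DRM] failed to idle channel ...") omit it.
NOUVEAU_ERROR = re.compile(
    r"nouveau E\[\s*(?P<unit>\w+)\]"
    r"(?:\[(?P<pci>[0-9a-f:.]+)\])?\s*(?P<msg>.*)"
)

def nouveau_errors(lines):
    """Yield (unit, pci_address_or_None, message) for each nouveau error line."""
    for line in lines:
        match = NOUVEAU_ERROR.search(line)
        if match:
            yield match.group("unit"), match.group("pci"), match.group("msg")

def summarize(lines):
    """Count nouveau error lines per unit (e.g. PGRAPH, PBUS, DRM)."""
    return Counter(unit for unit, _pci, _msg in nouveau_errors(lines))

# Three lines copied from the syslog excerpt above.
sample = [
    "Apr 1 09:49:28 monster kernel: [ 10.722076] nouveau E[ PGRAPH][0000:01:00.0] HUB_INIT timed out",
    "Apr 1 09:49:28 monster kernel: [ 10.722149] nouveau E[ PGRAPH][0000:01:00.0] init failed, -16",
    "Apr 1 09:49:32 monster NetworkManager[930]: <info> WiFi hardware radio set enabled",
]
print(summarize(sample))
```

To run it against a real log, the lines of `/var/log/syslog` could be passed to `summarize` in place of `sample`.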


Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Ok - looks like I spoke too soon. After running all day with no problems (largely unattended), the laptop crashed hard when I tried to shut it down.

I found this in syslog after rebooting -- this is Linux 4.0:

Apr 2 20:30:07 monster kernel: [ 69.841098] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150204/nsarguments-95)
Apr 2 20:30:07 monster kernel: [ 69.841423] ACPI: \_SB_.PCI0.PEG0.PEGP: failed to evaluate _DSM
Apr 2 20:30:07 monster kernel: [ 69.841428] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150204/nsarguments-95)
Apr 2 20:30:22 monster kernel: [ 84.853086] nouveau E[ DRM] failed to idle channel 0xcccc0001 [DRM]
Apr 2 20:30:22 monster kernel: [ 84.853284] ------------[ cut here ]------------
Apr 2 20:30:22 monster kernel: [ 84.853290] WARNING: CPU: 4 PID: 206 at /home/kernel/COD/linux/drivers/pci/pci.c:1546 pci_disable_device+0xab/0xc0()
Apr 2 20:30:22 monster kernel: [ 84.853292] nouveau 0000:01:00.0: disabling already-disabled device
Apr 2 20:30:22 monster kernel: [ 84.853293] Modules linked in: dm_crypt bnep rfcomm binfmt_misc asus_nb_wmi uvcvideo asus_wmi sparse_keymap snd_hda_codec_hdmi videobuf2_vmalloc videobuf2_memops videobuf2_core intel_rapl v4l2_common iosf_mbi snd_hda_codec_realtek x86_pkg_temp_thermal videodev intel_powerclamp snd_hda_codec_generic coretemp btusb media bluetooth kvm_intel snd_hda_intel snd_hda_controller kvm snd_hda_codec snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul snd_seq_midi snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_seq snd_seq_device snd_timer joydev serio_raw snd ie31200_edac mei_me mei shpchp soundcore lpc_ich edac_core mac_hid parport_pc ppdev lp nls_iso8859_1 parport nouveau i915 mxm_wmi ttm i2c_algo_bit drm_kms_helper drm psmouse ahci alx libahci mdio wmi video
Apr 2 20:30:22 monster kernel: [ 84.853336] CPU: 4 PID: 206 Comm: kworker/4:1 Not tainted 4.0.0-040000rc6-generic #201503291935
Apr 2 20:30:22 monster kernel: [ 84.853337] Hardware name: ASUSTeK COMPUTER INC. G750JS/G750JS, BIOS G750JS.208 07/17/2014
Apr 2 20:30:22 monster kernel: [ 84.853341] Workqueue: pm pm_runtime_work
Apr 2 20:30:22 monster kernel: [ 84.853342] 000000000000060a ffff880465403b88 ffffffff817e3106 0000000000000007
Apr 2 20:30:22 monster kernel: [ 84.853345] ffff880465403bd8 ffff880465403bc8 ffffffff810791b7 ffff880465403be8
Apr 2 20:30:22 monster kernel: [ 84.853346] ffff88046c366000 ffff88046c366000 ffff880035cd8800 00000000fffffff0
Apr 2 20:30:22 monster kernel: [ 84.853348] Call Trace:
Apr 2 20:30:22 monster kernel: [ 84.853353] [<ffffffff817e3106>] dump_stack+0x45/0x57
Apr 2 20:30:22 monster kernel: [ 84.853357] [<ffffffff810791b7>] warn_slowpath_common+0x97/0xe0
Apr 2 20:30:22 monster kernel: [ 84.853359] [<ffffffff810792b6>] warn_slowpath_fmt+0x46/0x50
Apr 2 20:30:22 monster kernel: [ 84.853363] [<ffffffff8140b5c0>] ? pci_save_vc_state+0x40/0x100
Apr 2 20:30:22 monster kernel: [ 84.853368] [<ffffffff8140180b>] pci...


tags: added: kernel-bug-exists-upstream

Geoff Williams, the next step is to fully commit bisect from kernel 3.13 to 3.16 in order to identify the last good kernel commit, followed immediately by the first bad one. This will allow for a more expedited analysis of the root cause of your issue. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection ?

Please note, finding adjacent kernel versions is not fully commit bisecting.
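
For readers unfamiliar with the idea, commit bisection is a binary search over the ordered commits between a known-good and a known-bad kernel. The sketch below is an illustration only, not the procedure from the wiki page. The commit names and the `is_bad` test are hypothetical placeholders, and in practice `git bisect` performs this search while each step is a kernel build and boot test.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit using binary search.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # breakage is at mid or earlier
        else:
            good = mid     # breakage is after mid
    return bad

# Hypothetical history of ten commits where the sixth index introduces the bug.
commits = [f"c{i}" for i in range(10)]
culprit = first_bad(commits, lambda c: int(c[1:]) >= 6)
print(commits[culprit - 1], "is the last good commit;", commits[culprit], "is the first bad one")
```

The search needs roughly log2(N) test boots for N commits, which is why it is practical even across several kernel releases.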

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: kernel-bug-exists-upstream-4.0-rc6 needs-bisect
removed: kernel-bug-exists-upstream-3.18-rc2 needs-upstream-testing
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Incomplete

Hi Christopher,
I've tried the available pre-built packages, but there are gaps between the binaries and upstream. The early 3.16.0 kernels work for me but with no graphics acceleration, which I think is why they don't crash. It looks like the later builds switch acceleration back on, and that's when the problems occur.

So I think the *real* problem must lie somewhere between 3.13.0-44 and the version of the kernel before 3.16.0-23 that still had graphics enabled.

Here are the results I've come up with so far:
linux-image-3.13.0-44 (trusty) - OK
linux-image-3.16.0-23 - Doesn't crash but seems to be using framebuffer video + no sound
linux-image-3.16.0-28 - Doesn't crash but seems to be using framebuffer video + no sound
linux-image-3.16.0-33 - accelerated graphics working but crashes
linux-image-4.0.0-rc6 - seems mostly working but crashes on shutdown (observed twice)

Hi Christopher,

I've tested a few more kernels by compiling the source packages linked from the kernel bisection page. There is NO clear breakage that I'm able to find, instead I see this:

linux-image-3.13.0.48.80 - OK (last fully functional kernel) (self-compiled)
linux-image-3.13.0.49.81 - fallback graphics (doesn't seem to crash)
...
linux-image-3.16.0-32.42 - fallback graphics (doesn't seem to crash) (self-compiled)
linux-image-3.16.0-33.44 - accelerated graphics (crashes)
linux-image-3.16.0-34.45 - fallback graphics (doesn't seem to crash)
...
linux-image-4.0.0-040000rc6.201503291935 - accelerated graphics (crashes on shutdown)

I self-compiled linux-image-3.15.0-1.3 and linux-image-3.15.0-4.8, and these both had fallback graphics and seemed to be stable.

Thanks,
Geoff
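
A bisection only converges when the results, ordered by version, go from all good to all bad exactly once. The following sketch, added for illustration, encodes the crash results listed in the comment above and checks that property. The helper names are hypothetical, and "crashed" here simply mirrors what was reported (the 4.0-rc6 crash on shutdown is counted as a crash).

```python
# Results as reported above, oldest to newest.
# True means the kernel crashed, False means it did not.
results = [
    ("3.13.0-48.80", False),
    ("3.13.0-49.81", False),
    ("3.16.0-32.42", False),
    ("3.16.0-33.44", True),
    ("3.16.0-34.45", False),
    ("4.0.0-040000rc6", True),
]

def transitions(results):
    """Return adjacent (older, newer) pairs whose crash status differs."""
    return [
        (older[0], newer[0])
        for older, newer in zip(results, results[1:])
        if older[1] != newer[1]
    ]

def is_bisectable(results):
    """True only if every good result precedes every bad one and both occur."""
    statuses = [crashed for _version, crashed in results]
    return statuses == sorted(statuses) and len(set(statuses)) == 2

print(transitions(results))
print(is_bisectable(results))
```

With these data the status flips more than once, which matches the observation that the Ubuntu kernel packages show no single clear breakage point.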


As I was unable to usefully bisect the ubuntu kernel, I tried again with the mainline kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/

This gave a much clearer result:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.37-utopic/ -- works
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.15-rc1-trusty/ -- crashes on bootup

I was also able to reproduce a crash on bootup with:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.15-rc4-utopic/
-and-
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.15-utopic/

Thanks,
Geoff

Update: Not sure if anyone other than me was affected by this - in the end I was able to resolve things and get a 100% usable system by removing the nouveau driver - cheers!

no longer affects: linux (Ubuntu)
affects: linux → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
description: updated
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Changed in linux (Ubuntu Utopic):
importance: Undecided → Medium
Rolf Leggewie (r0lf) wrote :

utopic has seen the end of its life and is no longer receiving any updates. Marking the utopic task for this ticket as "Won't Fix".

Changed in linux (Ubuntu Utopic):
status: New → Won't Fix