System soft-freezes, BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 in journalctl at gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250

Bug #1677673 reported by bp
This bug affects 35 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I just upgraded to 17.04. Leaving the system locked for long enough now makes it unresponsive to any input except SysRq.

This appears to be the only relevant bit in `journalctl --boot -1`:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915]
PGD 0

Oops: 0002 [#1] SMP
Modules linked in: ccm pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ip snd_soc_ssm4567 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core cfg80211 snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel lpc_ich snd_seq_ ghash_clmulni_intel pcbc i915 aesni_intel aes_x86_64 crypto_simd glue_helper cryptd sdhci_pci i2c_algo_bit drm_kms_helper syscopyarea psmouse ahci sysfillrect libahci e100
CPU: 3 PID: 3826 Comm: chrome Tainted: G OE 4.10.0-14-generic #16-Ubuntu
Hardware name: Dell Inc. Latitude E7250/0V8RX3, BIOS A09 11/18/2015
task: ffff99ddb372da00 task.stack: ffffb0ec037dc000
RIP: 0010:gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915]
RSP: 0018:ffffb0ec037df880 EFLAGS: 00010246
RAX: ffff99dd0367b880 RBX: 0000000000000003 RCX: 0000000000000003
RDX: 0000000000000000 RSI: ffff99ddb737e000 RDI: ffff99de4ad90000
RBP: ffffb0ec037df8d8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff99ddc994a000
R13: ffff99dcd02e2bf0 R14: 00000000fffdf000 R15: 0000000000008000
FS: 00007fd3a0fa7b00(0000) GS:ffff99de5e580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 0000000174607000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 gen8_alloc_va_range_3lvl+0xfb/0x9e0 [i915]
 ? __alloc_pages_nodemask+0xff/0x260
 gen8_alloc_va_range+0x23d/0x470 [i915]
 i915_vma_bind+0x7e/0x170 [i915]
 __i915_vma_do_pin+0x2a5/0x450 [i915]
 i915_gem_execbuffer_reserve_vma.isra.31+0x144/0x1b0 [i915]
 i915_gem_execbuffer_reserve.isra.32+0x39e/0x3d0 [i915]
 i915_gem_do_execbuffer.isra.38+0x4a2/0x1750 [i915]
 ? radix_tree_lookup_slot+0x22/0x50
 ? shmem_getpage_gfp+0xf9/0xc10
 i915_gem_execbuffer2+0xa1/0x1e0 [i915]
 drm_ioctl+0x21b/0x4c0 [drm]
 ? i915_gem_execbuffer+0x310/0x310 [i915]
 ? __seccomp_filter+0x67/0x250
 do_vfs_ioctl+0xa3/0x610
 ? __secure_computing+0x3f/0xd0
 ? syscall_trace_enter+0xcd/0x2e0
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x5b/0xc0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fd39a33a907
RSP: 002b:00007ffff6323818 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000018a20a505000 RCX: 00007fd39a33a907
RDX: 00007ffff6323860 RSI: 00000000c0406469 RDI: 000000000000000e
RBP: 00007ffff6323860 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000038 R11: 0000000000000246 R12: 00000000c0406469
R13: 000000000000000e R14: 0000000000000000 R15: 0000000000000000
Code: e6 48 8b 90 20 03 00 00 48 8b b8 d8 02 00 00 48 8b 52 08 48 83 ca 03 e8 ca cd ff ff 48 8b 45 b0 48 8b 4d c8 48 8b 10 48 8b 45 d0 <4c> 89 24 ca 48 0f ab 08 0f 1f 44 00
RIP: gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915] RSP: ffffb0ec037df880
CR2: 0000000000000018
---[ end trace 2a4103476767c23b ]---

I walked to my computer 12 minutes later, finding it soft-locked as described above.

I realize that this appears somewhat far removed from the real cause of the problem, so please let me know what other data I can provide to facilitate the debugging process.
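For anyone reading the trace above, the `Oops: 0002` value is the x86 page-fault error code, and its low bits can be decoded mechanically. The snippet below is a minimal sketch and covers only the three lowest bits. The helper name `decode_oops_code` is made up for illustration. The address check at the end assumes the faulting instruction (the bytes marked `<4c> 89 24 ca` in the `Code:` line) is a store to RDX + RCX*8, which is one reading of those bytes and not something confirmed in this report.

```python
# Meaning of the three lowest x86 page-fault error code bits,
# given as (meaning when bit is 0, meaning when bit is 1).
PF_BITS = {
    0: ("not-present page", "protection violation"),
    1: ("read", "write"),
    2: ("kernel mode", "user mode"),
}


def decode_oops_code(code_hex):
    """Decode the hex value printed after 'Oops:' into readable flags."""
    code = int(code_hex, 16)
    return [names[(code >> bit) & 1] for bit, names in sorted(PF_BITS.items())]


print(decode_oops_code("0002"))

# Register values copied from the trace. Assumption: the faulting store
# targets RDX + RCX * 8, which would explain the CR2 value of 0x18.
rdx, rcx, cr2 = 0x0, 0x3, 0x18
print(rdx + rcx * 8 == cr2)
```

Read this way, the fault is a kernel-mode write to an unmapped address just past a NULL base pointer, which matches the "NULL pointer dereference at 0000000000000018" headline.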

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-14-generic 4.10.0-14.16
ProcVersionSignature: Ubuntu 4.10.0-14.16-generic 4.10.3
Uname: Linux 4.10.0-14-generic x86_64
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: sraffa 2780 F.... pulseaudio
 /dev/snd/controlC0: sraffa 2780 F.... pulseaudio
CurrentDesktop: Unity:Unity7
Date: Thu Mar 30 17:44:00 2017
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=aca70873-b3e7-46b8-a5df-18e8752a0640
InstallationDate: Installed on 2015-11-02 (514 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Release amd64 (20151021)
MachineType: Dell Inc. Latitude E7250
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.10.0-14-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-14-generic N/A
 linux-backports-modules-4.10.0-14-generic N/A
 linux-firmware 1.164
SourcePackage: linux
UpgradeStatus: Upgraded to zesty on 2017-03-29 (0 days ago)
dmi.bios.date: 11/18/2015
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A09
dmi.board.name: 0V8RX3
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA09:bd11/18/2015:svnDellInc.:pnLatitudeE7250:pvr:rvnDellInc.:rn0V8RX3:rvrA00:cvnDellInc.:ct9:cvr:
dmi.product.name: Latitude E7250
dmi.sys.vendor: Dell Inc.

bp (badpazzword) wrote :

Scrubbed PII from attachments.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote : Re: System soft-freezes, BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 in journalctl

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc5

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
bp (badpazzword) wrote :

Unfortunately I was still able to reproduce. Here's the current trace, this time without arbitrary line cutoffs.

While this report, too, has chrome as the "comm" I've also seen compiz being the "culprit."

IP: gen8_ppgtt_alloc_page_directories.isra.40+0x115/0x250 [i915]
PGD 0

Oops: 0002 [#1] SMP
Modules linked in: btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ccm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rfcomm cmac bnep binfmt_misc cmdlinepart intel_spi_platform intel_spi spi_nor mtd dell_wmi sparse_keymap arc4 intel_rapl x86_pkg_temp_thermal dell_laptop intel_powerclamp coretemp iwlmvm dell_smm_hwmon kvm_intel kvm mac80211 irqbypass intel_cstate uvcvideo intel_rapl_perf videobuf2_vmalloc videobuf2_memops btusb videobuf2_v4l2 btrtl videobuf2_core btbcm btintel dell_led joydev videodev iwlwifi dell_smbios dcdbas media bluetooth
 snd_hda_codec_hdmi serio_raw snd_hda_codec_realtek snd_hda_codec_generic cfg80211 input_leds snd_soc_rt5640 snd_soc_rl6231 snd_hda_intel mei_me snd_soc_ssm4567 snd_hda_codec shpchp mei lpc_ich snd_hda_core snd_soc_core snd_hwdep snd_compress ac97_bus snd_pcm_dmaengine snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi int3403_thermal snd_seq snd_seq_device snd_timer elan_i2c snd acpi_als dw_dmac soundcore kfifo_buf dw_dmac_core industrialio snd_soc_sst_acpi i2c_designware_platform snd_soc_sst_match int3402_thermal 8250_dw i2c_designware_core spi_pxa2xx_platform mac_hid processor_thermal_device dell_rbtn int3400_thermal intel_soc_dts_iosf int340x_thermal_zone acpi_thermal_rel int3406_thermal acpi_pad parport_pc ppdev lp parport ip_tables x_tables autofs4 algif_skcipher af_alg dm_crypt
 hid_generic usbhid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc i915 aesni_intel aes_x86_64 crypto_simd i2c_algo_bit glue_helper drm_kms_helper cryptd ahci psmouse syscopyarea libahci sysfillrect sysimgblt fb_sys_fops sdhci_pci e1000e drm ptp pps_core wmi video sdhci_acpi sdhci i2c_hid hid
CPU: 1 PID: 3403 Comm: chrome Not tainted 4.11.0-041100rc5-generic #201704022131
Hardware name: Dell Inc. Latitude E7250/0V8RX3, BIOS A09 11/18/2015
task: ffff969e1cf00000 task.stack: ffffbe03435b8000
RIP: 0010:gen8_ppgtt_alloc_page_directories.isra.40+0x115/0x250 [i915]
RSP: 0018:ffffbe03435bb8c0 EFLAGS: 00010246
RAX: ffff969cf4346f80 RBX: 0000000000000003 RCX: 0000000000000003
RDX: 0000000000000000 RSI: ffff969e1b5f3000 RDI: ffff969e8ac80000
RBP: ffffbe03435bb918 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff969c89f3c000
R13: ffff969d2a61e090 R14: 00000000fffef000 R15: 0000000000008000
FS: 00007ff2cab90b00(0000) GS:ffff969e9e480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 000000019c1c6000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 gen8_alloc_va_r...


tags: added: bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
removed: bug-exists-upstream
shankao (shankao) wrote :
bp (badpazzword) wrote :

Yes, the bug is indeed the same as that one.

Commit dd19674bacba227ae5d3ce680cbc5668198894dc doesn't seem to have made it into the main kernel repository yet.

bp (badpazzword)
summary: System soft-freezes, BUG: unable to handle kernel NULL pointer
- dereference at 0000000000000018 in journalctl
+ dereference at 0000000000000018 in journalctl at
+ gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250
Rocko (rockorequin) wrote :

How is this only medium priority? It causes data loss, which is rather catastrophic...

This might also be related to https://bugs.freedesktop.org/show_bug.cgi?id=100516, which unfortunately isn't going to be fixed until kernel 4.12.

s.illes79 (s-illes79-gmail) wrote :

This bug makes my laptop unusable; I wish I hadn't updated to 17.04 :(
Any workaround?

bp (badpazzword) wrote :

My VERY shitty workaround has been to add the yakkety repositories back to my machine with apt edit-sources, then install and use the 4.8.x kernel from there. :/

s.illes79 (s-illes79-gmail) wrote :

Generally I've been having stability issues since I
- upgraded to 17.04: xorg freezes with a kernel Oops
- installed virtualbox: resume fails

I removed virtualbox and suspend seems to be OK.
The bug opener also has vbox modules loaded, which makes me wonder whether this is all caused by virtualbox.

Anyone else with this problem also have virtualbox installed?

s.illes79 (s-illes79-gmail) wrote :

just had a crash without virtualbox, so it's not that

4.8.x seems to be stable

Geoff McQueen (geoffmcqueen) wrote :

In my experience it was only after I added the Intel-specific drivers for my video card (in the Dell XPS 15 9560) that the crashing happened on 4.10.x. So, I'm just running 4.8.x until 4.12 comes out or someone magical advises here that the fix has gone into a minor release of 4.10 (which I read somewhere else wasn't going to happen because while the upstream folks have fixed the bug, the stream it is being merged into is 4.12)

Rocko (rockorequin) wrote :

Just a note that you can also find the 4.8 kernel (image and headers) at http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D, and if you install the deb files from there, you don't need to enable the yakkety repos.

TomaszChmielewski (mangoo-wpkg) wrote :

I just had this crash with 4.11.0-041100rc7 (from Ubuntu ppa). Updating to 4.11 final now...

s.illes79 (s-illes79-gmail) wrote :

upgrading to 4.11 won't help, fix is going into 4.12

downgrade to 4.8 ;)

Linux XPS13 4.8.0-46-generic #49-Ubuntu SMP Fri Mar 31 13:57:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

silles@XPS13:~$ uptime
 08:56:47 up 6 days, 10:06, 2 users, load average: 0.67, 1.14, 0.85
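To check a `uname -r` string against what has been reported here so far, something like the following could work. It is only a sketch: the `REPORTED` table summarises anecdotes from the comments in this thread (4.8 stable, 4.10 and 4.11 release candidates crashing), not an official list of affected kernels, and the function names are made up for illustration.

```python
import re

# Kernel series mentioned by commenters in this thread. This is a summary
# of user reports, not an authoritative list of affected versions.
REPORTED = {
    (4, 8): "reported stable",
    (4, 10): "reported crashing",
    (4, 11): "reported crashing",
}


def kernel_series(release):
    """Return (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def thread_status(release):
    """Look up what this thread says about the given kernel release."""
    return REPORTED.get(kernel_series(release), "not reported in this thread")


for release in ("4.8.0-46-generic", "4.10.0-14-generic", "4.11.0-041100rc5-generic"):
    print(release, "->", thread_status(release))
```

On a live system, `platform.release()` from the Python standard library returns the same string as `uname -r` and can be passed straight to `thread_status`.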

bp (badpazzword) wrote :

With that comment do you mean to say you're positive that 4.12 will have a fix, or do you mean to say you're positive that 4.11 will NOT have a fix?

s.illes79 (s-illes79-gmail) wrote :

All I know is that it's queued for 4.12: https://bugs.freedesktop.org/show_bug.cgi?id=100516#c2
But that's for the mainline kernel; whether Ubuntu backports it, I don't know.

Nick Craig-Wood (nick-craig-wood) wrote :

I'm seeing this too.

Every time it has locked up I've been scrolling in chrome.

[28680.993697] CPU: 2 PID: 4930 Comm: chrome Tainted: G OE 4.10.0-20-generic #22-Ubuntu
[28680.993718] Hardware name: Dell Inc. Latitude E7450, BIOS A07 09/01/2015

As an experiment I've disabled hardware acceleration in chrome since all the traces show the drm module (Direct Rendering Manager, which is in charge of hardware acceleration).

No lockups yet, but it is early days!

I don't use compiz - I just use plain XFCE so no OpenGL needed for my desktop hopefully.
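Since several reports in this thread quote the same `CPU:` header line from the oops, a small parser makes it easier to compare the process name, taint state, and kernel release across machines. This is a rough sketch: the regular expression was written against the lines as they appear on this page (with whitespace collapsed) and is not a general parser for kernel log output; `parse_cpu_line` is a hypothetical helper name.

```python
import re

# Matches the "CPU: ... Comm: ..." header line of a kernel oops as quoted
# in this thread. Any leading timestamp or syslog prefix is skipped.
CPU_LINE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+) "
    r"(?P<taint>Not tainted|Tainted: .+?) (?P<release>\d\S*) #(?P<build>\S+)"
)


def parse_cpu_line(line):
    """Return the fields of an oops CPU line as a dict, or None if no match."""
    match = CPU_LINE.search(line)
    return match.groupdict() if match else None


samples = [
    "[28680.993697] CPU: 2 PID: 4930 Comm: chrome Tainted: G OE 4.10.0-20-generic #22-Ubuntu",
    "CPU: 1 PID: 3403 Comm: chrome Not tainted 4.11.0-041100rc5-generic #201704022131",
]

for line in samples:
    info = parse_cpu_line(line)
    print(info["comm"], info["release"], info["taint"])
```

Both sample lines are copied from comments on this bug; the second shows the crash on an untainted mainline build.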

Matt (dstruct) wrote :

Hello,

I first want to thank the Ubuntu devs, I've been a very long time Ubuntu user and appreciate what you guys do to make and keep it such a great distro.

That being said, I too am affected by this bug on a fresh 17.04 install. I first noticed it after updating from 4.10.0-19 to 4.10.0-20. That is not to say -19 wasn't affected, but it's worth mentioning that I never saw a crash when running -19.

I have since updated via PPA to 4.11.0-041100rc5-generic with no change.

Similar to the last comment on here by Nick about it crashing during scrolling in Chrome, I also have only seen crashes/freezes during Firefox scrolling. Music/sound will continue, video will play, however you lose all control over the system, caps lock doesn't respond, only a long-hold down of the power button seems to work.

If anyone is interested in any log dumps or information from my system please let me know.

Respectfully.

David Marín (davefx) wrote :

As a workaround, you can downgrade to Ubuntu 16.10 kernel following the steps at: https://askubuntu.com/questions/909508/how-do-downgrade-kernel-on-zesty-17-04-to-yakkety-16-10

Matthias (matthias-opennomad) wrote :

This seems to be a dupe of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1680904. While this report predates the other, it's the other that has more current info on a resolution.

bp (badpazzword) wrote :

bleh, whatever gets this bug fixed.

Nick Craig-Wood (nick-craig-wood) wrote :

> As an experiment I've disabled hardware acceleration in chrome since all the traces show the drm module (Direct Rendering Manager, which is in charge of hardware acceleration).

FWIW I've been running like this for two weeks and have seen no lockups.

Andrzej (jarzebowski-andrzej) wrote :

I am getting this error randomly every once in a while. It locks my PC completely, but Spotify keeps playing. The machine cannot be restarted in any way, even through an SSH session (SSH still works). Here is the last error:

Jun 2 09:47:04 pc-name kernel: [109842.746855] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Jun 2 09:47:04 pc-name kernel: [109842.746917] IP: gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915]
Jun 2 09:47:04 pc-name kernel: [109842.746939] PGD 0
Jun 2 09:47:04 pc-name kernel: [109842.746940]
Jun 2 09:47:04 pc-name kernel: [109842.746953] Oops: 0002 [#1] SMP
Jun 2 09:47:04 pc-name kernel: [109842.746965] Modules linked in: arc4 ppp_mppe ppp_async nf_conntrack_pptp nf_conntrack_proto_gre ipheth bnep ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc aufs binfmt_misc snd_hda_codec_hdmi dell_wmi sparse_keymap dell_led dell_smbios snd_hda_codec_realtek snd_hda_codec_generic dcdbas joydev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel input_leds pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf snd_seq_midi snd_seq_midi_event snd_rawmidi serio_raw snd_seq
Jun 2 09:47:04 pc-name kernel: [109842.747160] snd_seq_device snd_timer snd soundcore mei_me shpchp mei intel_pch_thermal hci_uart btbcm btqca btintel bluetooth intel_lpss_acpi intel_lpss acpi_als kfifo_buf acpi_pad mac_hid industrialio parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid i915 i2c_algo_bit drm_kms_helper syscopyarea psmouse sysfillrect sysimgblt fb_sys_fops r8169 drm ahci mii libahci wmi video pinctrl_sunrisepoint pinctrl_intel i2c_hid hid fjes
Jun 2 09:47:04 pc-name kernel: [109842.747303] CPU: 1 PID: 4421 Comm: chrome Not tainted 4.10.0-21-generic #23-Ubuntu
Jun 2 09:47:04 pc-name kernel: [109842.747324] Hardware name: Dell Inc. OptiPlex 3040/05XGC8, BIOS 1.3.5 01/26/2016
Jun 2 09:47:04 pc-name kernel: [109842.747345] task: ffffa071225d4500 task.stack: ffffabc743844000
Jun 2 09:47:04 pc-name kernel: [109842.747377] RIP: 0010:gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915]
Jun 2 09:47:04 pc-name kernel: [109842.747399] RSP: 0018:ffffabc743847880 EFLAGS: 00010246
Jun 2 09:47:04 pc-name kernel: [109842.747414] RAX: ffffa07036de6840 RBX: 0000000000000003 RCX: 0000000000000003
Jun 2 09:47:04 pc-name kernel: [109842.747434] RDX: 0000000000000000 RSI: ffffa070cc571000 RDI: ffffa07204388000
Jun 2 09:47:04 pc-name kernel: [109842.747453] RBP: ffffabc7438478d8 R08: 0000000000000000 R09: 0000000000000000
Jun 2 09:47:04 pc-name kernel: [109842.747473] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa07058c28000
Jun 2 09:47:04 pc-name kernel: [109842.747493] R13: ffffa071901bb290 R14: 00000000ff790000 R15: 0000000000008000
Jun 2 09:47:04 pc-name kernel: [109842.747513] FS: 00007f64e6a8c480(0000) GS:ffffa07216c80000(0000) knlGS:0000000...

