xmir doesn't recover after GPU hang

Bug #1202397 reported by Chris Gagnon
This bug affects 1 person
Affects               Status     Importance  Assigned to         Milestone
xorg-server (Ubuntu)  Won't Fix  High        Chris Halse Rogers

Bug Description

Steps
1. Install Saucy (Ubuntu 13.10).
2. Pin the PPAs by creating the following file:

/etc/apt/preferences.d/50-pin-mir.pref
Package: *
Pin: origin "private-ppa.launchpad.net"
Pin-Priority: 1001

Package: *
Pin: release o=LP-PPA-mir-team-system-compositor-testing
Pin-Priority: 1002

3. sudo apt-get install phoronix-test-suite php5-json php5-cli libsdl1.2-dev libsdl-gfx1.2-dev libsdl-net1.2-dev libsdl-image1.2-dev libsdl-ttf2.0-dev libsdl-mixer1.2-dev lightdm libxatracker1 mir-demos libmirserver0 unity-system-compositor xserver-common xserver-xorg-core -y || sudo apt-get install -f
4. sudo apt-get dist-upgrade -y --force-yes
5. sudo reboot
6. Run the gui-toolkits benchmark:
    phoronix-test-suite benchmark pts/gui-toolkits

Expected result:
XMir is still running after the benchmark.

Actual result:
X is in failsafe mode.

Xorg.0.log contains:

[ 4562.242] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption: Permission denied.
 (EE)
[ 4562.291] (EE) Backtrace:
[ 4562.291] (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x7f4797d927cd]
[ 4562.291] (EE) 1: /usr/bin/X (0x7f4797bf1000+0x1a5529) [0x7f4797d96529]
[ 4562.291] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f4796cf1000+0xfbd0) [0x7f4796d00bd0]
[ 4562.291] (EE) 3: /usr/lib/x86_64-linux-gnu/libpixman-1.so.0 (0x7f479684e000+0x81c70) [0x7f47968cfc70]
[ 4562.291] (EE) 4: /usr/lib/x86_64-linux-gnu/libpixman-1.so.0 (0x7f479684e000+0x5088b) [0x7f479689e88b]
[ 4562.291] (EE) 5: /usr/lib/x86_64-linux-gnu/libpixman-1.so.0 (pixman_blt+0x52) [0x7f47968591b2]
[ 4562.292] (EE) 6: /usr/lib/xorg/modules/libfb.so (fbCopyNtoN+0x33b) [0x7f479251603b]
[ 4562.292] (EE) 7: /usr/bin/X (miCopyRegion+0x1ad) [0x7f4797d72b0d]
[ 4562.292] (EE) 8: /usr/bin/X (miDoCopy+0x456) [0x7f4797d73096]
[ 4562.292] (EE) 9: /usr/lib/xorg/modules/libfb.so (fbCopyArea+0x46) [0x7f4792516876]
[ 4562.292] (EE) 10: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f4792947000+0x1014b0) [0x7f4792a484b0]
[ 4562.292] (EE) 11: /usr/bin/X (0x7f4797bf1000+0xe15d8) [0x7f4797cd25d8]
[ 4562.292] (EE) 12: /usr/bin/X (0x7f4797bf1000+0xe23f5) [0x7f4797cd33f5]
[ 4562.292] (EE) 13: /usr/bin/X (0x7f4797bf1000+0xe0d1f) [0x7f4797cd1d1f]
[ 4562.292] (EE) 14: /usr/bin/X (0x7f4797bf1000+0x7ccf6) [0x7f4797c6dcf6]
[ 4562.292] (EE) 15: /usr/bin/X (MapWindow+0x105) [0x7f4797c70ad5]
[ 4562.293] (EE) 16: /usr/bin/X (ReparentWindow+0x2b4) [0x7f4797c73f24]
[ 4562.293] (EE) 17: /usr/bin/X (HandleSaveSet+0x89) [0x7f4797c73fc9]
[ 4562.293] (EE) 18: /usr/bin/X (FreeClientResources+0x1b) [0x7f4797c6887b]
[ 4562.293] (EE) 19: /usr/bin/X (CloseDownClient+0x57) [0x7f4797c45277]
[ 4562.293] (EE) 20: /usr/bin/X (0x7f4797bf1000+0x54de6) [0x7f4797c45de6]
[ 4562.293] (EE) 21: /usr/bin/X (0x7f4797bf1000+0x443ea) [0x7f4797c353ea]
[ 4562.293] (EE) 22: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0x7f479593dea5]
[ 4562.293] (EE) 23: /usr/bin/X (0x7f4797bf1000+0x44731) [0x7f4797c35731]
[ 4562.293] (EE)
[ 4562.293] (EE) Segmentation fault at address 0x26804
[ 4562.293]
Fatal server error:
[ 4562.293] Caught signal 11 (Segmentation fault). Server aborting
[ 4562.293]
[ 4562.293] (EE)

Chris Gagnon (chris.gagnon) wrote :
Chris Gagnon (chris.gagnon) wrote :
Chris Gagnon (chris.gagnon) wrote : Re: x is in failsafe mode after pinning system-compositor-testing ppa installing xmir with distupgrade and rebooting

Output of dpkg -l:

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================-======================================================-============-===============================================================================
ii account-plugin-aim 3.8.3-0ubuntu3 amd64 Messaging account plugin for AIM
ii account-plugin-facebook 0.11+13.10.20130711-0ubuntu1 all GNOME Control Center account plugin for single signon - facebook
ii account-plugin-flickr 0.11+13.10.20130711-0ubuntu1 all GNOME Control Center account plugin for single signon - flickr
ii account-plugin-google 0.11+13.10.20130711-0ubuntu1 all GNOME Control Center account plugin for single signon
ii account-plugin-jabber 3.8.3-0ubuntu3 amd64 Messaging account plugin for Jabber/XMPP
ii account-plugin-salut 3.8.3-0ubuntu3 amd64 Messaging account plugin for Local XMPP (Salut)
ii account-plugin-twitter 0.11+13.10.20130711-0ubuntu1 all GNOME Control Center account plugin for single signon - twitter
ii account-plugin-windows-live 0.11+13.10.20130711-0ubuntu1 all GNOME Control Center account plugin for single signon - windows live
ii account-plugin-yahoo 3.8.3-0ubuntu3 amd64 Messaging account plugin for Yahoo!
ii accountsservice 0.6.34-0ubuntu2 amd64 query and manipulate user account information
ii acl 2.2.52-1 amd64 Access control list utilities
ii acpi-support 0.142 amd64 scripts for handling many ACPI events
ii acpid 1:2.0.18-1ubuntu1 amd64 Advanced Configuration and Power Interface event daemon
ii activity-log-manager 0.9.7-0ubuntu3 amd64 blacklist configuration user interface for Zeitgeist
ii activity-log-manager-control-center 0.9.7-0ubuntu3 all blacklist configuration for Zeitgeist (transitional package)
ii adduser 3.113+nmu3ubuntu2 all add and remove users and groups
ii adium-theme-ubuntu 0.3.3-0ubuntu1 ...

description: updated
summary: - update-notifier-crash (/var/crash/_usr_sbin_unity-system-
- compositor.0.crash) main process (3196) terminated with status 1
+ x is in failsafe mode after pinning system-compositor-testing ppa
+ installing xmir with distupgrade and rebooting
Robert Ancell (robert-ancell) wrote :

Both unity-system-compositor and X have crashed, which indicates to me that Mir has failed for some reason. In /var/log/unity-system-compositor.log we have:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
  what(): Output has no associated crtc

That is Mir failing to find an output to render to.

Robert Ancell (robert-ancell) wrote :

Since the hardware is the same and the PPA hasn't changed, a potential cause is that the kernel drivers have changed in some way.

Changed in xmir:
assignee: nobody → Chris Halse Rogers (raof)
importance: Undecided → High
Robert Ancell (robert-ancell) wrote :

Assigning to Chris to work out if there's been a driver mismatch.

Chris Halse Rogers (raof) wrote :

That backtrace is a *long* time after boot; it seems that u-s-c and XMir were running happily for 80 minutes or so before the crash. Does that match your observations?
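The "80 minutes or so" figure can be checked against the Xorg log quoted in the bug description. Below is a minimal Python sketch, assuming (as the comment does) that the bracketed Xorg timestamp counts seconds since boot; the helper name `timestamp_to_minutes` is hypothetical and not part of any Xorg tooling.

```python
import re

# Last line of the Xorg backtrace quoted in the bug description.
XORG_LINE = "[ 4562.293] Caught signal 11 (Segmentation fault). Server aborting"


def timestamp_to_minutes(line: str) -> float:
    """Convert the leading '[ seconds ]' Xorg log timestamp to minutes."""
    match = re.match(r"\[\s*(\d+\.\d+)\]", line)
    if match is None:
        raise ValueError("no [ seconds ] timestamp at start of line")
    return float(match.group(1)) / 60.0


if __name__ == "__main__":
    print(f"crash at about {timestamp_to_minutes(XORG_LINE):.0f} minutes")
```

This prints `crash at about 76 minutes`, which is consistent with the estimate above.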

Daniel van Vugt (vanvugt) wrote :

Incomplete. Question to be answered ^^^

Changed in xmir:
status: New → Incomplete
Chris Gagnon (chris.gagnon) wrote :

It looks like XMir either didn't recover from a GPU hang or crashed because of one:

Jul 17 15:50:32 201208-11587 kernel: [ 413.356023] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 15:50:32 201208-11587 kernel: [ 413.356034] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
Jul 17 15:58:21 201208-11587 kernel: [ 881.817296] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 15:58:32 201208-11587 kernel: [ 892.805222] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:06:18 201208-11587 kernel: [ 1358.317729] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:07:03 201208-11587 kernel: [ 1403.268344] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:09:01 201208-11587 CRON[2464]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Jul 17 16:09:03 201208-11587 kernel: [ 1523.136638] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:11:39 201208-11587 kernel: [ 1678.953429] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:17:01 201208-11587 CRON[2613]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 17 16:19:00 201208-11587 kernel: [ 2119.481397] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:19:53 201208-11587 kernel: [ 2172.399251] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:23:33 201208-11587 kernel: [ 2392.181761] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:25:09 201208-11587 kernel: [ 2488.076395] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:27:03 201208-11587 kernel: [ 2601.927301] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:29:14 201208-11587 kernel: [ 2732.807495] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:31:19 201208-11587 dhclient: DHCPREQUEST of 10.193.37.199 on eth0 to 10.193.37.1 port 67 (xid=0x365bce4b)
Jul 17 16:31:19 201208-11587 dhclient: DHCPACK of 10.193.37.199 from 10.193.37.1
Jul 17 16:31:19 201208-11587 dhclient: bound to 10.193.37.199 -- renewal in 2826 seconds.
Jul 17 16:31:19 201208-11587 kernel: [ 2857.293397] audit_printk_skb: 120 callbacks suppressed
Jul 17 16:31:19 201208-11587 kernel: [ 2857.293424] type=1400 audit(1374093079.342:67): apparmor="DENIED" operation="file_mmap" parent=847 profile="/sbin/dhclient" name="/bin/bash" pid=2724 comm="dhclient-script" requested_mask="m" denied_mask="m" fsuid=0 ouid=0
Jul 17 16:39:01 201208-11587 CRON[2915]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Jul 17 16:45:49 201208-11587 kernel: [ 3726.715425] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 17 16:49:35 201208-11587 kernel: [ 3952.455386] [drm:i915_hangcheck_hung] *ERROR* H...

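To see how often the GPU hung relative to the X crash, the i915 hangcheck reports can be pulled out of the kernel log. Below is a minimal Python sketch. The function name `find_gpu_hangs` is hypothetical, and it only recognises the `[drm:i915_hangcheck_hung]` message format shown in the excerpt above.

```python
import re
from typing import Iterable, List

# Matches the kernel timestamp of an i915 hangcheck report, e.g.
# "kernel: [ 413.356023] [drm:i915_hangcheck_hung] *ERROR* ... GPU hung"
HANG_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] \[drm:i915_hangcheck_hung\]")


def find_gpu_hangs(lines: Iterable[str]) -> List[float]:
    """Return the kernel timestamps (in seconds) of every i915 hangcheck report."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append(float(match.group(1)))
    return hangs


# A few lines copied from the syslog excerpt in the comment above.
SAMPLE = [
    "Jul 17 15:50:32 201208-11587 kernel: [ 413.356023] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung",
    "Jul 17 15:58:21 201208-11587 kernel: [ 881.817296] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung",
    "Jul 17 16:17:01 201208-11587 CRON[2613]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)",
    "Jul 17 16:45:49 201208-11587 kernel: [ 3726.715425] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung",
]

if __name__ == "__main__":
    hangs = find_gpu_hangs(SAMPLE)
    print(f"{len(hangs)} GPU hangs, first at {hangs[0] / 60:.1f} min after boot")
```

For the sample, this prints `3 GPU hangs, first at 6.9 min after boot`. The CRON line is skipped because it does not match the hangcheck pattern.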

Changed in xmir:
status: Incomplete → Confirmed
summary: - x is in failsafe mode after pinning system-compositor-testing ppa
- installing xmir with distupgrade and rebooting
+ xmir doesn't recover after GPU hang
Daniel van Vugt (vanvugt) wrote :

If you're repeatedly stuck in failsafe mode, this should fix it:
    rm ~/.Xauthority

Then log out or reboot, and try again.

tags: added: qa-touch
description: updated
affects: xmir → xorg-server (Ubuntu)
tags: added: xmir
Daniel van Vugt (vanvugt) wrote :

XMir 1.0 (the old Xorg extension) is now deprecated and is not being maintained or fixed. It is replaced by the new 'Xmir' binary (package 'xmir') introduced in Ubuntu 15.10 wily.

Changed in xorg-server (Ubuntu):
status: Confirmed → Won't Fix