[Karmic] X crash due to xsetroot in startkde after recent update

Bug #526919 reported by Harald Sitter on 2010-02-24
This bug affects 1 person
Affects: xorg-server (Ubuntu)

Bug Description

This report is based on a bug hunt I did on a system maintained by ~ericbouer.

Apparently the system was set up about a week ago, and after some recent update login was never successful again; every attempt resulted in an immediate bounce back to KDM.

After a lot of investigation it became apparent that
  xsetroot -cursor_name left_ptr
as part of the startkde script was causing an X crash. However, the command on its own could not be used to reproduce the crash; it seems that only the preceding commands in startkde make it a deadly one. Removing the aforementioned line prevents the crash; re-adding it makes the crash 100% reproducible again.

Reinstalling x11-xserver-utils, which contains xsetroot, did not improve the situation either. Only removing xsetroot from startkde makes logins possible again.

Attached to this report you will find:
* gdb-Xorg.txt - backtrace per https://wiki.ubuntu.com/X/Backtracing (X was started with -dumbSched; otherwise the traces were unsuccessful)
* .xsession-errors of two different users; in one of them the xprop commands in startkde were commented out (which turned out not to improve the situation)
* Xorg.0.log, Xorg.0.log.old, kdm.log, dpkg.log
* Output of dmesg and lspci -vvk

Harald Sitter (apachelogger) wrote :

I am also attaching startkde, just in case :)

Timo Aaltonen (tjaalton) wrote :

The first thing to do is to try to reproduce this on Lucid.

Changed in xorg-server (Ubuntu):
status: New → Incomplete
Niels van Mourik (darkrazor) wrote :

The symptoms sound very, and I mean very, similar. I'm not sure what I did to get it all working, but after a day I got my system up and running again. I think I kept removing or moving lots of dot files in $HOME and re-running kbuildsycoca4 until it worked.

Following the progress on this one, keep up the sublime work!

Bryce Harrington (bryce) wrote :

Thanks for collecting a backtrace. As Timo suggests, it would be helpful to verify this still occurs on Lucid. But I can give you some tips if you want to continue debugging it on karmic.

First, you mentioned this started happening only after a recent update. A straightforward step would be to go through /var/log/dpkg.log and one by one downgrade suspect packages that were updated until you find the update which led to the regression.
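As a rough sketch of that first step, the upgrade entries can be pulled out of dpkg.log with awk. The function name list_upgrades and the cut-off date are made up for illustration, and the sketch assumes the usual log line layout of "date time upgrade package old-version new-version":

```shell
#!/bin/sh
# list_upgrades LOGFILE SINCE
# Print every "upgrade" entry dated SINCE (YYYY-MM-DD) or later, newest first.
# Assumed log line layout: DATE TIME upgrade PACKAGE OLD-VERSION NEW-VERSION
list_upgrades() {
    awk -v since="$2" '
        # ISO dates compare correctly as plain strings.
        $3 == "upgrade" && $1 >= since { print $1, $2, $4, $5, "->", $6 }
    ' "$1" | sort -r
}

# Example call (the date is only a placeholder):
#   list_upgrades /var/log/dpkg.log 2010-02-17
```

Each printed line gives the package with its old and new version. A suspect can then be downgraded with something like "apt-get install PACKAGE=OLD-VERSION", provided the old version is still available from the archive or the local package cache.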

Second, in examining the startkde script (thanks for including it), one of the things that happens beforehand is fiddling with themes and cursor parameters. Since the crash occurs in ProcXFixesGetCursorImageAndName(), this seems suspicious. Starting with the startkde script you posted, try commenting out lines in the file to simplify it down to a shorter script, so it's clearer which sequence of commands leads to the crash.
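A minimal sketch of how that narrowing could be done on a copy of the script rather than by hand-editing the installed one. The helper names disable_lines and syntax_ok, the #DISABLED# marker, and the paths in the example are all assumptions made for illustration:

```shell
#!/bin/sh
# disable_lines SCRIPT FIRST LAST OUT
# Write a copy of SCRIPT to OUT with lines FIRST..LAST commented out.
disable_lines() {
    sed "${2},${3}s/^/#DISABLED# /" "$1" > "$4"
}

# syntax_ok SCRIPT
# Succeeds only if the shell can still parse SCRIPT. Commenting out half of
# an if/fi block or a here-document terminator makes this check fail.
syntax_ok() {
    sh -n "$1" 2>/dev/null
}

# Example (paths are assumptions):
#   disable_lines /usr/bin/startkde 288 288 /tmp/startkde.test \
#       && syntax_ok /tmp/startkde.test && echo "safe to try"
```

Checking the result with "sh -n" before logging in avoids mistaking a plain syntax error in the trimmed script for the X crash itself.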

A third approach would be to debug it bottom up in the source code. It seems this is crashing in the routine privateExists() in dix/privates.c in the xorg-server package. The code in question is:

static _X_INLINE int
privateExists(PrivateRec **privates, const DevPrivateKey key)
{
    return *key && *privates &&
        (*privates)[0].state > *key &&
        (*privates)[*key].state;
}

That's a lot of pointer dereferencing, and it's possible one of those pointers is either null or invalid (pointing to a random and inappropriate area of memory). This type of bug is rather hard to diagnose from logs and backtraces alone, but it should be amenable to debugging if you have access to the machine.

Run gdb and crash the server. Verify you have the same backtrace you got before. Now use gdb to examine the privates variable and its contents.
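The gdb steps above might be scripted roughly as follows. This is only a sketch: the helper name write_gdb_commands and the file name xorg-privates.gdb are invented here, and because privateExists() is an inline function, its arguments may not be visible as a separate frame at all.

```shell
#!/bin/sh
# write_gdb_commands FILE
# Write a gdb command file that waits for the crash and then dumps the
# backtrace plus the two pointers of interest. Only the pointer values are
# printed (no dereferencing), so a NULL pointer does not abort the script.
write_gdb_commands() {
    cat > "$1" <<'EOF'
handle SIGPIPE nostop noprint pass
continue
backtrace full
frame 0
info args
info locals
print privates
print key
EOF
}

# The output location is an arbitrary choice.
write_gdb_commands "${TMPDIR:-/tmp}/xorg-privates.gdb"

# To use it, as root from a remote shell while X is running (the process
# name X is an assumption; it may be Xorg on some setups):
#   gdb -batch -x "${TMPDIR:-/tmp}/xorg-privates.gdb" -p "$(pidof X)"
# then log in through KDM to trigger the crash.
```

If gdb reports that the symbols are optimized out or not in the current context, installing the debug symbol package for the X server and selecting a caller frame with "up" may give more useful values.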

Hope this helps, and good luck.

Bryce Harrington (bryce) wrote :

[Resetting to incomplete since we need a response from the original reporter on this].

Changed in xorg-server (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Bryce Harrington (bryce) on 2010-03-09
tags: added: kubuntu
Harald Sitter (apachelogger) wrote :

So we took another look, following the suggestions Bryce made in comment #13.

1) dpkg.log is attached and is massive, so IMHO it is impossible to track down the package that caused this problem.

2) Commenting out various parts one at a time, it seems that removing any of the following parts of startkde also works around the crash:

=== line 14 ===
trap 'echo GOT SIGHUP' HUP

=== line 111-129 ===
mkdir -m 700 -p $kdehome
mkdir -m 700 -p $kdehome/share
mkdir -m 700 -p $kdehome/share/config
cat >$kdehome/share/config/startupconfigkeys <<EOF
kcminputrc Mouse cursorTheme 'Oxygen_Black'
kcminputrc Mouse cursorSize ''
ksplashrc KSplash Theme Default
ksplashrc KSplash Engine KSplashX
kcmfonts General forceFontDPI 0
kdeglobals Locale Language '' # trigger requesting languages from KLocale
kdeglobals Locale Country ''
EOF
kstartupconfig4
returncode=$?
if test $returncode -ne 0; then
    xmessage -geometry 500x100 "kstartupconfig4 does not exist or fails. The error code is $returncode. Check your installation."
    exit 1
fi
[ -r $kdehome/share/config/startupconfig ] && . $kdehome/share/config/startupconfig

=== line 161-185 ===
unset DESKTOP_LOCKED # Don't want it in the environment

if test -z "$dl"; then
  # languages as resolved by KLocale, for the splash screens use
  # klocale_languages is assembled by kdostartupconfig4 calling KLocale
  # the splashscreen and progress indicator
  case "$ksplashrc_ksplash_engine" in
      ksplash_pid=`ksplashx "${ksplashrc_ksplash_theme}" --pid`
      ksplash_pid=`ksplashsimple "${ksplashrc_ksplash_theme}" --pid`
  # no longer needed in the environment

=== line 288 (possibly preceding associated code too) ===
xset fp rehash

3) Unfortunately this did not work out: gdb claimed that privates and key are both 0x0, whichever way we tried to display or print them. Debugging the problem with print statements in the code is not an option either, since the machine is used in "production" and, most importantly, I do not have direct access to it. So unless the affected user wants to dig into adding prints and recompiling X over and over again to track down the real problem, this is a dead end.

Launchpad Janitor (janitor) wrote :

[Expired for xorg-server (Ubuntu) because there has been no activity for 60 days.]

Changed in xorg-server (Ubuntu):
status: Incomplete → Expired