console-kit-daemon crashed with SIGSEGV in g_str_hash()

Reported by Felipe Besoain on 2008-09-13
This bug affects 34 people
Affects / Status / Importance / Assigned to / Milestone:
ConsoleKit: Fix Released, Medium
consolekit (Ubuntu): High, Martin Pitt
Hardy: Undecided, Unassigned
Intrepid: High, Martin Pitt

Bug Description

Binary package hint: consolekit

I was playing music with Rhythmbox when console-kit-daemon crashed.

ProblemType: Crash
Architecture: i386
CrashCounter: 1
DistroRelease: Ubuntu 8.10
ExecutablePath: /usr/sbin/console-kit-daemon
NonfreeKernelModules: ath_hal
Package: consolekit 0.2.10-1ubuntu4
ProcAttrCurrent: unconfined
ProcCmdline: /usr/sbin/console-kit-daemon
ProcEnviron:

Signal: 11
SourcePackage: consolekit
StacktraceTop:
 g_str_hash () from /usr/lib/libglib-2.0.so.0
 ?? () from /usr/lib/libglib-2.0.so.0
 ?? ()
 ?? ()
 ?? ()
Title: console-kit-daemon crashed with SIGSEGV in g_str_hash()
Uname: Linux 2.6.27-2-generic i686
UserGroups:

Felipe Besoain (fbesoain) wrote :

StacktraceTop:
g_hash_table_remove_internal (hash_table=0x9237db8, key=0x0, notify=1)
file_monitor_remove_watch (monitor=0x9240d60, watch=0x0) at ck-file-monitor-inotify.c:245
ck_file_monitor_remove_notify (monitor=0x9240d60, id=2) at ck-file-monitor-inotify.c:529
ck_tty_idle_monitor_stop (monitor=0x9243980) at ck-tty-idle-monitor.c:211
ck_session_finalize (object=0x92490e0) at ck-session.c:855

Changed in consolekit:
importance: Undecided → Medium
James Westby (james-w) wrote :

bug 244218 has a debug log that may be interesting.

Thanks,

James

Changed in consolekit:
status: New → Confirmed

Hi,

I have the same issue under Intrepid.
This has happened twice in two days, while listening to music (mpd daemon + sonata client).

In my syslog:
 kernel: [46676.090474] console-kit-dae[5365]: segfault at 0 ip b7fa4d37 sp bf9cc6b4 error 4 in libglib-2.0.so.0.1800.1[b7f48000+b5000]

The crash file generated in /var/crash/ is attached.

Thanks

tdflanders (thomasdelbeke) wrote :

Binary package hint: consolekit

Hi James,

as I predicted this one came back.

thomas@thomas-laptop:~$ lsb_release -rd ; uname -a ; apt-cache policy linux linux-source-2.6.27 linux-headers-generic linux-image-generic
Description: Ubuntu intrepid (development branch)
Release: 8.10
Linux thomas-laptop 2.6.27-4-generic #1 SMP Wed Sep 24 01:30:51 UTC 2008 i686 GNU/Linux
linux:
  Installed: 2.6.27.4.4
  Candidate: 2.6.27.4.4
  Version table:
 *** 2.6.27.4.4 0
        500 http://gb.archive.ubuntu.com intrepid/restricted Packages
        100 /var/lib/dpkg/status
linux-source-2.6.27:
  Installed: 2.6.27-4.6
  Candidate: 2.6.27-4.6
  Version table:
 *** 2.6.27-4.6 0
        500 http://gb.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
linux-headers-generic:
  Installed: 2.6.27.4.4
  Candidate: 2.6.27.4.4
  Version table:
 *** 2.6.27.4.4 0
        500 http://gb.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
linux-image-generic:
  Installed: 2.6.27.4.4
  Candidate: 2.6.27.4.4
  Version table:
 *** 2.6.27.4.4 0
        500 http://gb.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
thomas@thomas-laptop:~$

tdflanders (thomasdelbeke) wrote :

Bug #276092:
This report is public
console-kit-daemon crashed with SIGSEGV in g_hash_table_remove_internal()

I meant James Weber,

I got it again while trying, unsuccessfully, to upgrade to 2.6.27-rc7.

I will also post an upgrade failure here that referred to consolekit.

Please take a look at it at your earliest convenience. It is now confirmed that my hardware is working properly, since the real problem was the Acer BIOS not allowing for 4 GB of RAM. Some issues remain (like the APTonCD one) and I have not yet had a reply to my latest launchpad message. I am sorry if I delayed you guys during the alphas, but I really suggest you have a look at this during beta testing.

Cheers,

Thomas

tdflanders (thomasdelbeke) wrote :

Your mail to 'launchpad-users' with the subject

    feedback

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Post to moderated list

Either the message will get posted to the list, or you will receive
notification of the moderator's decision. If you would like to cancel
this posting, please visit the following URL:

    https://lists.ubuntu.com/mailman/confirm/launchpad-users/f104883df70e01b653d6db90a22d81685b686ecd

Bad confirmation string
Invalid confirmation string: f104883df70e01b653d6db90a22d81685b686ecd.

Note that confirmation strings expire approximately 3 days after the initial subscription request. If your confirmation has expired, please try to re-submit your subscription. Otherwise, re-enter your confirmation string.

Hi there,

I am going to stop filing bug reports for now, as you people will not read them anyway.

I have confirmed with the help of Canonical support that the laptop I am using has no defective hardware. The BIOS is not broken, since I have reflashed it with an upgraded version from the Acer website. For the few bugs I could easily reproduce, this does not change a thing. What we have found out is that the BIOS developed by Phoenix for Acer does not support 4 GB. Acer claims that the system supports 4 GB, but Phoenix must have limited the BIOS to 3 GB, since it was made for use with 32-bit Windows. The system has two identical SO-DIMM sockets and I have two identical 2 GB chips, which were sold to me fully installed by an official eBay reseller. I have tested both sockets and both chips with memtest86+ 1.65, 1.70 and 2.01. Everything came up clean except the alpha 5 2.01 test, which had a bug that is now fixed.

I had a problem before when trying to upgrade to Vista Ultimate, as the 32-bit version only recognises 3 GB. Back then a faulty memtest86+ test surfaced. I have that memtest86+ version in backup, but it is preinstalled and backed up with Windows Live OneCare and I have no idea how to determine the version number. Really it does not change anything, as when I remove one of the chips, the same bugs seem to persist. That is, the ones that are easily reproducible. I now get fewer crashes: partly because I now only run Ubuntu repositories (other than VirtualBox 2.0.2 and Skype-debian as described in the community docs), partly because of alpha 6, and maybe also because I only use 2 GB and updated the BIOS. I do use many multiverse and restricted applications though.

I may still be able to help with the following data:

My APTonCD bug persists, only faster since I now use only 2 GB RAM. I tracked and described this behaviour with gnome-system-monitor and a Valgrind log several weeks ago, but Tormod would not look into it since he suspected hardware failure. The problem there is that APTonCD writes everything to the RAM chip(s) and crashes when they ...


Joachim R. (jro) wrote :

It happens on ia64 too (not only i386 package).

Diego Collaziol (dcollaziol) wrote :

Here on Intrepid, the NVIDIA package is crashing everything: Kontact, the console, the X server... and consolekit too.

Frederick F. Kautz IV (fkautz) wrote :

I got this error as well with the following configuration profile:

The system was installed using wubi from http://wubi-installer.org/devel/minefield/

Wubi-8.10-rev510.exe 03-Oct-2008 08:35 959K

$ lsb_release -rd ; uname -a; apt-cache policy linux linux-image-generic consolekit

Description: Ubuntu intrepid (development branch)
Release: 8.10
Linux ubuntu 2.6.27-5-generic #1 SMP Fri Oct 3 00:38:23 UTC 2008 i686 GNU/Linux
linux:
  Installed: (none)
  Candidate: 2.6.27.5.5
  Version table:
     2.6.27.5.5 0
        500 http://us.archive.ubuntu.com intrepid/restricted Packages
linux-image-generic:
  Installed: 2.6.27.5.5
  Candidate: 2.6.27.5.5
  Version table:
 *** 2.6.27.5.5 0
        500 http://us.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
consolekit:
  Installed: 0.2.10-1ubuntu7
  Candidate: 0.2.10-1ubuntu7
  Version table:
 *** 0.2.10-1ubuntu7 0
        500 http://us.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status

tdflanders (thomasdelbeke) wrote :

Hi there,

this is the terminal output from trying to rescue my system after yet another fatal system crash. This actually happened several weeks ago. I had asked on the launchpad digest mailing list what to do with this, but I have not received a reply yet. Probably this was because of the work on the launchpad system. The crash happened while upgrading from Hardy to Intrepid (then alpha 4 or 5) with '$ sudo update-manager -c -d'. I am posting this here and on bug # 254238 before opening another bug report. I copied this meticulously onto the envelope from my bank statements. I just retrieved it while cleaning the room.

Cheers,

Thomas

<Terminal Output>

thomas@thomas:~$ sudo dpkg-reconfigure hal
Failed to open connection to system message bus: Failed to connect to socket /var/run/dbus/system_bus_socket:
Connection refused
invoke-rc.d: initscript dbus, action "force-reload" failed.
polkit-auth: This operation requires the system message bus and Consolekit to be running
/usr/lib/policykit/polkit-read-auth-helper: symbol
lookup error: /usr/lib/policykit/polkit-read-auth-helper:
undefined symbol: kit-getpwnam
polkit-auth: Not Authorized To Read Authorizations For Other Users:
uid 0 is not authorized to read authorizations for uid 111
(requires org.freedesktop.policykit.read)
thomas@thomas-laptop:~$
update-rc.d: warning: multiuser is deprecated ; specify runlevels manually

Alessandro Isaia (alex69) wrote :

Description: Ubuntu intrepid (development branch)
Release: 8.10
The package that crashed is ConsoleKit, and the version in use is the latest: 0.2.10-1ubuntu7

Hi!

We get a lot of bug reports about crashes with this signature:

g_hash_table_remove_internal (hash_table=0x9237db8, key=0x0, notify=1)
file_monitor_remove_watch (monitor=0x9240d60, watch=0x0) at ck-file-monitor-inotify.c:245
ck_file_monitor_remove_notify (monitor=0x9240d60, id=2) at ck-file-monitor-inotify.c:529
ck_tty_idle_monitor_stop (monitor=0x9243980) at ck-tty-idle-monitor.c:211
ck_session_finalize (object=0x92490e0) at ck-session.c:855

http://launchpadlibrarian.net/17571430/Stacktrace.txt has a complete and fully symbolic stack trace. It seems that in some cases, file_monitor_remove_watch() is called with watch == NULL, which leads to this crash.

The dodgy approach would be to just test for this condition in file_monitor_remove_watch(), but I guess watch == NULL is a "this should not happen(TM)" case, and there is a deeper logic error?

Changed in consolekit:
status: Unknown → Confirmed

Created an attachment (id=19645)
Ubuntu patch

For now I applied a patch which intercepts removing a NULL watch. Even if that condition "should not happen", it is defensive, and after removing a watch the state is consistent again anyway.
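
To illustrate the kind of guard described here, a minimal sketch follows. It is not the attached patch: the Watch struct and the function name are hypothetical stand-ins, and the real function also updates the daemon's hash tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the daemon's watch record. */
typedef struct {
    int         wd;    /* inotify watch descriptor */
    const char *path;  /* watched path */
} Watch;

/* Defensive removal: a NULL watch is ignored instead of dereferenced.
 * Returns 1 if a watch was torn down, 0 if there was nothing to do. */
int remove_watch_guarded(Watch *watch)
{
    if (watch == NULL) {
        return 0;          /* "should not happen", but do not crash on it */
    }
    watch->wd = -1;        /* mark the watch as no longer registered */
    watch->path = NULL;
    return 1;
}
```

A guard like this only covers the case where the watch pointer itself is NULL.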

tdflanders (thomasdelbeke) wrote :

Hi James,

I am reassigning this to you since this is frequently recurring.

Cheers,

Thomas

Changed in consolekit:
assignee: nobody → james-w
James Westby (james-w) wrote :

Hi,

Please don't just assign bug reports to people.

I've looked at the bug many times, and it is beyond me to come up with a fix.

We are well aware of the bug and its impact and we are trying to get it fixed.

Thanks,

James

Changed in consolekit:
assignee: james-w → nobody

Hi there James,

sorry about that. I should have commented further, I guess. I am marking bugs for closing since the beta is indeed much more stable for me. I am down to 67 from a peak in the high 80s. I must have posted well over 200, and most that are open now are weeks old. I am trying to group them and assign each group to one person. I have many consolekit bugs outstanding and I thought you were the man. The alternative would be that I post the groups on the launchpad request digest. Can you give me feedback on this quickly please?

Thanks,

Thomas

----- Original Message ----
From: James Westby <email address hidden>
To: <email address hidden>
Sent: Tuesday, October 14, 2008 1:55:39 AM
Subject: [Bug 269651] Re: console-kit-daemon crashed with SIGSEGV in g_str_hash()

Hi,

Please don't just assign bug reports to people.

I've looked at the bug many times, and it is beyond me to come up with a
fix.

We are well aware of the bug and its impact and we are trying to get it
fixed.

Thanks,

James

** Changed in: consolekit (Ubuntu)
     Assignee: James Westby (james-w) => (unassigned)

--
console-kit-daemon crashed with SIGSEGV in g_str_hash()
https://bugs.launchpad.net/bugs/269651
You received this bug notification because you are a direct subscriber
of a duplicate bug.

tdflanders (thomasdelbeke) wrote :

Hi there James,

correction: I had 6 consolekit bugs outstanding yesterday, but I marked them all as public instead of private. This afternoon they were marked as duplicate of this one.

thomas@thomas-laptop:~$ lsb_release -rd ; uname -a ; apt-cache policy linux linux-image linux-source-2.6.27 linux-headers-generic linux-restricted-modules consolekit ; hwinfo +all log=hw_log
Description: Ubuntu intrepid (development branch)
Release: 8.10
Linux thomas-laptop 2.6.27-7-generic #1 SMP Fri Oct 10 03:55:24 UTC 2008 i686 GNU/Linux
linux:
  Installed: 2.6.27.7.8
  Candidate: 2.6.27.7.8
  Version table:
 *** 2.6.27.7.8 0
        500 http://gb.archive.ubuntu.com intrepid/restricted Packages
        100 /var/lib/dpkg/status
linux-image:
  Installed: 2.6.27.7.8
  Candidate: 2.6.27.7.8
  Version table:
 *** 2.6.27.7.8 0
        500 http://gb.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
linux-source-2.6.27:
  Installed: 2.6.27-7.10
  Candidate: 2.6.27-7.10
  Version table:
 *** 2.6.27-7.10 0
        500 http://gb.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
linux-headers-generic:
  Installed: 2.6.27.7.8
  Candidate: 2.6.27.7.8
  Version table:
 *** 2.6.27.7.8 0
        500 http://gb.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
linux-restricted-modules:
  Installed: 2.6.27.7.8
  Candidate: 2.6.27.7.8
  Version table:
 *** 2.6.27.7.8 0
        500 http://gb.archive.ubuntu.com intrepid/restricted Packages
        100 /var/lib/dpkg/status
consolekit:
  Installed: 0.2.10-1ubuntu7
  Candidate: 0.2.10-1ubuntu7
  Version table:
 *** 0.2.10-1ubuntu7 0
        500 http://gb.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
thomas@thomas-laptop:~$

tdflanders (thomasdelbeke) wrote :

This bug is easily reproducible, but not with the same command each time. It is also fatal and most of the time requires a reinstall of the system. Therefore Valgrind is not easy to use. I have one more consolekit crash report in /var/crash, but I guess you will not be able to tell me how to do a backtrace. There are two comments that could reveal the area of trouble of the bug, maybe even the trigger:

1) something similar to Bug #244218:

This bug was marked by you as having a useful log. I was able to reproduce something very similar in Bug #277208. It is a landscape-client bug. I got it while trying to reproduce the landscape-sysinfo bug (Bug #270007) for Andreas. Since it was a bug invoked by a permission error, I tried something like 'sudo chown -cR 1000 landscape-client ; sudo chmod -cR 777 landscape-client'. It crashed my system and I had to reboot. I also tried 'sudo chown -cR 1000 /* | sudo chmod -cR 777 /*'. This predictably crashed my system, prior to a planned reinstall to validate the APTonCD issue. I will repeat that command under Valgrind tonight, or more likely tomorrow. Please let me know if you can think of a better way (e.g. not requiring a reinstall and always reproducible) to trigger this bug under a Valgrind session.

tdflanders (thomasdelbeke) wrote :

Please also take a look at the terminal output I got:

thomas@thomas:~$ sudo dpkg-reconfigure hal
Failed to open connection to system message bus: Failed to connect to socket /var/run/dbus/system_bus_socket:
Connection refused
invoke-rc.d: initscript dbus, action "force-reload" failed.
polkit-auth: This operation requires the system message bus and Consolekit to be running
/usr/lib/policykit/polkit-read-auth-helper: symbol
lookup error: /usr/lib/policykit/polkit-read-auth-helper:
undefined symbol: kit-getpwnam
polkit-auth: Not Authorized To Read Authorizations For Other Users:
uid 0 is not authorized to read authorizations for uid 111
(requires org.freedesktop.policykit.read)
thomas@thomas-laptop:~$
update-rc.d: warning: multiuser is deprecated ; specify runlevels manually

This is the root terminal output of dpkg-reconfigure hal. It is a long shot, but I am drawing your attention to this as it is also a dbus error. This means it is suitable for a Valgrind backtrace and is possibly memory related, am I correct? I had a lot of memory errors and another dbus error, which I reported here: Bug #270330. It came back a lot after a fatal crash during the alphas. Possibly I can log this through Valgrind with my alpha 3 CD.

Martin Pitt (pitti) on 2008-10-14
Changed in consolekit:
assignee: nobody → pitti
status: Confirmed → Fix Committed

Hi,

why the d.... have I been getting loads of mails about this and
similar bugs for days??????
[Bug 269651, 282461, 282579, 282660 ... ]
All I did was nod when asked whether I wanted to send the report about
the crash that had just occurred to the "bug fixing team"...
(btw. I do appreciate this way of fixing bugs and the work
behind it!!)

Kind regards,
Manfred Hensel

tdflanders wrote:
> Hi James,
>
> I am reassigning this to you since this is frequently reoccuring.
>
> Cheers,
>
> Thomas
>
> ** Changed in: consolekit (Ubuntu)
> Assignee: (unassigned) => James Westby (james-w)
>
>

Martin Pitt (pitti) wrote :

(for our English readers: he wonders why he gets so much mail about
this; my answer, originally in German, follows)

Hello Manfred,

Manfred Hensel [2008-10-14 9:54 -0000]:
> why the d.... have I been getting loads of mails about this and
> similar bugs for days??

This bug has a great many "duplicates", i.e. bug reports about the
same problem. These are all marked as duplicates, and so that everyone
involved is kept up to date, they are all notified about the progress
and about the bug fix. For extreme bugs like this one, with very many
duplicates, that unfortunately also generates a lot of mail :-(

The problem has now been fixed, so in future no more (or hardly any)
should arrive.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package consolekit - 0.2.10-1ubuntu8

---------------
consolekit (0.2.10-1ubuntu8) intrepid; urgency=low

  * Add debian/patches/10-file_monitor_remove_watch_crash.patch: Fix common
    crash when trying to remove a NULL watch. (LP: #269651)

 -- Martin Pitt <email address hidden> Tue, 14 Oct 2008 08:02:40 +0200

Changed in consolekit:
status: Fix Committed → Fix Released
iponeverything (cookema) wrote :

I have to say that I was excited to see that this bug was fixed. But alas, I am still getting crashes after my update to the latest console-kit-daemon.

Before the fix:
messages.0:Oct 11 08:50:45 foo-laptop kernel: [ 231.404242] console-kit-dae[4784]: segfault at 0 ip b7e01d37 sp bfe29754 error 4 in libglib-2.0.so.0.1800.1[b7da5000+b5000]

After the fix:
syslog:Oct 16 08:01:49 foo-laptop kernel: [ 485.639885] console-kit-dae[4843]: segfault at 0 ip b7f1fd37 sp bf9491f4 error 4 in libglib-2.0.so.0.1800.1[b7ec3000+b5000]
syslog:Oct 16 09:32:51 foo-laptop kernel: [ 3879.528811] console-kit-dae[4834]: segfault at 0 ip b7e32d37 sp bfa5bbc4 error 4 in libglib-2.0.so.0.1800.1[b7dd6000+b5000]
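
One way to compare kernel lines like these is to subtract the mapping base (the first number in the square brackets) from the instruction pointer, which gives the offset of the faulting instruction inside the library regardless of where it was loaded. Below is a rough sketch of that arithmetic; the helper name fault_offset is made up for illustration and the parsing assumes the exact line layout quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a kernel "segfault at ... ip <hex> ... [<base>+<size>]" line and
 * return ip - base, i.e. the fault offset within the mapped library.
 * Returns -1 if the line does not have the expected shape. */
long fault_offset(const char *line)
{
    unsigned long ip, base;
    const char *ip_field = strstr(line, " ip ");
    const char *mapping  = strrchr(line, '[');   /* last '[' starts the mapping */

    if (ip_field == NULL || mapping == NULL) {
        return -1;
    }
    if (sscanf(ip_field, " ip %lx", &ip) != 1) {
        return -1;
    }
    if (sscanf(mapping, "[%lx+", &base) != 1) {
        return -1;
    }
    if (ip < base) {
        return -1;
    }
    return (long) (ip - base);
}
```

Applied to the lines quoted above, the "before" and "after" entries give the same offset, 0x5cd37, which suggests the daemon is still faulting at the same place in libglib.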

Martin Pitt (pitti) wrote :

Hi iponeverything,

iponeverything [2008-10-16 5:29 -0000]:
> I have to say that I was excited to see that this bug was fixed. But alas,
> I am still getting crashes after my update to the latest
> console-kit-daemon.

:( Can you please report them through the standard crash handler? You
might have discovered yet another bug.

geert (blaat-euronet) wrote :

Will this fix be backported to Hardy?

Martin Pitt (pitti) wrote :

I just checked, it does apply to hardy.

Changed in consolekit:
assignee: nobody → pitti
status: New → In Progress
iponeverything (cookema) wrote :

Martin -- Will do: Bug #284218

thanks.

Martin Pitt (pitti) wrote :

OK, thanks; seems it didn't quite fix it yet then, reopening.

Changed in consolekit:
status: Fix Released → In Progress
Martin Pitt (pitti) wrote :

iponeverything, do you actually have a recipe how to reproduce this? It seems to happen quite often for you. I never ever got a crash :/ (which is good for me as a user, but bad as a developer...)

Changed in consolekit:
assignee: pitti → nobody
status: In Progress → New
iponeverything (cookema) wrote :

It's been pretty random. It has happened twice immediately after resume from suspend, and a few times immediately after starting the desktop. I will keep my eyes open for a way to reproduce it.

tdflanders (thomasdelbeke) wrote :

Hi Martin,

I have already had this bug at least 12 to 15 times. It is always fatal, but you cannot reproduce it at will. The best way to try and get it is to deliberately mess up your system, requiring a reboot. The problem is that I cannot log this.

You could perhaps try it in VirtualBox or VMWare:

$ sudo chmod -cR 077 consolekit ; sudo chown -cR 1000 consolekit ; sudo <whatever>

OR

$ sudo chmod -cR 777 /* ; sudo reboot

This will render your system inoperable. If you now try to access your old partition through the LiveCD, you may experience the crash. Make sure you have enough linux-swap available (at least 5 GB), update and upgrade completely, and install all the necessary debug symbols; hopefully you can do this. I will go for a fresh reinstall soon. Please give me instructions if you want me to try this out.

Thanks,

Thomas

iponeverything (cookema) wrote :

This happens to me often enough that I will just leave an strace attached to it until it bombs again.

gnuckx (gnuckx) wrote :

console-kit-daemon crashed with SIGSEGV in g_str_hash()

Pete Graner (pgraner) wrote :

Happens to me at every login.

iponeverything (cookema) wrote :

Ok -- Console kit walked off a cliff again.

 Attached is "strace -p 4843" and "grep -A5 -B5 console-kit-dae messages"

iponeverything (cookema) wrote :

grep -A5 -B5 console-kit-dae messages

iponeverything (cookema) wrote :

Here is another strace. It bombs in the same place.

/var/run/ConsoleKit/database to follow.

Hope this helps. -- Let me know if there is something that you would like me try.

iponeverything (cookema) wrote :

Since this is where it was when it barfed, I thought that you might like to have it as a souvenir.

/var/run/ConsoleKit/database

iponeverything (cookema) wrote :

Sorry the last /var/run/ConsoleKit/database uploaded was created after the crash. I think this is the one that it was writing at the time

/var/run/ConsoleKit/database

Hi,

I believe I have found the cause of this bug.

It is to do with the interaction of the watch removal caused by
the dbus name change and the inotify IN_IGNORED notification.

Both will trigger a watch to be removed, so there can potentially
be two removals attempted. However, this is not checked for in the
code. Even if it was, it would still be potentially racy (even
if the hole would be a lot smaller), so I think the inotify
removals should be pushed in to the main thread, like the rest of
the responses to inotify events, where they can safely check
whether the watch has already been removed.

I will try and write a patch for this now.

Thanks,

James

iponeverything (cookema) wrote :

I have a recipe.

After much observation and trial and error, I have a recipe to get console-kit-daemon to segfault every time on my machine. First off: use su and not sudo for what follows.

- Open an xterm, su, and attach an strace to the daemon, just so you can rejoice when you see it fall down:
  strace -p <pid> ; cp /var/run/ConsoleKit/* /tmp

- Open another xterm; do not su.
- From this xterm open additional xterms like so: xterm -geometry 20x10 &
  Open 9 or 10 of them.

- Now go through and "su" in all of the open xterms.

- Next, go through and close each window one by one, using the close button in the upper right corner. By the time you get to the fourth or fifth window, console-kit-daemon will have segfaulted for you.

oh joy!

James Westby (james-w) wrote :

Hi,

Thanks for the information, it has been very helpful. I can reproduce the
problem with your method.

It seems that /var/run/ConsoleKit/database is a red herring. I replaced the
function that writes that file (and does the rename that appears at the end
of the strace) with an empty function and still got the crash. It
obviously crashes in whatever it does after writing the file.

Your steps to reproduce suggest this is an issue with removing sessions.

remove_session_for_cookie appears in the stacktrace, and contains a
call to ck_manager_dump just before calling g_object_unref which
is in the stacktrace one level lower than the remove_session_for_cookie
call.

The g_object_unref is of "orig_session", which has this comment where
it is retrieved:

        /* Must keep a reference to the session in the manager until
         * all events for seats are cleared. So don't remove
         * or steal the session from the master list until
         * it is removed from all seats. Otherwise, event logging
         * for seat removals doesn't work.
         */

The g_object_unref calls ck_session_finalize which in turn calls
session_remove_activity_watch, which ends up at file_monitor_remove_notify.
The notify is not NULL as was previously thought, as that would lead to
a segfault much earlier.

It first looks up the watch in its global list of watches, and finds it. It then steals
it out of the hash. It then removes this watch from the list of watches for the same
path. If that list is empty, which it is in this case, it calls file_monitor_remove_watch
with the watch.

The stacktrace shows the watch is NULL in this case.

I think the problem is due to inotify causing things to happen in a separate thread.

The inotify response function makes sure to use idle_add to instruct the main thread
to act on the information, except if the inotify event has IN_IGNORED, indicating the
watch was removed (either with inotify_rm_watch or because the file was deleted).
In that case the code will call file_monitor_remove_watch from the other thread.
This is the same function that we see causing issues in the stacktrace.

As it is /dev/tty9 or similar that is being watched it is unlikely to have been removed,
and so we can assume it is inotify_rm_watch that is causing the event I believe.
The only caller of this is monitor_release_watch, whose only caller (that isn't tearing
down everything) is file_monitor_remove_watch.

I'm not clear why this doesn't lead to an infinite loop, except that an IN_IGNORED may
not be generated for every call to inotify_rm_watch.

I am going to try and debug this a bit further and test some patches to fix it.

Thanks,

James

James Westby (james-w) wrote :

Ah, sorry, I said "The notify is not NULL as was previously thought", but
that's not what Martin said. He said the watch was NULL, which is what
I said as well.

Thanks,

James

James Westby (james-w) wrote :

Ok, a couple more things.

g_hash_table_remove can take a NULL key I believe, so that's why the
patch didn't work.

I was wrong, it is not monitoring /dev/tty9, it is monitoring /dev/pts/4 etc.,
which are added and removed, so the IN_IGNORED may well be triggered
by removing the file being watched.

Thanks,

James

James Westby (james-w) wrote :

Hi,

I believe I understand the bug now.

At heart this is indeed a race condition between the removing of the
seat due to the dbus name owner change (the path in the stacktrace)
and due to the IN_IGNORED event from inotify.

When a session is closed it will trigger the dbus event, which will remove
the watch, and call inotify_rm_watch. This may stop the inotify event from
coming. There is also a point where inotify events for paths that aren't
watched are filtered out (the reason why it doesn't cause an infinite loop),
so if the watch is removed before the inotify event hits that point it will
also be ok.

If however the inotify event arrives early enough that it makes it past this
then it will cause the watch to be removed as well.

The remove code does

           g_hash_table_remove (monitor->priv->path_to_watch,
                             watch->path);

which will crash if watch->path is NULL (I was wrong in my last comment,
as this uses a g_str_hash, not a g_direct_hash). It then calls

           monitor_release_watch (monitor, watch);

which does

           g_free (watch->path);
           watch->path = NULL;

so if one remove call makes it to that point before the other makes it
to the g_hash_table_remove call then it will cause the crash.

We can check for this before calling g_hash_table_remove, which makes
the time when it is possible to trigger the bug very short, but it's still
possible, so we should look for a better solution.
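
To make the sequence above concrete, here is a small single-threaded model of the two removals arriving one after the other. It is only a sketch: the Watch struct and function names are stand-ins rather than ConsoleKit code, and str_hash is a generic string hash used to show that hashing a string key has to read through the pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the daemon's watch record. */
typedef struct {
    char *path;   /* heap-allocated path, NULL once released */
    int   wd;     /* watch descriptor, -1 once released */
} Watch;

/* Any string hash must read the key bytes, so a NULL key is a NULL deref. */
unsigned str_hash(const char *key)
{
    unsigned h = 5381;
    for (; *key != '\0'; key++) {
        h = h * 33u + (unsigned char) *key;
    }
    return h;
}

/* Mirrors the release step quoted above: free the path and NULL it. */
void release_watch(Watch *w)
{
    free(w->path);
    w->path = NULL;
    w->wd = -1;
}

/* Removal with the extra check: returns 1 if this call removed the watch,
 * 0 if an earlier removal had already released it. */
int remove_watch_checked(Watch *w)
{
    if (w == NULL || w->path == NULL) {
        return 0;                 /* second removal: nothing left to hash */
    }
    (void) str_hash(w->path);     /* stands in for the hash-table removal */
    release_watch(w);
    return 1;
}
```

With real threads the check and the hash call are still two separate steps, so this only narrows the window, as noted above.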

I think having the two threads do the removal is going to be problematic,
so we should push the removal from the inotify thread in to the main
thread. It can then check whether it has been removed yet without
being racy.
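
As a rough model of that idea: the place that notices the event only records a request, and a single consumer (the main loop) performs the removals and skips any watch that is already gone. The types and names below are hypothetical, and the model is single-threaded; in the daemon the hand-off itself would have to be thread-safe, for example via the same idle_add mechanism used for the other inotify events.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for a watch; "removed" is set exactly once. */
typedef struct {
    int removed;
} Watch;

/* Removal requests waiting to be handled by the main loop. */
typedef struct {
    Watch *pending[MAX_PENDING];
    size_t count;
} RemovalQueue;

/* Called where the event is noticed: record the request, remove nothing.
 * Returns 0 if the queue is full, 1 otherwise. */
int queue_removal(RemovalQueue *q, Watch *w)
{
    if (q->count == MAX_PENDING) {
        return 0;
    }
    q->pending[q->count++] = w;
    return 1;
}

/* Called from the main loop only: perform each removal at most once.
 * Returns how many watches were actually removed. */
int drain_removals(RemovalQueue *q)
{
    int removed = 0;
    for (size_t i = 0; i < q->count; i++) {
        Watch *w = q->pending[i];
        if (w->removed) {
            continue;         /* an earlier request already handled it */
        }
        w->removed = 1;
        removed++;
    }
    q->count = 0;
    return removed;
}
```

Because only one consumer ever touches the watch, the "already removed" test and the removal cannot interleave with another removal.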

Thanks,

James

Created an attachment (id=19742)
Patch to serialise watch removals

Hi,

Here is my proposed patch to fix this issue.

The first thing it does is move the removal of a watch
caused by an IN_IGNORED event in to the main thread, where
they will be serialised with the other removals, preventing
race conditions.

Then, for the case where two removals are generated it adds
two checks for the watch being already removed. It firstly
checks for watch->wd == -1 in file_monitor_remove_watch,
which will be True if the watch is already removed, and handles
the case when the inotify removal ends up after the dbus one.

It then checks notify->watch->notifies for the case when the
dbus removal ends up after the inotify one.
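
The two checks can be modelled roughly as follows, with both orderings exercised. This is a sketch of the intent only, not the attached patch: the struct, the counter standing in for the watch->notifies list, and the function names are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a watch. */
typedef struct {
    int wd;          /* -1 once the watch has been removed */
    int n_notifies;  /* stands in for the watch->notifies list */
} Watch;

/* inotify-side removal (IN_IGNORED). Returns 1 if it removed the watch,
 * 0 if the dbus-side removal had already done so. */
int remove_watch_from_inotify(Watch *w)
{
    if (w->wd == -1) {
        return 0;             /* check 1: watch already removed */
    }
    w->n_notifies = 0;
    w->wd = -1;
    return 1;
}

/* dbus-side removal of one notify. Returns 1 if a notify was dropped,
 * 0 if the inotify-side removal had already cleared them. */
int remove_notify_from_dbus(Watch *w)
{
    if (w->n_notifies == 0) {
        return 0;             /* check 2: notifies already cleared */
    }
    w->n_notifies--;
    if (w->n_notifies == 0) {
        w->wd = -1;           /* last notify gone: remove the watch too */
    }
    return 1;
}
```

Whichever removal runs second becomes a no-op in this model, which is the behaviour the patch description aims for once both removals run in the main thread.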

The testcase I was using for this is thanks to "iponeverything"
in the Ubuntu bug. It is opening 10 xterms and doing "su -" in
each (i.e. opening lots of sessions), and then closing them all
one by one using the close button of the WM.

I am going to seek an upload of this patch in to Ubuntu, as the
bug is currently one of the most reported there, and the crashes
mean people lose desktop functionality until they restart their
session.

Thanks,

James

iponeverything (cookema) wrote :

Thanks James and Martin -

"I believe I understand the bug now."

That is music to my ears. Reading through your post, I can see why I am not a developer. Debugging a subtle race condition like this would make my head explode.

Hi,

Unfortunately my patch does not completely solve the problem.

After applying the patch it is still possible to get the crash
with the same recipe (I think I was closing them too quickly,
so the inotify watches hadn't been added yet).

The stacktrace shows that the crash is in g_slist_remove in
ck_file_monitor_remove_notify. My patch adds a guard for
NULL watch->notifies, but the value is not NULL, but neither
is it a valid pointer (0x10 in the case I am looking at).

The inotify IN_IGNORED event does cause a removal, and it does
set watch->notifies to NULL prior to this, but by the time
the other removal event arrives the value is 0x10, which I
don't understand.

Any insights you may have would be appreciated.

Thanks,

James

On Sun, 2008-10-19 at 07:17 +0000, iponeverything wrote:
> Thanks James and Martin -
>
> "I believe I understand the bug now."
>
> That is music to my ears. Reading through your post, I can see why I am
> not a developer. Debugging a subtle race condition like this would make my
> head explode.

It seems I'm not a developer either :-)

I cooked up a patch to fix it, but I still get segfaults. I'll continue
debugging this today.

Thanks,

James

James Westby (james-w) wrote :

Hi,

Here is a patch that I think should fix this. Please review
for sponsorship in to Intrepid.

If the patch of a patch is a bit hard to read please refer to
the upstream bug where you can see the new patch and
a description of what it does.

Thanks,

James

Changed in consolekit:
importance: Medium → High
James Westby (james-w) wrote :

Hi,

I have put packages in my PPA if you would like to test the fix.

https://edge.launchpad.net/%7Ejames-w/+archive/+files/consolekit_0.2.10-1ubuntu8+ppa2_i386.deb
(i386)

https://edge.launchpad.net/%7Ejames-w/+archive/+files/consolekit_0.2.10-1ubuntu8+ppa2_amd64.deb
(amd64)

It won't fix all consolekit crashes, but it hopefully fixes this one.

Thanks,

James

tdflanders (thomasdelbeke) wrote :

thomas@thomas-laptop:~$ apt-cache search consolekit | grep "dbg"
libpolkit-dbus2-dbgsym - debug symbols for package libpolkit-dbus2
libck-connector0-dbgsym - debug symbols for package libck-connector0
libpam-ck-connector-dbgsym - debug symbols for package libpam-ck-connector
consolekit-dbgsym - debug symbols for package consolekit
thomas@thomas-laptop:~$ sudo apt-get source consolekit-dbgsym
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to find a source package for consolekit-dbgsym
thomas@thomas-laptop:~$ sudo apt-get install consolekit-dbgsym
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.

Since you only requested a single operation it is extremely likely that
the package is simply not installable and a bug report against
that package should be filed.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
  consolekit-dbgsym: Depends: consolekit (= 0.2.10-1ubuntu8) but 0.2.10-1ubuntu8+ppa2 is to be installed
E: Broken packages
thomas@thomas-laptop:~$ sudo apt-get build-dep consolekit-dbgsym
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to find a source package for consolekit-dbgsym
thomas@thomas-laptop:~$

iponeverything (cookema) wrote :

:( so sad. I am still able to easily reproduce the segfault with 0.2.10-1ubuntu8+ppa2_i386.deb installed.

I backed out to 0.2.10-1ubuntu8 and rebuilt the deb with your patch applied and got the same result.

If it's any consolation, I made it to seven windows before the segfault. I will continue to test it to see how it does under normal use. Let me know if there is more information that I should provide.

Best regards!

James Westby (james-w) wrote :

On Sun, 2008-10-19 at 15:47 +0000, iponeverything wrote:
> :( so sad. I am still able to easily reproduce the segfault with
> 0.2.10-1ubuntu8+ppa2_i386.deb installed.
>
> I backed out to 0.2.10-1ubuntu8 and rebuilt the deb with your patch
> applied and got the same result.
>
> If it's any consolation, I made it to seven windows before the segfault.
> I will continue to test it to see how it does under normal use. Let me
> know if there is more information that I should provide.

Please grab a stacktrace of the failure.

To do this get a root shell and run

killall console-kit-daemon
gdb console-kit-daemon
set args --debug --no-daemon
set pagination 0
run

then reproduce the problem, and when it crashes type "bt" in to gdb,
which should give you a stacktrace. Then please attach the whole output
of the gdb session to this bug.
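
The same session can also be scripted rather than typed by hand. The sketch below only builds the gdb command line; it assumes gdb's batch mode (`-batch`, `-ex`, `--args`) is acceptable in place of the interactive steps above, and the function name is hypothetical.

```python
def gdb_backtrace_command(binary="console-kit-daemon",
                          daemon_args=("--debug", "--no-daemon")):
    """Build a gdb argv that runs the daemon and prints a backtrace when it stops."""
    return [
        "gdb", "-batch",
        "-ex", "set pagination off",   # same effect as "set pagination 0"
        "-ex", "run",                  # returns control to gdb when the daemon crashes
        "-ex", "bt",                   # then print the stacktrace
        "--args", binary, *daemon_args,
    ]
```

From a root shell, after `killall console-kit-daemon`, the list can be passed to `subprocess.run()` with the output redirected to a file for attaching to the bug.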

Thanks,

James

James Westby (james-w) wrote :

Hmm, I see it too now.

The crash is in the g_slist_remove() call, which I added a guard
around. This suggests that there is still a race condition. Maybe I
misunderstood what the idle_add stuff does, perhaps the dbus
events aren't running in the main thread either.

Thanks,

James

iponeverything (cookema) wrote :

Here is the stacktrace and gdb output from my rebuild of the deb.

James Westby (james-w) wrote :

Hi,

Adding some trace statements in shows that the following happens

  * inotify IN_IGNORED event arrives. It is added to the removals queue.
  * Removals queue is processed, and the watch is removed, with watch->notifies
     set to NULL.
  * dbus removal event is triggered.
  * watch->notifies is not NULL, but not a valid pointer either, in the case I looked
     at it was 0x10.

I'm not sure what will be changing the value of watch->notifies in between the
two points.

Thanks,

James

Created an attachment (id=19760)
Serialise removals, and avoid using freed data caused by removals

Hi,

Further debugging revealed this issue with the previous patch.
While the removals were serialised each notify still contained
a reference to a watch, which will have been freed if a removal
was already triggered, causing a segfault.

I changed the code to also loop through watch->notifies when
removing the watch due to inotify, and NULL each notify->watch
reference, the code then checks this before trying to delete
the watch itself if asked to remove the notify.
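
A Python model of that invalidation step, as a sketch only: the patch itself is C, the names below are hypothetical, and Python's `None` stands in for the NULLed `notify->watch` pointer.

```python
class Watch:
    def __init__(self, wd):
        self.wd = wd
        self.notifies = []


class Notify:
    def __init__(self, notify_id, watch):
        self.id = notify_id
        self.watch = watch              # back-reference that can go stale
        watch.notifies.append(self)


def remove_watch_from_inotify(watches, watch):
    """Removal triggered by inotify: detach every notify, then drop the watch."""
    for notify in watch.notifies:
        notify.watch = None             # the "NULL each notify->watch" step
    watch.notifies = []
    watches.pop(watch.wd, None)


def remove_notify(watches, notify):
    """Removal requested later: only touch the watch if it is still attached."""
    watch = notify.watch
    if watch is None:
        return False                    # watch already gone, nothing to delete
    watch.notifies.remove(notify)
    notify.watch = None
    if not watch.notifies:
        watches.pop(watch.wd, None)     # last notify removed, drop the watch
    return True
```

In C the stale back-reference is a dangling pointer rather than a live object, which is why the check has to be on the notify and not on the watch it used to point at.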

In order to prevent other race conditions in this area I also
made the inotify code not pass a watch to emit_events_in_idle,
as the watch may get freed in the meantime. It instead passes
the wd and the emit loop looks up the watch, discarding the
event if the watch has been removed.
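
The "pass the wd, look the watch up later" idea can be sketched in Python as follows. The function names and the `deliver` callback are hypothetical and the queue is a plain deque; the real code defers work through emit_events_in_idle.

```python
from collections import deque


def queue_event(pending, wd, mask):
    """Record only the integer descriptor, never a reference to the watch."""
    pending.append((wd, mask))


def emit_pending(pending, watches, deliver):
    """Drain queued events, dropping any whose watch has since been removed."""
    delivered = 0
    while pending:
        wd, mask = pending.popleft()
        watch = watches.get(wd)
        if watch is None:
            continue                    # watch removed in the meantime: discard
        deliver(watch, mask)
        delivered += 1
    return delivered
```

Because the queue holds integers rather than references, a watch removed between queueing and emission simply fails the lookup.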

I did however leave in the code that checks for a removed watch
before doing anything with inotify, as I hoped that this would
just optimise this case.

Please review the attached patch for inclusion. I am going to
request testing of the patch in the Ubuntu bug, and if that
reveals no further problems seek an upload of the patch.

Thanks,

James

William T. Mann (wtmann) wrote :

This is happening to me with the latest update to Intrepid with my laptop in an idle state: no user programs running (other than the system ones) at all!

William

James Westby (james-w) wrote :

Hi,

After some more debugging and some help on #ubuntu-devel I think
I have worked out what I was missing in my previous patch. Details
are on the upstream bug for the curious.

I have built new packages with the updated patch (attached), please
test them:

https://edge.launchpad.net/%7Ejames-w/+archive/+files/consolekit_0.2.10-1ubuntu8+ppa3_i386.deb
(i386)

https://edge.launchpad.net/%7Ejames-w/+archive/+files/consolekit_0.2.10-1ubuntu8+ppa3_amd64.deb
(amd64)

Thanks,

James

iponeverything (cookema) wrote :

:) thank you James. I think you have squashed this one.

On Mon, 2008-10-20 at 16:56 +0000, iponeverything wrote:
> :) thank you James. I think you have squashed this one.
>

I take it you tested it and it works?

Thanks,

James

tdflanders (thomasdelbeke) wrote :

I added your PPA to my /etc/apt/sources.list and tried to induce a crash with iponeverything's method. It failed: I ran sudo gdb and attached to the pid of console-kit-daemon, but it never crashed. I also tried the following commands:
$ sudo su ; xterm ; xterm ; xterm ; xterm ; ...
$ sudo su ; xterm | xterm | xterm | xterm

No crash either. I will try later with parallel root and user terminals, to induce a crash.

Cheers,

Thomas

Jean.c.h (slug71) wrote :

Installed Google Earth in Terminal. When I closed Terminal after installation, I got this report.

For the record, this is a recipe to reliably reproduce the crash, thanks to "iponeverything" in the Ubuntu bug (slightly simplified):

--------- 8< -----------
- Attach an strace to CK so you can see when it falls down:
  sudo strace -p `pidof console-kit-daemon`

- Open additional xterms like so: xterm -geometry 20x10 &
  open 9 or 10 of them.

- now go through and use "su - someuser" in all of the open xterms.

- Next go through and close each xterm one by one by using the close button in the upper right corner -- by the time you get to the fourth or fifth window -- console-kit-daemon will have segfault'ed for you.
--------- 8< -----------

I tried for half an hour to turn this into a noninteractive test script, but failed unfortunately.

I tested CK with James' patch, and it works very well. I don't get crashes any more, and both X and pam-ck-connector based sessions work as usual, and even hammering it with something like

for i in `seq 50`; do
    ck-launch-session sleep 5 &
    sleep 0.1
done

works correctly.
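
The shell loop above can also be driven from Python when the exit codes are wanted. This is a hedged sketch: `hammer` is a hypothetical helper, and only the `ck-launch-session sleep 5` command, the count of 50 and the 0.1 second delay come from the loop above.

```python
import subprocess
import time


def hammer(command, count=50, delay=0.1):
    """Start `count` copies of `command`, `delay` seconds apart, then wait for all."""
    procs = []
    for _ in range(count):
        procs.append(subprocess.Popen(command))
        time.sleep(delay)
    # Collect exit codes in start order; a non-zero code marks a failed session.
    return [p.wait() for p in procs]
```

Calling `hammer(["ck-launch-session", "sleep", "5"])` would mirror the loop; whether the daemon survived still has to be checked separately, for example with `pidof console-kit-daemon`.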

Yes, I have tested the heck out of it using the recipe and I am unable
to reproduce the condition. I am very impressed with the patch, you've
introduced a lot of new code.

BTW -- thanks for the honorable mention in the upstream.

iponeverything (cookema) wrote :

I have also discovered that 64 50x2 -bg black -fg green xterms look pretty cool.

If you find yourself in Kabul, I'll buy you a good cup of coffee.

Best Regards,
Martin Cooke

James Westby (james-w) wrote :

On Tue, 2008-10-21 at 03:23 +0000, iponeverything wrote:
> I have also discovered that 64 50x2 -bg black -fg green xterms look
> pretty cool.
>
> If you find yourself in Kabul, I'll buy you a good cup of coffee.

I'll hold you to that :-)

Thanks for testing, and huge thanks for the way to reproduce, that
was what made this soluble.

Thanks,

James

Martin Pitt (pitti) wrote :

I tried for half an hour to create a noninteractive test script for that, but failed unfortunately.

Anyway, I gave James' consolekit a really thorough hammering, and it works very well for me. Thanks a lot!

James Westby wrote:
> On Mon, 2008-10-20 at 16:56 +0000, iponeverything wrote:
>
>> :) thank you James. I think you have squashed this one.
>>
>>
>
> I take it you tested it and it works?
>
> Thanks,
>
> James
>
>
oh yes man, it seems to work now...really good job

iponeverything (cookema) wrote :

Maybe I have just been lucky, but is it possible that James' fix for console-kit-daemon also fixed the segfaults that I was getting in v86d? See bug #258031.

I went from getting these quite regularly, to not at all since consolekit_0.2.10-1ubuntu8+ppa3_i386.

EmyrB (emyr) wrote :

Keeps happening to me on 2 different PCs with totally different hardware.

Manfred Georg (tharkban) wrote :

not doing anything in particular, was installing software. No music, no active browsing.

Martin Pitt (pitti) wrote :

I uploaded that to the intrepid queue now. I gave it a really good beating in testing, and it works very well for me. Some other folks here tested it, too, thus I'm eager to get it into the final release.

Thanks so much, James!

Changed in consolekit:
status: In Progress → Fix Committed

unsubscribe

Joachim Kainz (joachim-kainz) wrote :

George,

Wouldn't it be great if such an "unsubscribe" command actually
existed on Launchpad? Can you please go to
https://bugs.launchpad.net/launchpad-registry/+bug/284667 and report
that this button affects you too?

Best regards,

Joachim

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package consolekit - 0.2.10-1ubuntu9

---------------
consolekit (0.2.10-1ubuntu9) intrepid; urgency=low

  * debian/patches/10-file_monitor_remove_watch_crash.patch:
    - Move watch removals triggered by inotify in to the main thread
      so that they do not race with the removals triggered by dbus.
    - Don't try and remove watches that have already been removed.
    - When removing a watch triggered by inotify also invalidate pointers
      to it from all notifies that use it.
    - Ubuntu: LP: #269651
    - Upstream: https://bugs.freedesktop.org/show_bug.cgi?id=18046

 -- James Westby <email address hidden> Sun, 19 Oct 2008 03:18:24 +0100

Changed in consolekit:
status: Fix Committed → Fix Released
Felix Yan (felixonmars) wrote :

Thanks.
Now upgrading to the new version...

tdflanders (thomasdelbeke) wrote :

James,

I get this from following your link. Is it related to the ca-certificates problem (#244412)?

Secure Connection Failed

bugs.freedesktop.org uses an invalid security certificate.

The certificate is not trusted because the issuer certificate is unknown.

(Error code: sec_error_unknown_issuer)

    * This could be a problem with the server's configuration, or it could be someone trying to impersonate the server.

    * If you have connected to this server successfully in the past, the error may be temporary, and you can try again later.

tdflanders (thomasdelbeke) wrote :

Before your patch. Now my final test on this bug with kernel 2.6-27.14 and all updates.

tdflanders (thomasdelbeke) wrote :

OK,

So I cannot reproduce this one after your patch:

thomas@thomas-laptop:~$ lsb_release -rd ; uname -a ; apt-cache policy linux consolekit xterm linux-source-2.6.27 gnome-terminal apport-gtk sudo libpam
Description: Ubuntu 8.10
Release: 8.10
Linux thomas-laptop 2.6.27-7-generic #1 SMP Fri Oct 24 06:42:44 UTC 2008 i686 GNU/Linux
linux:
  Installed: 2.6.27.7.10
  Candidate: 2.6.27.7.10
  Version table:
 *** 2.6.27.7.10 0
        500 cdrom://[APTonCD for ubuntu intrepid - i386 (2008-10-21 06:52) DVD1] Packages
        500 http://ie.archive.ubuntu.com intrepid/restricted Packages
        100 /var/lib/dpkg/status
consolekit:
  Installed: 0.2.10-1ubuntu9
  Candidate: 0.2.10-1ubuntu9
  Version table:
 *** 0.2.10-1ubuntu9 0
        500 http://ie.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
     0.2.10-1ubuntu8+ppa3 0
        500 http://ppa.launchpad.net intrepid/main Packages
     0.2.10-1ubuntu8 0
        500 cdrom://[APTonCD for ubuntu intrepid - i386 (2008-10-21 06:52) DVD1] Packages
xterm:
  Installed: 235-1ubuntu1
  Candidate: 235-1ubuntu1
  Version table:
 *** 235-1ubuntu1 0
        500 cdrom://[APTonCD for ubuntu intrepid - i386 (2008-10-21 06:52) DVD1] Packages
        500 http://ie.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
linux-source-2.6.27:
  Installed: 2.6.27-7.14
  Candidate: 2.6.27-7.14
  Version table:
 *** 2.6.27-7.14 0
        500 http://ie.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
     2.6.27-7.12 0
        500 cdrom://[APTonCD for ubuntu intrepid - i386 (2008-10-21 06:52) DVD1] Packages
gnome-terminal:
  Installed: 2.24.1-0ubuntu1
  Candidate: 2.24.1-0ubuntu1
  Version table:
 *** 2.24.1-0ubuntu1 0
        500 http://ie.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
     2.24.0-0ubuntu2 0
        500 cdrom://[APTonCD for ubuntu intrepid - i386 (2008-10-21 06:52) DVD1] Packages
apport-gtk:
  Installed: 0.119
  Candidate: 0.119
  Version table:
 *** 0.119 0
        500 http://ie.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
     0.117 0
        500 cdrom://[APTonCD for ubuntu intrepid - i386 (2008-10-21 06:52) DVD1] Packages
sudo:
  Installed: 1.6.9p17-1ubuntu2
  Candidate: 1.6.9p17-1ubuntu2
  Version table:
 *** 1.6.9p17-1ubuntu2 0
        500 cdrom://[APTonCD for ubuntu intrepid - i386 (2008-10-21 06:52) DVD1] Packages
        500 http://ie.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
libpam:
  Installed: (none)
  Candidate: (none)
  Version table:
thomas@thomas-laptop:~$

Cheers,

Thomas

pipegeek (pipegeek) wrote :

I think this problem might not be resolved quite yet, as I can reproduce it, and I have the patched version of consolekit (0.2.10-1ubuntu9) installed. It doesn't happen right away upon boot, but it only takes a few minutes of running the system under load (8 kvm processes; 8 cpus), before console-kit-daemon segfaults. From then on (until the system is next rebooted), logging in on the console as root and then logging out again causes console-kit-daemon to segfault, and that console does not then display a login prompt.

The system in question:

lsb_release -rd ; uname -a ; apt-cache policy linux consolekit linux-image-server gnome-terminal apport-gtk sudo libpam0g
Description: Ubuntu 8.10
Release: 8.10
Linux vulcan 2.6.27-9-server #1 SMP Thu Nov 20 22:56:07 UTC 2008 x86_64 GNU/Linux
linux:
  Installed: (none)
  Candidate: 2.6.27.9.13
  Version table:
     2.6.27.9.13 0
        500 http://debian intrepid-security/restricted Packages
        500 http://debian intrepid-updates/restricted Packages
     2.6.27.7.11 0
        500 http://debian intrepid/restricted Packages
consolekit:
  Installed: 0.2.10-1ubuntu9
  Candidate: 0.2.10-1ubuntu9
  Version table:
 *** 0.2.10-1ubuntu9 0
        500 http://debian intrepid/main Packages
        100 /var/lib/dpkg/status ...

Jean.c.h (slug71) on 2008-12-30
Changed in consolekit:
status: New → Confirmed
Ian! D. Allen (idallen) wrote :

Will the consolekit fix get applied to Hardy - the Long Term Support release?
Hardy only has version 0.2.3-3ubuntu5 of consolekit available.

I'm not even running Gnome (I use vtwm) and all it takes is one little inotify set to cause consolekit to fault:

kernel: [1065490.220520] console-kit-dae[7503]: segfault at 00000000 eip b7eca577 esp bf8dc344 error 4

Sorry for the really long delay. And thanks for looking into this!

I think your analysis is correct. In retrospect I should have designed this to use ref-counted objects. This patch seems like a good solution for now. Longer term I'd like to try to add ACCESS inotify support to GIO and use GFileMonitor.

I've committed this to master with a few small changes such as casting the wd int into a pointer and vice versa.

I'd appreciate any testing you can do of the committed patch. I'll be rolling a release with it shortly.

Thanks again.

Thanks, William. For the testing record, we have had this patch applied for four months now, in a stable release, and haven't heard bad things about it.

Changed in consolekit:
status: Confirmed → Fix Released

I was also curious if Hardy got the consolekit fix. Have been receiving this report numerous times within the last few weeks. This is with Hardy i386, all updates applied as of 22-02-09.

Feb 22 18:23:04 fileserver kernel: [1923639.393857] console-kit-dae[13814]: segfault at 00000000 eip b7ea3577 esp bfc06e74 error 4

Andres Mujica (andres.mujica) wrote :

I believe not; check the related bug #196724. However, it is not clear whether the impact of this bug is enough for an SRU (according to that bug report).

I do agree that a dmesg free of segfault messages is a lot better than the current one.

The bug still happens for me with

Version: 0.2.10-1ubuntu10
Architecture: amd64

on an 8-core (Dual E5405) box. The box is running a 2.6.27 kernel build with -j16 in a loop, and 8 instances of cpuburn-in (which is an IA32 binary)

[60783.680922] console-kit-dae[30079]: segfault at 120 ip 00007f582f9d6e09 sp 0000000041963090 error 4 in libglib-2.0.so.0.1800.2[7f582f9aa000+c3000]
[61924.620762] console-kit-dae[20218]: segfault at 1e8 ip 00007f6ecd472e09 sp 00000000403a5090 error 4 in libglib-2.0.so.0.1800.2[7f6ecd446000+c3000]
[62885.103384] console-kit-dae[23811]: segfault at 1e8 ip 00007fa722823e09 sp 00000000403f0090 error 4 in libglib-2.0.so.0.1800.2[7fa7227f7000+c3000]
[63484.581385] console-kit-dae[21449]: segfault at 1c8 ip 00007f789da86e09 sp 00000000413a4090 error 4 in libglib-2.0.so.0.1800.2[7f789da5a000+c3000]
[65885.274765] console-kit-dae[9799]: segfault at 1c8 ip 00007f5f7310ee09 sp 00000000414db090 error 4 in libglib-2.0.so.0.1800.2[7f5f730e2000+c3000]
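
One way to compare these lines is to subtract the library's mapping base (the number before the `+` in the brackets) from the instruction pointer. The sketch below does that in Python; the regular expression assumes the exact log format quoted above, and `fault_offset` is a hypothetical helper.

```python
import re

# Format assumed from the kernel lines quoted above:
#   segfault at <addr> ip <ip> sp <sp> error <n> in <lib>[<base>+<size>]
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library, offset of the faulting instruction within its mapping)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return m.group("lib"), int(m.group("ip"), 16) - int(m.group("base"), 16)


LOG = """\
[60783.680922] console-kit-dae[30079]: segfault at 120 ip 00007f582f9d6e09 sp 0000000041963090 error 4 in libglib-2.0.so.0.1800.2[7f582f9aa000+c3000]
[61924.620762] console-kit-dae[20218]: segfault at 1e8 ip 00007f6ecd472e09 sp 00000000403a5090 error 4 in libglib-2.0.so.0.1800.2[7f6ecd446000+c3000]
[62885.103384] console-kit-dae[23811]: segfault at 1e8 ip 00007fa722823e09 sp 00000000403f0090 error 4 in libglib-2.0.so.0.1800.2[7fa7227f7000+c3000]
[63484.581385] console-kit-dae[21449]: segfault at 1c8 ip 00007f789da86e09 sp 00000000413a4090 error 4 in libglib-2.0.so.0.1800.2[7f789da5a000+c3000]
[65885.274765] console-kit-dae[9799]: segfault at 1c8 ip 00007f5f7310ee09 sp 00000000414db090 error 4 in libglib-2.0.so.0.1800.2[7f5f730e2000+c3000]
"""

offsets = {fault_offset(line) for line in LOG.splitlines()}
for lib, off in sorted(offsets):
    print(lib, hex(off))
```

All five lines reduce to the same offset, 0x2ce09, inside libglib-2.0.so.0.1800.2. That suggests the same faulting instruction each time, although naming the function would need the matching debug symbols.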

hoc (highoncoffee) wrote :

Thx for the effort so far, but the bug still happens on my system. I am running Intrepid (KDE4+compiz) on an Asus V2s laptop.

Architecture: amd64
Version: 0.2.10-1ubuntu9

Now that I read the above reproduction recipe, I am fairly sure I am experiencing a similar bug.

The crash happens (sometimes, up to once every hour) when I am using eclipse and close a dialog window (by pressing ESC), not the parent window, e.g. the file-search dialog or the team>commit dialog. I do not know how to reproduce this 100%, because it doesn't always happen, but that is definitely a trigger for the crash.

I do not find segfaults in my logs, but I do always have an entry in my syslog right before X crashes, indicating console-kit-daemon is probably to blame:

Apr 1 11:02:11 nesus console-kit-daemon[5749]: WARNING: Unable to activate console: No such device or address
Apr 1 11:17:47 nesus console-kit-daemon[5749]: WARNING: Unable to activate console: No such device or address
Apr 1 13:22:27 nesus console-kit-daemon[5749]: WARNING: Unable to activate console: No such device or address
Apr 1 16:31:12 nesus console-kit-daemon[5749]: WARNING: Unable to activate console: No such device or address
Apr 1 16:47:35 nesus console-kit-daemon[5749]: WARNING: Unable to activate console: No such device or address
Apr 2 01:27:20 nesus console-kit-daemon[5763]: WARNING: Unable to activate console: No such device or address
Apr 2 11:56:54 nesus console-kit-daemon[5696]: WARNING: Unable to activate console: No such device or address
Apr 2 12:17:06 nesus console-kit-daemon[5696]: WARNING: Unable to activate console: No such device or address
Apr 2 12:17:06 nesus console-kit-daemon[5696]: GLib-GObject-WARNING: instance of invalid non-instantiatable type `(null)'
Apr 2 12:17:06 nesus console-kit-daemon[5696]: GLib-GObject-CRITICAL: g_signal_emit_valist: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
Apr 2 12:17:06 nesus console-kit-daemon[5696]: GLib-GObject-CRITICAL: g_object_unref: assertion `G_IS_OBJECT (object)' failed

Every 'Unable to activate console'-line is a crash. The Glib-lines only appeared on the last crash.

As you can see I had a pretty disruptive April Fools day. :)

If you need more details, I'll be happy to provide them.

hoc (highoncoffee) wrote :

Hmm, after having another coffee, it seems I failed in tracing the logs. The GLib entries are unrelated, so you can ignore the previous syslog entries. I can't edit the comment, so sorry for the spam.

Basically every crash gives two syslog-lines like the following:

Apr 2 11:56:54 nesus console-kit-daemon[5696]: WARNING: Unable to activate console: No such device or address
Apr 2 11:56:54 nesus kdm[5896]: X server for display :0 terminated unexpectedly

I did not find any other info related to the crashes in other logs.

Gavin McCullagh (gmccullagh) wrote :

I'm getting segfaults like this on a server running hardy:

Jul 9 04:44:23 kevlar kernel: [590776.331930] console-kit-dae[15552]: segfault at 00000000 eip b7ecf597 esp bfb325a4 error 4

The machine is basically just in use as a shorewall/iptables based firewall.

Gavin

Sven Herzberg (herzi) wrote :

Is there any known workaround for hardy? Should I install consolekit from Intrepid?

salt (d-salt) wrote :

--------------------- Kernel Begin ------------------------

 WARNING: Segmentation Faults in these executables
    [1906704.057551] console-kit-dae : 1 Time(s)

 ---------------------- Kernel End -------------------------
uname -a
Linux servpII 2.6.24-24-server #1 SMP Fri Sep 18 17:24:10 UTC 2009 i686 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 8.04.4 LTS
Release: 8.04
Codename: hardy

Changed in consolekit:
importance: Unknown → Medium
Changed in consolekit:
importance: Medium → Unknown
Changed in consolekit:
importance: Unknown → Medium