LightDM leaves a child X instance running when the service is stopped

Bug #789389 reported by Jason Conti
36
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Light Display Manager
Fix Released
High
Unassigned
lightdm (Ubuntu)
Fix Released
High
Unassigned

Bug Description

When stopping the lightdm service (sudo service lightdm stop), the process doesn't clean up properly after receiving the
TERM signal and a child X instance remains running.

I tracked this down to the signal handling in child-process.c. The signals are properly received by signal_cb and written to
the pipe, but they are never emitted by the handle_signal function.

The logic in this function seems wrong somehow. Will pid ever actually be 0? When upstart sends the TERM signal, the pid
is 1, so the function tries to look that up in the process table, doesn't find it, and so never emits the signal. I am going to attach
a branch that reverses the lookup in handle_signal, first checking the hash table, and if that is NULL, then grabbing the parent.
(Since the pid could be any pid of a kill process trying to terminate lightdm, not just from pid 1)

This is for the package 0.3.5 in oneiric, however I also tested with commit 478 from the trunk.

Tags: patch
Revision history for this message
Pavlo Bohmat (bohm) wrote :

package 0.3.5 in oneiric
At the end of the session unity all the processes are running. With gdm no such problems.

Revision history for this message
Robert Ancell (robert-ancell) wrote :

Thanks Jason, I have merged this change for 0.4.0

Changed in lightdm:
status: New → Fix Committed
importance: Undecided → High
Changed in lightdm (Ubuntu):
status: New → Triaged
importance: Undecided → High
Revision history for this message
Jason Conti (jconti) wrote :

I noticed the patches in lightdm-0.3.7-0ubuntu2. The TERM signal is working, and lightdm tries to clean up before exiting, but with the processes hash table now owning a reference to each child process, none of them are ever finalized. The user sessions should actually be finalized when they are removed from the hash table in child-process.c/child_process_watch_cb but a small bug in the patch prevents this, because process->priv->pid is set to zero, so g_hash_table_remove tries to remove the key for pid 0, which doesn't exist. That is an easy fix though (just use the pid passed to child_process_watch_cb instead).

The more complicated problem is the xserver child process. When shutting down, this is unrefed by the display and the display-manager, however there is one more reference owned by the hash table.

The first idea I had to fix it was to assume the display stole a reference to the xserver (get rid of the g_object_ref in display.c/display_set_xserver) and update child-process.c/child_process_finalize to use g_hash_table_steal instead (if the extra refs stay, it should really be g_hash_table_steal anyway, because we're in finalize, so don't want to g_object_unref it again). This is kind of awkward though, because then either the display manager or the display assume the hidden reference to the child process in the hash table. (it works though)

The second idea was to get rid of the extra refs for the hash table and delete the lines in child-process.c/child_process_watch_cb that set process->priv->pid to zero and remove it from the hash table. It is really unnecessary because it will be removed from the hash table when the child is finalized anyway. This works but the g_debug statement needs to be removed from from_child_cb as well, because it seems the child process may not be valid in either of this callbacks (possibly more). This seems kind of awkward too.

The third idea I haven't tried yet is to add code to clean out the processes hash table before exiting (possibly in lightdm.c/signal_cb).

Revision history for this message
Jason Conti (jconti) wrote :

Attaching a patch against lightdm-0.3.7-0ubuntu2 that implements the third idea above. When lightdm receives a TERM signal, right before ending the mainloop it calls a cleanup function that just calls g_hash_table_remove_all on the processes table. This works on its own, but in rare instances I was able to trigger a SIGSEGV with from_child_cb still being called on a finalized child process. To fix this, the patch saves the event source ids for g_io_add_watch and g_child_watch_add and removes them when the child process is finalized.

This works for service start/stop but restart still isn't working. This seems to be because the lightdm process is restarted so quickly that the previous X server instance hasn't had time to completely exit, so the new X server instance exits with an error (that is what the log seems to indicate:

[+0.00s] DEBUG: Logging to /var/log/lightdm/lightdm.log
[+0.00s] DEBUG: Starting Light Display Manager 0.3.7, PID=6574
[+0.00s] DEBUG: Loaded configuration from /etc/lightdm.conf
[+0.00s] DEBUG: Loading display default-display
[+0.00s] DEBUG: Logging to /var/log/lightdm/:0.log
[+0.00s] DEBUG: Starting on /dev/tty7
[+0.01s] DEBUG: Launching X Server
[+0.01s] DEBUG: Launching process 6578: /usr/bin/X :0 -auth /var/run/lightdm/authority/0 -nolisten tcp vt7
[+0.01s] DEBUG: Waiting for signal from X server :0
[+0.01s] DEBUG: Process 6578 exited with return value 1
[+0.01s] WARNING: X server exited with value 1)

Not really related to the issue in this bug, but I thought I would mention it here since I stumbled on it, and it is partially related.

tags: added: patch
Revision history for this message
Robert Ancell (robert-ancell) wrote :

Fixed in 0.4.0

Changed in lightdm (Ubuntu):
status: Triaged → Fix Released
Changed in lightdm:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.