unprotected check ... clear race on caught signals count in nih_signal_poll() may cause deadlocks

Bug #518921 reported by Dave Martin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
libnih
Triaged
Medium
Unassigned
libnih (Ubuntu)
Invalid
Medium
Unassigned

Bug Description

Ubuntu lucid
libnih 1.0.1-1

nih_signal_handler() increments signals_caught[signum] per signal; nih_signal_poll() is responsible for checking and resetting the count.

However, a signal can arrive after nih_signal_poll() decides not to run the handler (because signals_caught[signum] is 0) but before the signals_caught counts are reset at the end of the function. (This is the classic atomicity problem when waiting for signals --- see pselect(2) for background.)

The signal pipe will have been written, which does ensure that nih_main_loop() is guaranteed to wake up again on the next round, but because signals_caught[signum] remains zero for the affected signal, that instance of the signal will be silently ignored when nih_signal_handler() is next called. This can lead to a deadlock if notification of the signal is required for the program to make progress (such as waiting for SIGCHLD to arrive).

The attached patch may serve as a workaround, but it's not been extensively tested.

I know of no real-world deadlock caused by this bug, but I spotted it when investigating a separate deadlock issue.

This bug was found while investigating the following issue, but is probably not directly related:
 * https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/518937 (ply_event_loop_process_pending_events can block, causing mountall to deadlock)

Revision history for this message
Dave Martin (dave-martin-arm) wrote :
description: updated
description: updated
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Yeah, I've known about this race for ages; it's the reason for the "switch to signalfd()" TODO item. I actually have a fair rewrite of the libnih core code to switch to epoll(), timerfd, signalfd, etc. all pending - just haven't had time to sit down and merge it all in yet.

(And this bug doesn't affect Upstart in any important way, so hasn't got priority :p)

Changed in libnih (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Dave Martin (dave-martin-arm) wrote :

Oh, OK... I wasn't sure whether this code was actually used yet anyway; just something I spotted while looking at the code.

Thinking about my fix, it is a bit heavyweight; signalfd is probably a nicer approach overall.

Changed in libnih:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Do the same as Upstart; close the Ubuntu task for "upstream code bugs" and leave the upstream task open

Changed in libnih (Ubuntu):
status: Triaged → Invalid
Revision history for this message
Mike Frysinger (vapier) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.