Sendsigs should always skip fuse filesystems

Bug #151580 reported by Agostino Russo
This bug affects 1 person
Affects: sysvinit (Ubuntu)
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: sysvinit

Userspace filesystems should be handled by umountfs and not by sendsigs. A quick and dirty hack is to have:

echo /sbin/mount.userspace-fs-xyz >> /var/run/sendsigs.omit

Agostino Russo (ago) wrote :

The above should be added within sendsigs for each relevant filesystem (particularly ntfs/ntfs-3g).

Colin Watson (cjwatson) wrote :

I don't think this is correct, at least without *very* significant care in umountfs. If FUSE processes aren't killed by sendsigs, then umountfs has to take extra care to umount filesystems in the right order. Its current code that sorts by decreasing length of mountpoint name will not be adequate for this, because for example a FUSE program in /usr might be used to implement a filesystem mounted on /a.
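
To make the ordering concern concrete, here is a minimal sketch of a "longest mountpoint first" sort. It is not the umountfs code (that is a shell script, and it is not reproduced in this report); the function names are hypothetical and only illustrate the heuristic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: longer mountpoint paths sort first. */
static int by_decreasing_length(const void *a, const void *b)
{
    size_t la = strlen(*(const char *const *)a);
    size_t lb = strlen(*(const char *const *)b);
    return (la < lb) - (la > lb);
}

/* Hypothetical helper: put mountpoints into "unmount order". */
static void sort_for_umount(const char **mounts, size_t n)
{
    assert(mounts != NULL || n == 0);
    qsort(mounts, n, sizeof *mounts, by_decreasing_length);
}
```

Given /a, /usr and /usr/local, this yields /usr/local, /usr, /a. So /usr is scheduled before /a, even if the FUSE program serving /a lives under /usr, which is the case the length heuristic cannot see.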

This is far too delicate to be appropriate for gutsy.

Rudd-O (rudd-o) wrote :

I doubt it, because some FUSE processes need to stay alive until the very end (think of a root filesystem in FUSE). At the moment I have a very nasty problem: init hangs after "Terminating remaining processes", while doing the killall5 -15, EVEN THOUGH I am adding my sendsigs omit.d entry and have confirmed that killall5 is actually being passed that PID with -o.

Rudd-O (rudd-o) wrote :

I think it's killall5 itself that is hanging. If I put an exit 0 before it, everything "works normally" (lots of epic fails when unmounting my zfs filesystems, but at least it doesn't hang).

Rudd-O (rudd-o) wrote :

Killall5 indeed is hanging. Here is what is going on:

/etc/init.d/sendsigs, invoked during Ubuntu shutdown, spoils the fun. Using
strace I have determined that killall5 -TERM -o <ZFS pid> hangs the
machine because it first sends SIGSTOP to PID -1 (that is, to all processes),
which obviously causes zfs-fuse to stop. That is the extent of what I could
find out, because strace itself is stopped in this situation, and I cannot
obtain any further traces.

How can I immunize a process against SIGSTOP? There is no way in POSIX to
do so, so what is the reasonable workaround?

OK, so what does killall5.c do? It first SIGSTOPs everything. Then it
runs the readproc() routine, whose purpose is to figure out which processes
there are. SIGSTOP makes lots of sense if you want a stable system to
go down in an orderly fashion, but it fails with zfs-fuse because zfs-fuse
is stopped, and then readproc() attempts to stat binaries on ZFS filesystems...
and you know where that leads - system hung().

Chicken and egg.

Fortunately, I have a patch. Attached is a new version of the package's killall5.c that does the right thing by sending SIGCONT to the omitted processes early. I would have uploaded a debdiff but debdiff just fails (that thing is a hack).
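
The attachment itself is not reproduced in this report, so the following is only a rough sketch of the idea described above, not the actual patch: stop everything, then immediately resume the omitted PIDs before anything walks /proc. The names stop_all_resume_omitted and dry_run_kill are hypothetical, and the kill call is injected so the logic can be dry-run without signalling real processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Same shape as kill(2); injectable so the logic can be exercised safely. */
typedef int (*kill_fn)(pid_t pid, int sig);

/* Dry-run stand-in for kill(2): records calls instead of signalling. */
static struct { pid_t pid; int sig; } dry_log[16];
static size_t dry_count;

static int dry_run_kill(pid_t pid, int sig)
{
    if (dry_count < sizeof dry_log / sizeof dry_log[0]) {
        dry_log[dry_count].pid = pid;
        dry_log[dry_count].sig = sig;
        dry_count++;
    }
    return 0;
}

/*
 * Freeze every process, then resume the omitted ones right away so that a
 * later /proc walk cannot block on a filesystem they serve.
 * Returns the number of omitted PIDs that could not be resumed.
 */
static size_t stop_all_resume_omitted(kill_fn do_kill,
                                      const pid_t *omit, size_t n_omit)
{
    size_t failures = 0;

    assert(do_kill != NULL);
    do_kill(-1, SIGSTOP);                 /* stop everything first */
    for (size_t i = 0; i < n_omit; i++) {
        if (do_kill(omit[i], SIGCONT) != 0)
            failures++;                   /* omitted PID may already be gone */
    }
    return failures;
}
```

Passing the real kill as do_kill would give the live behaviour; passing dry_run_kill only records the sequence (one SIGSTOP to -1, then one SIGCONT per omitted PID).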

This should let everyone have ntfs-3g as root filesystem with no concerns whatsoever. The only caveat is that the file blindly runs atoi() over all the arguments instead of parsing them properly with getopt. But hey, I'm not the genius who interspersed getopt parsing with the pidof functionality, so this works for me, thank you very much; the other getopt parsing should be excised and separated by the original writer.
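
As an aside on the atoi() caveat: a stricter per-argument check could look roughly like the sketch below. It is not part of the attached file; parse_pid is a hypothetical helper that rejects anything that is not a plain positive decimal number, where atoi() would quietly return 0 or a numeric prefix.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Parse a decimal PID. Returns 0 and stores the value on success,
 * -1 if the string has no digits, has trailing junk, is out of range,
 * or is not positive.
 */
static int parse_pid(const char *s, pid_t *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v <= 0 || v > INT_MAX)
        return -1;
    *out = (pid_t)v;
    return 0;
}
```

With this, an option string such as "-o" or a malformed value such as "12ab" is rejected instead of being treated as a PID.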

Changed in sysvinit:
status: New → Confirmed
Rudd-O (rudd-o) wrote :

With the file I just submitted, sendsigs should just work okay and not kill FUSE filesystems, so umountfs or FUSE fs implementors' init.d files can do their job.

Hey, Colin! Long time no talk.

Rudd-O (rudd-o)
Changed in sysvinit:
status: Confirmed → In Progress
Matt Zimmerman (mdz) wrote :

Assigning to rudd-o since this is marked in progress and you seem to be the one working on it.

Changed in sysvinit:
assignee: nobody → rudd-o
Rudd-O (rudd-o) wrote :

Matt, thanks for the cred, but I don't think there's much further to do -- I attached the killall5.c file because it fixes the issue: it stops all processes before sending SIGTERM, except the excluded ones, which are sent SIGCONT so the /proc walking algorithm doesn't fail. I'll attach a version that I still have now, which might or might not differ from the version I attached earlier, but which I can confirm used to work when I used Kubuntu (I'm on FC9 now, guys, sorry, but I can still work on Ubuntu).

So what's next? Who is in charge of packaging the fix up and submitting it? I'd appreciate it if the changelog mentioned me :-).

Thanks again for your attention to this issue. It's worth mentioning that Fedora simply does not have this sendsigs thing at all, which forces me to edit initscripts to stop my filesystems.

Colin Watson (cjwatson) wrote :

I'll have a look at this, thanks.

Changed in sysvinit:
assignee: rudd-o → kamion
Eric House (eehouse) wrote :

There's a better fix than sending SIGCONT to omitted processes. The information gathered by stat() is only used in the case where main_pidof() is the caller of readproc(). So add a boolean to readproc() telling it whether to call stat(), and pass false from main(). That way fuse processes don't have to identify themselves via sendsigs.
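
A minimal sketch of that suggestion follows. The real readproc() in killall5.c is not reproduced in this report, so the struct and function here (proc_entry, fill_proc_entry) are hypothetical stand-ins that only show the shape of the change: the caller decides whether stat() is ever called.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical per-process record. */
struct proc_entry {
    pid_t pid;
    bool  have_id;   /* true only if dev/ino were filled in by stat() */
    dev_t dev;
    ino_t ino;
};

/*
 * Fill in one entry. With do_stat == false (the killall5 case) the
 * filesystem is never touched, so a stopped FUSE daemon cannot block us.
 * With do_stat == true (the pidof case) the binary is identified by
 * device and inode. Returns 0 on success, -1 if stat() fails.
 */
static int fill_proc_entry(struct proc_entry *p, pid_t pid,
                           const char *exe_path, bool do_stat)
{
    struct stat st;

    assert(p != NULL && exe_path != NULL);
    memset(p, 0, sizeof *p);
    p->pid = pid;
    if (!do_stat)
        return 0;
    if (stat(exe_path, &st) != 0)
        return -1;
    p->dev = st.st_dev;
    p->ino = st.st_ino;
    p->have_id = true;
    return 0;
}
```

The design point is that the signalling path needs only PIDs, so skipping stat() there removes the dependency on any mounted filesystem being responsive.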

Colin Watson (cjwatson)
Changed in sysvinit (Ubuntu):
status: In Progress → Triaged
Colin Watson (cjwatson)
Changed in sysvinit (Ubuntu):
assignee: Colin Watson (cjwatson) → nobody