Excerpts from codewarrior's message of Tue Apr 26 14:34:44 UTC 2011:
> I believe that this problem is caused by the /etc/init.d/sendsigs script
> not correctly waiting for terminating Upstart jobs.
> 
> I came across this problem on a Maverick system used for MythTV.  One
> night it was left to transcode and advert-flag a movie (lots of disk
> activity), after which it automatically powered off.  The next morning I
> turned on the system and /var didn't mount due to errors.
> 
> I kept an eye on the system thereafter, thinking it was a disk fault, but
> noticed the same orphaned inode and recovering journal messages on every
> restart.  I found in this case that shutting down mysqld in advance of
> calling poweroff resolved the problems.  However, that made me look
> further and I realised that mysql was only a problem because it took
> several seconds or more to shut down.  It looked like there was a race
> condition in the shutdown logic.
> 
> On entering runlevel 0 or 6 (halt or reboot), Upstart delivers a TERM
> signal to all processes that should stop at that runlevel.  The service
> doesn't have to terminate immediately; some services, like mysql, take a
> few seconds to tidy up.  An Upstart job that needs more than the default
> 5 seconds to stop can say so with a 'kill timeout' stanza.  Once that
> time has elapsed, Upstart delivers a KILL signal.
> 
> The problem is that immediately after Upstart sends the TERM signals, it
> starts the rc.conf service to run the System V scripts in /etc/rc0.d.
> One particular script, S20sendsigs, is responsible for TERMinating all
> remaining processes.  However, its logic excludes every Upstart job, so
> it can exit believing that all processes are dead.  Then
> /etc/rc0.d/S40umountfs unmounts the filesystem while there are still open
> files and, hey presto, orphaned inodes and worse...
> 
> The attached patch makes sendsigs wait for any Upstart jobs that are
> stopping.  This fixes all my file corruption problems and since using it
> I've not had any orphaned inodes.
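The TERM, wait, KILL sequence described in the quoted message can be sketched for a single process in Python. This is not Upstart's implementation; `stop_job` is a hypothetical helper, and the 5-second default simply mirrors the default kill timeout mentioned above.

```python
import signal
import subprocess

# Mirrors Upstart's default when a job has no "kill timeout" stanza.
DEFAULT_KILL_TIMEOUT = 5.0


def stop_job(proc: subprocess.Popen, kill_timeout: float = DEFAULT_KILL_TIMEOUT) -> str:
    """Hypothetical helper: SIGTERM, wait up to kill_timeout seconds, then SIGKILL.

    Returns "terminated" if the process exited within the timeout,
    otherwise "killed".
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=kill_timeout)  # give the service time to tidy up
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the child
        return "killed"
```

For example, `stop_job(proc, kill_timeout=300)` would give a slow database up to five minutes before it is killed. Upstart does this waiting on its own, in parallel with the rc scripts, which is where the race described above comes from.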

codewarrior, thanks for taking a stab at this; it's something that's on
my TODO list to solve within the next month.

The patch does a nice job of enforcing the same rules for Upstart jobs
as for other processes. At first glance it might look like we'll kill
some processes that we shouldn't, but in fact it does quite a good job
of only TERM/KILL'ing jobs that are already bound for 'stop' status.
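The patch itself isn't reproduced here, so the following is only a rough Python sketch of the same idea, not the patch: list jobs with `initctl list`, treat any job whose goal is `stop` but whose state has not yet reached `waiting` as still shutting down, and poll until that set is empty or a deadline passes. The output format assumed by `JOB_LINE` (`job [(instance)] goal/state[, process PID]`) and the names `stopping_jobs` and `wait_for_stopping_jobs` are assumptions for illustration.

```python
import re
import subprocess
import time
from typing import Callable, List, Optional

# Assumed `initctl list` line format: "<job> [(<instance>)] <goal>/<state>[, process <pid>]".
# Indented lines (e.g. "\tpre-stop process 56") describe extra processes and are skipped.
JOB_LINE = re.compile(r"^(\S+)(?: \([^)]*\))? (start|stop)/(\S+?)(?:,|$)")


def stopping_jobs(initctl_output: str) -> List[str]:
    """Return jobs whose goal is 'stop' but which have not reached 'waiting' yet."""
    pending = []
    for line in initctl_output.splitlines():
        m = JOB_LINE.match(line)
        if m and m.group(2) == "stop" and m.group(3) != "waiting":
            pending.append(m.group(1))
    return pending


def _initctl_list() -> str:
    return subprocess.run(["initctl", "list"], capture_output=True,
                          text=True, check=True).stdout


def wait_for_stopping_jobs(deadline_s: float, poll_s: float = 0.5,
                           list_jobs: Optional[Callable[[], str]] = None) -> List[str]:
    """Poll until no job is stopping or deadline_s elapses; return jobs still stopping."""
    list_jobs = list_jobs or _initctl_list
    end = time.monotonic() + deadline_s
    while True:
        pending = stopping_jobs(list_jobs())  # each poll is a fresh, unlocked snapshot
        if not pending or time.monotonic() >= end:
            return pending
        time.sleep(poll_s)
```

Repeated polling narrows the window for a job that moves to stop between snapshots, but it does not close it.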

Unfortunately, it causes us to ignore the 'kill timeout' setting that each
job can specify. I'd like to see us respect that and extend sendsigs'
deadline to the largest of those values.
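Respecting the per-job setting could look roughly like this in Python: read each stopping job's configuration, take its 'kill timeout' (falling back to Upstart's 5-second default), and extend the deadline to the largest value. The `/etc/init/<job>.conf` location, the `base_deadline` parameter, and the helper names are assumptions for illustration, not sendsigs' actual code.

```python
import re
from pathlib import Path
from typing import Iterable

DEFAULT_KILL_TIMEOUT = 5  # seconds, used when a job has no "kill timeout" stanza

# Matches an uncommented "kill timeout N" stanza on its own line (trailing comment allowed).
KILL_TIMEOUT = re.compile(r"^[ \t]*kill[ \t]+timeout[ \t]+(\d+)[ \t]*(?:#.*)?$", re.MULTILINE)


def kill_timeout(conf_text: str) -> int:
    """Return the job's kill timeout; if the stanza repeats, the last one wins (assumption)."""
    values = KILL_TIMEOUT.findall(conf_text)
    return int(values[-1]) if values else DEFAULT_KILL_TIMEOUT


def sendsigs_deadline(jobs: Iterable[str], base_deadline: int,
                      conf_dir: str = "/etc/init") -> int:
    """Extend base_deadline to the largest kill timeout among the given jobs."""
    deadline = base_deadline
    for job in jobs:
        try:
            text = (Path(conf_dir) / f"{job}.conf").read_text()
        except OSError:
            text = ""  # missing or unreadable config: fall back to the default
        deadline = max(deadline, kill_timeout(text))
    return deadline
```

Only the jobs that are actually stopping need to be passed in, so one slow job that is staying up does not stretch every shutdown.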

One other minor issue is that it has some races in it. initctl does not
do any kind of locking; it just loops through asking Upstart for the
status of each job. So you may end up with jobs that report "start" but,
for whatever reason, are moved to stop right after you asked.

Still, that is an imperfection that I think we can live with. This is
also a fix that we can easily SRU back to lucid/maverick/natty.

I'll target this to Oneiric, and once the fix has dropped there, we
can start the SRU process to all the other active releases.

> 
> This has made me question the correctness and safety of umount.  I would
> have thought that it should work OK in the presence of open files, but
> I'm no ext3 expert.
> 

It's only certain files that cause issues. Open deleted files are one
of the big ones, because ext3 needs to reclaim the inode when such a file
is closed.
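A quick way to see why open deleted files matter: once a file is unlinked, its directory entry is gone, but the inode stays allocated for as long as a descriptor refers to it. The Python sketch below (`demo_open_deleted_file` is a hypothetical name) shows the link count dropping to zero while the data is still readable. If the filesystem is torn down before that last descriptor is closed, the inode still has to be reclaimed later, which is consistent with the orphaned-inode messages in the quoted report.

```python
import os
import tempfile
from typing import Tuple


def demo_open_deleted_file() -> Tuple[int, bytes]:
    """Unlink a file while it is still open and show that the inode survives."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)  # removes the directory entry only
        nlink = os.fstat(fd).st_nlink  # 0: no name refers to the inode any more
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)  # ...but its blocks are still allocated and readable
        return nlink, data
    finally:
        os.close(fd)  # last reference gone: only now may the filesystem free the inode
```

Calling it returns `(0, b"still here")`: the file has no name, yet its contents are intact until the descriptor is closed.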

The real issue is that umount's error code is being ignored. One might
argue that it would be better to simply drop into a recovery shell if
these umounts fail. In fact, I will be discussing that possibility in
our UDS session about shutdown (which we'll be moving to be mostly
Upstart-based in 11.10).
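The difference between ignoring umount's exit status and acting on it is small in code. A minimal Python sketch, with `unmount_all` as a hypothetical helper: it runs `umount` on each mountpoint, collects the ones that fail, and leaves the decision (continue, retry, or drop to a recovery shell) to the caller. The `run` parameter exists only so the logic can be exercised without really unmounting anything.

```python
import subprocess
from typing import Callable, List, Sequence


def _run(argv: List[str]) -> int:
    return subprocess.run(argv).returncode


def unmount_all(mountpoints: Sequence[str],
                run: Callable[[List[str]], int] = _run) -> List[str]:
    """Try to unmount each mountpoint and return those whose umount failed."""
    failed = []
    for mp in mountpoints:
        if run(["umount", mp]) != 0:  # non-zero exit: e.g. the filesystem is still busy
            failed.append(mp)
    return failed
```

A shutdown script built this way could refuse to carry on to halt when the returned list is non-empty, instead of silently continuing with a filesystem still mounted.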