race condition on shutdown (leads to corrupted fs)

Bug #688541 reported by Michael Biebl
This bug affects 5 people
Affects             Status        Importance  Assigned to  Milestone
mysql-5.1 (Ubuntu)  Invalid       High        Clint Byrum
  Oneiric           Invalid       Undecided   Unassigned
  Precise           Invalid       High        Clint Byrum
mysql-5.5 (Ubuntu)  Fix Released  High        Clint Byrum
  Oneiric           Invalid       Undecided   Unassigned
  Precise           Fix Released  High        Clint Byrum
sysvinit (Ubuntu)   Fix Released  High        Clint Byrum
  Oneiric           Fix Released  Undecided   Clint Byrum
  Precise           Fix Released  High        Clint Byrum

Bug Description

== SRU JUSTIFICATION ==

IMPACT: potential data loss or extension of downtime. MySQL, for example, if sent a SIGKILL before it is done flushing its buffers into MyISAM tables, will lose that data. If using InnoDB, the transactions will have to be replayed from the transaction log at startup, which can take far longer than completing the flush procedure which the 300 second kill timeout in its job file allows for.

TEST CASE:

1. create a script, /usr/local/bin/15seconds.py with this as the content:

##### BEGIN COPY/PASTE #####
#!/usr/bin/python

import time
import signal
import logging
import sys

logging.basicConfig(level=logging.INFO,format="TEST: %(asctime)s: %(message)s")

def shutdown_process(sig, frame):
 logging.info("sleeping 15 seconds...")
 time.sleep(15)
 logging.info("now exitting...")
 sys.exit(0)

signal.signal(signal.SIGTERM, shutdown_process)

logging.info("Entering infinite loop")
while True:
 time.sleep(1)
##### END COPY/PASTE #####

chmod +x /usr/local/bin/15seconds.py

2. Create an upstart job file, /etc/init/15sec.conf to run this:

##### BEGIN COPY/PASTE #####
start on runlevel [2345]
stop on runlevel [016]

respawn

kill timeout 17

console output

exec /usr/local/bin/15seconds.py
##### END COPY/PASTE #####

3. sudo initctl start 15sec
4. sudo shutdown -h now

On an affected system, the job will be sent SIGKILL before its 15-second shutdown handler has finished (and so well before the 17-second kill timeout expires), so your shutdown log will look something like this:

Checking for running unattended-upgrades:
TEST: 2011-12-13 01:02:54,638: sleeping 15 seconds...
 * Asking all remaining processes to terminate... TEST: 2011-12-13 01:02:54,818: now exitting...
TEST: 2011-12-13 01:02:54,819: sleeping 15 seconds...
                                                                         [ OK ]
 * Killing all remaining processes... [fail]
 * Deconfiguring network interfaces... [ OK ]
 * Deactivating swap... [ OK ]
 * Will now halt
[ 68.020383] System halted.

An unaffected system will look like this:

Checking for running unattended-upgrades:
TEST: 2011-12-13 00:52:30,476: sleeping 15 seconds...
 * Asking all remaining processes to terminate... [ OK ]
TEST: 2011-12-13 00:52:45,497: now exitting...
 * All processes ended within 16 seconds.... [ OK ]
 * Deconfiguring network interfaces... [ OK ]
 * Deactivating swap... [ OK ]
 * Will now halt
[ 356.481556] System halted.

Note that the 15sec job is waited on once the bug is fixed, whereas in the unpatched version it is killed immediately.
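To compare the two logs without reading timestamps by eye, the gap between the "sleeping" and "now exitting" messages can be computed. The following is only a sketch in Python 3; the function name handler_duration is hypothetical, and it assumes the log lines follow the "TEST: %(asctime)s: %(message)s" format configured in the test script.

```python
import re
from datetime import datetime

# Matches lines written by the test script, even when other console
# output (such as " * Asking all remaining processes...") precedes them.
LOG_RE = re.compile(
    r"TEST: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}): (.*)"
)


def handler_duration(log_text):
    """Seconds between the first 'sleeping' message and the next
    'now exitting' message, or None if either one is missing."""
    started = None
    for line in log_text.splitlines():
        match = LOG_RE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = match.group(2)
        if started is None and message.startswith("sleeping"):
            started = stamp
        elif started is not None and message.startswith("now exitting"):
            return (stamp - started).total_seconds()
    return None
```

Applied to the two logs above, this gives about 0.18 seconds for the affected system and about 15.02 seconds for the unaffected one.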

DEV FIX: The sendsigs script has not been changed in precise other than for this patch.

REGRESSION POTENTIAL: There may be scenarios and jobs that have very high kill timeouts which will cause system shutdowns to wait for up to 300 seconds instead of the previous 10. This is considered a good balance between waiting long enough for any reasonable application to flush its buffers and short enough that we won't run up against any battery backup systems running out of battery power.

======

I'm using mysql-server-5.1 on a 10.04 LTS installation.
The mysql db is around 27GB and on a separate partition mounted as /var/lib/mysql.

On shutdown I get the following error message:

Checking for running unattended-upgrades:
 * Asking all remaining processes to terminate... [ OK ]
 * All processes ended within 1 seconds.... [ OK ]
 * Deconfiguring network interfaces... [ OK ]
 * Deactivating swap... [ OK ]
 * Unmounting local filesystems...
umount2: Device or resource busy
umount: /var/lib/mysql: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
umount2: Device or resource busy
umount: /tmp: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
 [fail]
mount: / is busy
 * Will now restart
[ 3369.429751] Restarting system.

On the next reboot the file system is corrupt and needs to be fsck-ed.

I think the problem is that mysql uses an upstart job (/etc/init/mysql.conf) which has
stop on runlevel [016]

The rc.conf job is also triggered on runlevel 0 and 6, so they basically run at the same time.

When /etc/rc0.d/S20sendsigs is run, it deliberately does not wait for or kill any upstart jobs.

As my mysqld process takes some time to shutdown, S40umountfs and S60umountroot are run before the mysqld has quit.

This leads to the fs not being properly unmounted. It is even possible that mysqld is forcefully killed by halt in S90halt if it hasn't stopped by then.

This is a serious issue, as it can (and will) lead to data loss.

Other upstart jobs, like rsyslog.conf, use the same "stop on runlevel [016]" stanza, so they are probably affected too.
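To get a rough list of other jobs that share this stanza, the job files can be scanned. This is a minimal Python 3 sketch, not part of any package; the function name jobs_stopping_on_runlevel is hypothetical and the default directory /etc/init is assumed from the mysql.conf path above.

```python
import glob
import os
import re

# A "stop on runlevel ..." stanza, optionally indented.
STOP_ON_RUNLEVEL = re.compile(r"^\s*stop on runlevel\b(.*)$")


def jobs_stopping_on_runlevel(conf_dir="/etc/init"):
    """Map job name -> the expression following 'stop on runlevel'."""
    found = {}
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        job = os.path.basename(path)[: -len(".conf")]
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                match = STOP_ON_RUNLEVEL.match(line)
                if match:
                    found[job] = match.group(1).strip()
                    break  # only the first matching stanza is recorded
    return found
```

Every job it reports is stopped by the same runlevel event that starts the rc job, so each is a candidate for the same race.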

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: mysql-server-5.1 5.1.49-1ubuntu8.1
Uname: Linux 2.6.32-5-686 i686
NonfreeKernelModules: michael_mic arc4 ecb lib80211_crypt_tkip aes_i586 aes_generic lib80211_crypt_ccmp sco bnep rfcomm l2cap binfmt_misc acpi_cpufreq ppdev lp cpufreq_userspace cpufreq_stats vboxnetadp cpufreq_powersave vboxnetflt cpufreq_conservative vboxdrv fuse pcmcia snd_intel8x0m snd_intel8x0 snd_ac97_codec btusb bluetooth rfkill ac97_bus yenta_socket ipw2200 snd_pcm 8139too firewire_ohci snd_seq 8139cp firewire_core sg uhci_hcd snd_timer rsrc_nonstatic libipw snd_seq_device pcmcia_core crc_itu_t parport_pc smsc_ircc2 ehci_hcd mii joydev lib80211 sr_mod parport i2c_i801 irda snd usbcore wbsd soundcore shpchp mmc_core pcspkr rng_core cdrom psmouse container crc_ccitt snd_page_alloc serio_raw pci_hotplug ac battery nls_base processor evdev ppp_generic slhc loop autofs4 ext4 mbcache jbd2 crc16 dm_mod sd_mod crc_t10dif radeon ttm ata_generic drm_kms_helper ata_piix drm i2c_algo_bit libata video thermal i2c_core scsi_mod output thermal_sys button
Architecture: i386
Date: Fri Dec 10 13:41:52 2010
ProcEnviron:
 PATH=(custom, no user)
 LANG=de_DE.utf8
 SHELL=/bin/bash
SourcePackage: mysql-5.1

Michael Biebl (mbiebl) wrote :
Martin Pitt (pitti) wrote :

What would be the general approach to express "shut down on runlevel 0/1/6 before the disks go away" in terms of upstart triggers? Once there's an approach, please hand over to canonical-server. Thanks!

Changed in mysql-5.1 (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
status: New → Triaged
importance: Undecided → High
Ante Karamatić (ivoks) wrote :

Suggestion: make umountfs wait for all upstart jobs to finish.

Michael Biebl (mbiebl) wrote : Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010/12/10 Ante Karamatić <email address hidden>:
> Suggestion: make umountfs wait for all upstart jobs to finish.

Doesn't that conflict though with what is written in /etc/init.d/sendsigs:

        # Upstart jobs have their own "stop on" clauses that sends
        # SIGTERM/SIGKILL just like this, so if they're still running,
        # they're supposed to be
        for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
        done

or

                # did an upstart job start since we last polled initctl? check
                # again on each loop and add any new jobs (e.g., plymouth) to
                # the list. If we did miss one starting up, this beats waiting
                # 10 seconds before shutting down.
                for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
                done

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

Clint Byrum (clint-fewbar) wrote :

Ante, good eyes there. That statement is a little misleading, that "if they're still running, they're supposed to be", as this assumes there was an event somewhere between the running system, and runlevel [016], which to my knowledge, there isn't.

I'm a little confused as to why umountfs is still running as part of rc, and not in an upstart job. I'm pretty sure actually, that this is related to bug #616287 , which I originally thought was mountall's fault, but now it seems is in fact sysvinit's.

In any case, one solution could be to have umountfs emit 'unmounting-filesystems' before it starts, and then change 'stop on runlevel [016]' to 'stop on unmounting-filesystems'. If I understand initctl correctly, it will wait for all of the triggered stops to complete before continuing.

I also think that we should look at abstracting the events a bit more for generic services so job writers don't have to become boot experts to know when to start on / stop on.

Adding sysvinit task as well.

Robbie Williamson (robbiew) wrote :

James, could you take a look at this?

Clint Byrum (clint-fewbar) wrote :

Note that there is one incorrect assumption, which is that sendsigs will never kill any upstart jobs.

In fact, it does make one attempt to kill -9 any still running upstart jobs:

    if [ -z "$alldead" ] ; then
        log_action_begin_msg "Killing all remaining processes"
        #report_unkillable
        killall5 -9 $OMITPIDS # SIGKILL
        log_action_end_msg 1

Unfortunately, it doesn't actually wait for this kill -9 to finish, so it's still possible to have running processes there corrupting the system.

I do think the appropriate fix is to have umountfs emit an 'unmounting-filesystems' event and anything that does a 'start on local-filesystems' or 'start on filesystem' should also 'stop on unmounting-filesystems', causing this to wait for upstart to give up on its jobs (which is nice as they can have their own well defined kill timeout). What I don't know yet, is whether upstart will check to see that its SIGKILL actually ended the job, or just report that it sent it, and move on.

Clint Byrum (clint-fewbar) wrote :

And, whoops, I just re-read that, it's using killall5's -o to still omit those processes.

Please disregard that last message then.

Michael Biebl (mbiebl) wrote :

2010/12/14 Clint Byrum <email address hidden>:
>
> I do think the appropriate fix is to have umountfs emit an 'unmounting-
> filesystems' event and anything that does a 'start on local-filesystems'
> or 'start on filesystem' should also 'stop on unmounting-filesystems',

What do you do about services which have
"start on runlevel [2345]" and the binary is in /usr?

There are quite a few examples here: acpid, atd, cron, irqbalance, etc
which all have:

start on runlevel [2345]
stop on runlevel [!2345]

Either those jobs are buggy to not specify the "start on
(local-)filesystems" dependency or your criteria is not sufficient.

Imho the major problem here is, that there is a mixup between
dependencies that need to be satisfied to be able to run a job and
when (in which runlevels) to start a job.

Michael


Clint Byrum (clint-fewbar) wrote :

Great point, thanks for pointing that out!

rc-sysinit does not start until filesystem and net-device-up IFACE=lo, and so, runlevel 2, which is reached by calling rc-sysinit, implies all of the services you mention. It is important to point out that we must include any of those *implied* to be started up by filesystem or local-filesystems.

Before I go off and throw one together, I wonder if there is a tool that reads through /etc/init/*.conf and would simulate each event and the resulting chaos^H^H^H^H^Hstarted jobs? Such a thing would be massively useful.

Clint Byrum (clint-fewbar) wrote :

Hmm, I am wondering now if this bug is the same thing.

https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/616287

Clint Byrum (clint-fewbar) wrote :

So I've done some more thinking about this, and I had a bit of an aha! moment.

While we *should* in fact stop using 'stop on runlevel [016]' or 'stop on runlevel [!2345]', I think we can solve this without touching all of those jobs.

/etc/init.d/sendsigs has this code:

        # Upstart jobs have their own "stop on" clauses that sends
        # SIGTERM/SIGKILL just like this, so if they're still running,
        # they're supposed to be
        for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
        done

It uses this to determine which pids not to kill because, presumably, upstart should be managing them.

However, this code is flawed. killall5 will kill the children of all of these if they are multi process daemons or scripts running things. This would only be solved by walking through /proc looking for these as parent pids (and then doing the same again with the new list.. ).
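The /proc walk mentioned here could look roughly like the following Python 3 sketch. It is not what sendsigs does; the helper names read_parent_map and descendants are hypothetical, and the parent pid is assumed to be the second field after the closing parenthesis in /proc/PID/stat, as on Linux.

```python
import os


def read_parent_map(proc_dir="/proc"):
    """Return {pid: parent pid} for every process visible in proc_dir."""
    parent_of = {}
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "stat")) as handle:
                stat = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name may contain spaces or ')', so split after
        # the last ')': the remaining fields are "state ppid ...".
        fields = stat.rpartition(")")[2].split()
        parent_of[int(entry)] = int(fields[1])
    return parent_of


def descendants(roots, parent_of):
    """Every pid whose ancestry leads back to one of the given roots."""
    children = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)
    seen = set()
    stack = list(roots)
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

Passing the upstart-managed pids as roots yields the extra pids that would also need a -o entry, which avoids the repeated "do the same again with the new list" pass by walking the whole tree in one go.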

However, this technique can actually be used to determine if there are still jobs that are supposed to be stopped, but haven't finished stopping yet. Since they should be listed as stop/(pre-stop|post-stop|killed), we can determine exactly which pids we expect to go away. Since upstart has its own idea of how long to wait before it kills these, we should actually wait indefinitely.

I'm attaching a debdiff that solves the race as far as I can tell, though I think it needs a good long look, since it could mean shutdowns hang for a long time waiting (I'm especially curious if the pre-stop/post-stop's are subject to kill timeout)
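The idea can be sketched in Python 3 as below. This is not the attached debdiff; the function names are hypothetical, and the `initctl list` line format ("job goal/state, process PID", with an optional "(instance)" after the job name) is an assumption based on the sed expression above.

```python
import re

# One line of `initctl list`, e.g. "mysql stop/killed, process 1234".
JOB_LINE = re.compile(
    r"^(?P<job>\S+)(?: \((?P<instance>[^)]*)\))? "
    r"(?P<goal>\w+)/(?P<state>[\w-]+)"
    r"(?:, process (?P<pid>\d+))?"
)


def parse_initctl_list(output):
    """Return (job, goal, state, pid-or-None) for each job line."""
    jobs = []
    for line in output.splitlines():
        match = JOB_LINE.match(line)
        if match:
            pid = match.group("pid")
            jobs.append((match.group("job"), match.group("goal"),
                         match.group("state"), int(pid) if pid else None))
    return jobs


def omit_args(jobs):
    """killall5 arguments sparing every upstart-managed main pid,
    mirroring what the existing OMITPIDS loop builds."""
    args = []
    for _job, _goal, _state, pid in jobs:
        if pid is not None:
            args += ["-o", str(pid)]
    return args


def still_stopping(jobs):
    """Pids of jobs whose goal is stop but which have not reached
    stop/waiting yet, i.e. the pids we expect to go away."""
    return [pid for _job, goal, state, pid in jobs
            if goal == "stop" and state != "waiting" and pid is not None]
```

A wait loop would call still_stopping on fresh `initctl list` output until it returns an empty list.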

Clint Byrum (clint-fewbar) wrote :
Changed in sysvinit (Ubuntu):
status: New → Triaged
Clint Byrum (clint-fewbar) wrote :

Also attaching the bash script that I used to test this, which simulates a process taking a long time on SIGTERM without forking.. it *should* work with sleep too, given the sendsigs change I posted, but when that change is not there.. sendsigs kills the sleeps and ruins all the fun.

Below is the upstart job I used to run it. I tested this on lucid, 10.04.1, and without the sendsigs change, the script would continue to run right up to the umounts and beyond despite having been "stopped". With the sendsigs change to wait, the test script would be sent SIGKILL well before the end of the halt.

start on filesystem and net-device-up
stop on runlevel [016]

console output

kill timeout 20

exec /home/clint/test_dies_slowly.bash

Michael Biebl (mbiebl) wrote :

2010/12/16 Clint Byrum <email address hidden>:
>
> /etc/init.d/sendsigs has this code:
>
>
>        # Upstart jobs have their own "stop on" clauses that sends
>        # SIGTERM/SIGKILL just like this, so if they're still running,
>        # they're supposed to be
>        for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
>                OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
>        done
>
>
> It uses this to determine which pids not to kill because, presumably, upstart should be managing them.
>
> However, this code is flawed. killall5 will kill the children of all of
> these if they are multi process daemons or scripts running things.

This observation is correct. On the other hand, isn't this exactly
what the sendsigs script is for: clean up any remaining, stray
processes which have not been stopped by its corresponding sysv init
script or upstart job (or have been e.g. started by the user)?

But I guess you are right, we should first stop all upstart jobs, give
them time to finish stopping, and then let sendsigs clean up anything
remaining afterwards.

> However, this technique can actually be used to determine if there are
> still jobs that are supposed to be stopped, but haven't finished
> stopping yet. Since they should be listed as stop/(pre-stop|post-
> stop|killed), we can determine exactly which pids we expect to go away.
> Since upstart has its own idea of how long to wait before it kills
> these, we should actually wait indefinitely.
>
> I'm attaching a debdiff that solves the race as far as I can tell,
> though I think it needs a good long look, since it could mean shutdowns
> hang for a long time waiting (I'm especially curious if the pre-stop
> /post-stop's are subject to kill timeout)

This code is still racy, afaics. What about upstart jobs, which are
not stopped by "stop on runlevel [016]"? They could receive their stop
signal at a point when your loop has already been run.

If you don't want to change existing jobs, we probably have to pick up
Ante's suggestion, and do the following in sendsigs:

1) run a for loop to wait for *all* running upstart jobs to stop.
upstart jobs which need to keep running past sendsigs (e.g. plymouth)
need to signal that using a similar mechanism like the killall5
sendsigs.d omit interface. I'd at least give upstart jobs 60secs time
to stop, so big databases etc have enough time to cleanly shutdown
2.) run a for loop and send SIGTERM all remaining processes, but do
*not* add upstart pids to $OMITPIDS
3.) send a final SIGKILL if any processes are left.

Regarding 1.), it would be nice to have a native C implementation in
upstart, instead of running initctl, grep and sleep manually.
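Step 1 amounts to a bounded polling loop. A minimal Python 3 sketch follows; wait_for_jobs and its parameters are hypothetical names, list_running stands for whatever reports the upstart jobs that are still running, and the 60-second default is the figure suggested above.

```python
import time


def wait_for_jobs(list_running, timeout=60.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll list_running() until it returns nothing or the timeout
    elapses. Returns whatever is still running at that point."""
    deadline = clock() + timeout
    remaining = list_running()
    while remaining and clock() < deadline:
        sleep(interval)
        remaining = list_running()
    return remaining
```

Anything the function returns is what steps 2 and 3 would then have to deal with. The clock and sleep parameters exist only so the loop can be exercised without real waiting.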


tags: added: patch
Clint Byrum (clint-fewbar) wrote :

On Thu, 2010-12-16 at 15:45 +0000, Michael Biebl wrote:
> 2010/12/16 Clint Byrum <email address hidden>:
> >
> >
> > I'm attaching a debdiff that solves the race as far as I can tell,
> > though I think it needs a good long look, since it could mean shutdowns
> > hang for a long time waiting (I'm especially curious if the pre-stop
> > /post-stop's are subject to kill timeout)
>
> This code is still racy, afaics. What about upstart jobs, which are
> not stopped by "stop on runlevel [016]"? They could receive their stop
> signal at a point when your loop has already been run.
>

Indeed, there is still a race I think now that I dig through upstart's
code a bit. If any of the jobs in the stop/!waiting state have 'stop on
stopped' jobs that will be stopped after they stop, the event isn't
emitted until *after* the transition to stop/waiting.

thread A (upstart job foo):

start/running -> stop/pre-stop
sends TERM to owned process
stop/pre-stop -> stop/killed
process dies
stop/killed -> stop/waiting
emit stopped JOB=foo

thread B (upstart job baz)
start/running -> stop/pre-stop
sends kill to owned process
stop/pre-stop -> stop/killed
process dies
stop/killed -> stop/waiting

thread C (sleep loop)

runs initctl list
greps
sleeps
runs initctl list
greps
sleeps

list is handled by doing a "get all jobs" command first, and then
individual status commands for each job, so it's entirely possible that
we will ask for the status of baz and it will say start/running, and
then foo finishes its transition, then we ask for foo's status and it is
stop/waiting, and we think we're done.

This race would probably be solved by having a "list all jobs with
status" command, as long as the stopped event is guaranteed to be
consumed before any commands, which, I believe it will.
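The interleaving can be reproduced with a toy model. The Python 3 sketch below does not talk to upstart at all; the state dictionary, the function names, and the assumption that baz has "stop on stopped foo" are purely illustrative.

```python
def poll_individually(jobs, get_status, between_queries=lambda: None):
    """Mimic `initctl list`: ask for each job's status in turn while
    the system keeps changing between the individual queries."""
    snapshot = {}
    for job in jobs:
        snapshot[job] = get_status(job)
        between_queries()
    return snapshot


def anything_stopping(snapshot):
    """True if any job is in a stop/* state other than stop/waiting."""
    return any(status.startswith("stop/") and status != "stop/waiting"
               for status in snapshot.values())


# foo is about to finish stopping; baz stops once foo has stopped.
state = {"baz": "start/running", "foo": "stop/killed"}


def foo_finishes():
    if state["foo"] == "stop/killed":
        state["foo"] = "stop/waiting"   # stopped JOB=foo is emitted...
        state["baz"] = "stop/pre-stop"  # ...and baz starts stopping


seen = poll_individually(["baz", "foo"], state.get, foo_finishes)
```

Here seen reports baz as start/running and foo as stop/waiting, so anything_stopping(seen) is False even though baz is in stop/pre-stop. A single atomic snapshot, dict(state), gives the right answer.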

One delicate issue is that if an upstart managed process dies for any
other reason than being stopped, upstart will try to respawn it, so we
can't just go sending SIGTERM/SIGKILL to all pids, as upstart will fight
us on those. We actually have to stop everything.

> If you don't want to change existing jobs, we probably have to pick up
> Ante's suggestion, and do the following in sendsigs:
>
> 1) run a for loop to wait for *all* running upstart jobs to stop.
> upstart jobs which need to keep running past sendsigs (e.g. plymouth)
> need to signal that using a similar mechanism like the killall5
> sendsigs.d omit interface. I'd at least give upstart jobs 60secs time
> to stop, so big databases etc have enough time to cleanly shutdown

IMO, leaving out a valid stop on that gets it stopped at or before
runlevel [016] is the equivalent of the omit interface. You've started
it, saying exactly when upstart should or should not stop it. However,
if you've wandered into the scenario mentioned above with stop on
stopped foo, then we need to handle that.

> 2.) run a for loop and send SIGTERM all remaining processes, but do
> *not* add upstart pids to $OMITPIDS

See above, you'd have to send 'stop' commands to upstart for them,
instead of omitting them.

> 3.) send a final SIGKILL if any processes are left.
>

I'd say "let upstart do that".. but how do we know when we can continue
on to unmounting? I supp...


James Hunt (jamesodhunt) wrote :

After discussion with Scott, the best short-term solution would seem to be:

1) Modify /etc/init.d/umountfs to call the following in do_stop before calling umount/swapoff:

     "initctl emit unmount-filesystem"

2) Modify /etc/init.d/umountroot to call the following in do_stop before calling umount:

     "initctl emit unmount-root-filesystem"

3) Modify all upstart configs for services which are "slow" to stop such that they "stop on unmount-filesystem",
    rather than "stop on runlevel [016]".

4) Test!

The overall effect of this being that when /etc/init.d/umountfs emits the unmount-filesystem event, it will block until any Upstart jobs which "stop on" those events have completed. Thus, /etc/init.d/umountfs will wait for the mysql Upstart job to finish before unmounting its filesystems.
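The ordering in steps 1 and 2 can be sketched as follows. The real scripts are shell; this Python 3 version only illustrates the sequence, the function names are hypothetical, and it relies on the statement above that `initctl emit` blocks until the jobs stopping on the event have completed.

```python
import subprocess


def emit_and_wait(event, run=subprocess.check_call):
    """Run `initctl emit EVENT` and return once the command returns."""
    run(["initctl", "emit", event])


def do_stop(unmount, run=subprocess.check_call):
    """Emit unmount-filesystem first, then perform the unmount step."""
    emit_and_wait("unmount-filesystem", run)
    unmount()
```

The run parameter is there so the ordering can be checked without a running upstart; by default it executes the real command.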

Michael Biebl (mbiebl) wrote :

2010/12/20 James Hunt <email address hidden>:
>
> 3) Modify all upstart configs for services which are "slow" to stop such that they "stop on unmount-filesystem",
>    rather than "stop on runlevel [016]".

- What about single user mode? I guess when switching to runlevel 1 we
want to stop services like mysql?
- How do you decide if a service is '"slow" to stop' ? Imho that
highly depends on the given hardware, local configuration and the
amount of data you are dealing with. A general approach would be
preferable.


Clint Byrum (clint-fewbar) wrote :

On Mon, 2010-12-20 at 12:50 +0000, James Hunt wrote:
> After discussion with Scott, the best short-term solution would seem to
> be:
>
> 1) Modify /etc/init.d/umountfs to call the following in do_stop before
> calling umount/swapoff:
>
> "initctl emit unmount-filesystem"
>
> 2) Modify /etc/init.d/umountroot to call the following in do_stop before
> calling umount:
>
> "initctl emit unmount-root-filesystem"
>
>
> 3) Modify all upstart configs for services which are "slow" to stop such that they "stop on unmount-filesystem",
> rather than "stop on runlevel [016]".
>
> 4) Test!
>
> The overall effect of this being that when /etc/init.d/umountfs emits
> the unmount-filesystem event, it will block until any Upstart jobs which
> "stop on" those events have completed. Thus, /etc/init.d/umountfs will
> wait for the mysql Upstart job to finish before unmounting its
> filesystems.

Not much happens between rc-sysinit starting and sendsigs/umountfs. Is
slow even 1 second between SIGTERM and exiting? Shouldn't we just make
sure everything that is 'stop on runlevel [!2345]' or 'stop on runlevel
[016]' stops before we umount? bug #672177 may very well be caused
simply by killing the last service that had the deleted libc.so.6 open,
causing the fs to need to finish the deletion right then, which could be
waiting on a sync and many other files being flushed/etc. on a busy
rotational disk. This will cause something very tiny to take a second to
die.

I think we must transition *everything* that stops on runlevel [016] to
'stop on unmounting-filesystems', or get clever and find a way to wait
until upstart is done stopping everything it already wants to stop. I do
think that initctl list is flawed for this task, but it might be the
best chance at catching stragglers that we have.

In a message to ubuntu-devel I suggested that we have an abstract job,
'network-services', which most normal (non boot-critical) services
should follow.

https://lists.ubuntu.com/archives/ubuntu-devel/2010-December/032254.html

By taking this approach, we can at least amend this fix if it has
unintended consequences.

There's also still the issue (which probably should be its own bug
report) that sendsigs will kill the children of already stopping jobs,
which it shouldn't do, and which it would still do in the suggested fix
since sendsigs runs before umountfs.

Clint Byrum (clint-fewbar) wrote :

On Tue, 2010-12-21 at 12:41 +0000, Scott James Remnant wrote:
> On 20/12/10 18:22, Clint Byrum wrote:
> > In a message to ubuntu-devel I suggested that we have an abstract job,
> > 'network-services', which most normal (non boot-critical) services
> > should follow.
> >
> > https://lists.ubuntu.com/archives/ubuntu-devel/2010-December/032254.html
> >
> General note: ubuntu-devel is *NOT* the correct list to discuss Upstart
> changes unless they're unique to Ubuntu.
>

Thanks, Scott

In this case, I don't know if this would be unique to Ubuntu or not. I
am not suggesting a code change in upstart with that message, but rather
a change in the way upstart is used and packaged in Ubuntu. Though, it
would be rather nice if everybody used upstart the same way.

James Hunt (jamesodhunt) wrote :

@Michael: yes, this should be "stop on unmount-filesystem or single-user" (we can create a new event for single-user to make the logic clearer).

@Clint: I agree that full migration sounds like the best approach. I have had a few discussions previously with Scott on the idea of abstract jobs. There is quite a lot of scope here. Aside from network-services, we could introduce jobs such as:

- "network-manager" (not the application, could also refer to connman, wicd, etc).
- "firewall" (iptables, ufw, etc).
- "display-manager" (gdm, kdm, xdm, etc)
- "ssh" (openssh, dropbear)

ingo (ingo-steiner) wrote :

On Tue, 2010-12-21 at 12:41 +0000, Scott James Remnant wrote:
> General note: ubuntu-devel is *NOT* the correct list to discuss Upstart
> changes unless they're unique to Ubuntu.

Wouldn't it be fair to inform Debian about those problems before they release Squeeze?
(though I never observed it on Squeeze till now)

Michael Biebl (mbiebl) wrote :

2010/12/21 ingo <email address hidden>:
> On Tue, 2010-12-21 at 12:41 +0000, Scott James Remnant wrote:
>> General note: ubuntu-devel is *NOT* the correct list to discuss Upstart
>> changes unless they're unique to Ubuntu.
>
> Wouldn't it be fair to inform Debian about those problems before they release Squeeze?
> (though I never observed it on Squeeze till now)

This doesn't affect Debian as the upstart package in Debian still uses
plain sysv compat and there are no native upstart jobs yet.

Michael
(upstart maintainer in Debian)


ingo (ingo-steiner) wrote :

> This doesn't affect Debian as the upstart package in Debian still uses
> plain sysv compat and there are no native upstart jobs yet.

A wise decision, good to know.
Thanks,
Ingo

Clint Byrum (clint-fewbar) wrote :

I was working with someone on another issue recently, and he pointed out a situation where someone had used this:

start on starting rc RUNLEVEL=[06]

to run a specific task before the system shut down.

It got me thinking, should we instead just transition services that need to stop before shutdown to

stop on starting rc RUNLEVEL=[016]

That would cause these jobs to stop fully before any of the bits of the shutdown run. They'd still shut down in parallel, so it wouldn't make the shutdown slower.

I do think you have to do *all* services like this. Even one left holding deleted libraries open can still ruin the shutdown process.

Anyway, I like this even shorter term solution because it allows us to SRU individual problem daemons such as mysql without creating a new event.
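For trying this on a test machine, the stanza rewrite can be done mechanically. The Python 3 sketch below is only an illustration: convert_stop_stanza is a hypothetical name, and it keeps each job's bracket expression unchanged, which assumes the same pattern is also valid as a RUNLEVEL= match.

```python
import re

# "stop on runlevel [..]" on a line of its own, optionally indented.
STANZA = re.compile(
    r"^([ \t]*)stop on runlevel (\[[^\]]*\])[ \t]*$", re.MULTILINE
)


def convert_stop_stanza(conf_text):
    """Rewrite 'stop on runlevel [X]' to 'stop on starting rc RUNLEVEL=[X]'."""
    return STANZA.sub(r"\1stop on starting rc RUNLEVEL=\2", conf_text)
```

Stanzas without a bracket expression are left untouched and would need to be reviewed by hand.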

ingo (ingo-steiner) wrote :

@Clint:

I did test your proposal in Maverick.

Before editing the stop scripts:

fgrep "stop on runlevel" /etc/init/*.conf
/etc/init/acpid.conf:stop on runlevel [!2345]
/etc/init/anacron.conf:stop on runlevel [!2345]
/etc/init/apport.conf:stop on runlevel [!2345]
/etc/init/atd.conf:stop on runlevel [!2345]
/etc/init/cron.conf:stop on runlevel [!2345]
/etc/init/cups.conf:stop on runlevel [016]
/etc/init/dbus.conf:stop on runlevel [06]
/etc/init/failsafe-x.conf:stop on runlevel [06]
/etc/init/gdm.conf:stop on runlevel [016]
/etc/init/irqbalance.conf:stop on runlevel [!2345]
/etc/init/mountall-shell.conf:stop on runlevel [06]
/etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
/etc/init/rcS.conf:stop on runlevel [!S]
/etc/init/rc-sysinit.conf:stop on runlevel
/etc/init/rsyslog.conf:stop on runlevel [06]
/etc/init/tty1.conf:stop on runlevel [!2345]
/etc/init/tty2.conf:stop on runlevel [!23]
/etc/init/tty3.conf:stop on runlevel [!23]
/etc/init/tty4.conf:stop on runlevel [!23]
/etc/init/tty5.conf:stop on runlevel [!23]
/etc/init/tty6.conf:stop on runlevel [!23]
/etc/init/udev.conf:stop on runlevel [06]
/etc/init/ufw.conf:stop on runlevel [!023456]

After editing the stop scripts:

fgrep "stop on starting" /etc/init/*.conf
/etc/init/cups.conf:stop on starting rc RUNLEVEL=[016]
/etc/init/dbus.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/failsafe-x.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/gdm.conf:stop on starting rc RUNLEVEL=[016]
/etc/init/mountall.conf:stop on starting rcS
/etc/init/mountall-shell.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/rsyslog.conf:stop on starting rc RUNLEVEL=[06]
/etc/init/udev.conf:stop on starting rc RUNLEVEL=[06]

Then execute

apt-get install --reinstall libc6
and reboot:

I still get the 8 orphaned inodes as reported already.

Did I need to change the other scripts as well, like this?
 stop on runlevel [!2345] -> stop on starting rc RUNLEVEL=[016]

Clint Byrum (clint-fewbar) wrote :

On Thu, 2010-12-23 at 12:40 +0000, ingo wrote:
> Did I miss changing the other scripts as well, like this?
> stop on runlevel [!2345] -> stop on starting rc RUNLEVEL=[016]
>

Yes, and some of those are probably the most likely to have libc open.

If doing the same to all of the !2345's does not fix the corruption, can
you do:

apt-get install --reinstall libc6
lsof -n |grep deleted
initctl list

And paste or upload the output of that here?
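
For anyone repeating the experiment, the stanza edit under discussion can be sketched in Python. This is only an illustration: the helper name `rewrite_stop_stanza` is made up for this example, it keeps the runlevel set exactly as written (ingo chose `[016]` for the `[!2345]` jobs instead), and it deliberately leaves alone lines such as `stop on runlevel [!$RUNLEVEL]` or a bare `stop on runlevel`.

```python
import re

# Matches a simple "stop on runlevel [..]" stanza whose set contains only
# numeric runlevels, optionally negated with "!".
STOP_ON_RUNLEVEL = re.compile(r"^(\s*)stop on runlevel (\[!?[0-6]+\])\s*$")


def rewrite_stop_stanza(line):
    """Rewrite 'stop on runlevel [X]' to 'stop on starting rc RUNLEVEL=[X]'.

    Hypothetical helper: any line that does not match the simple pattern
    above is returned unchanged.
    """
    match = STOP_ON_RUNLEVEL.match(line)
    if match is None:
        return line
    indent, runlevels = match.groups()
    return f"{indent}stop on starting rc RUNLEVEL={runlevels}"


if __name__ == "__main__":
    for sample in ("stop on runlevel [06]",
                   "stop on runlevel [!2345]",
                   "stop on runlevel"):
        print(rewrite_stop_stanza(sample))
```

Which job files to run this over is still a manual decision; the sketch only shows the textual transformation.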


ingo (ingo-steiner) wrote :

I took you literally and changed all [!2345], not the others:

The remaining now are:

fgrep "stop on runlevel" /etc/init/*.conf
/etc/init/rc.conf:stop on runlevel [!$RUNLEVEL]
/etc/init/rcS.conf:stop on runlevel [!S]
/etc/init/rc-sysinit.conf:stop on runlevel
/etc/init/tty2.conf:stop on runlevel [!23]
/etc/init/tty3.conf:stop on runlevel [!23]
/etc/init/tty4.conf:stop on runlevel [!23]
/etc/init/tty5.conf:stop on runlevel [!23]
/etc/init/tty6.conf:stop on runlevel [!23]
/etc/init/ufw.conf:stop on runlevel [!023456]

I still get the orphaned inodes. Shall I also convert the tty's?

Clint Byrum (clint-fewbar) wrote :

On Thu, 2010-12-23 at 22:07 +0000, ingo wrote:
> I still get the orphaned inodes. Shall I also convert the tty's?
>

You can, but I doubt they're the problem.

Can you paste the output of

lsof -n |grep deleted

After the reinstall?

Thanks.
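
What that command is looking for can be sketched in Python: a process that still maps a file replaced during the reinstall shows the old path with a `(deleted)` suffix. The sketch below reads `/proc/<pid>/maps` directly rather than calling lsof, so it is an approximation of the check, not a replacement for it. The function names are hypothetical, and it assumes a Linux `/proc` and enough privileges to read other processes' maps.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text):
    """Return the paths marked '(deleted)' in the text of a /proc/<pid>/maps file."""
    paths = set()
    for line in maps_text.splitlines():
        # Line format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].endswith(DELETED_SUFFIX):
            paths.add(fields[5][: -len(DELETED_SUFFIX)])
    return sorted(paths)


def scan_proc(proc_root="/proc"):
    """Map pid -> deleted files still mapped by that process (best effort)."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        maps_path = os.path.join(proc_root, entry, "maps")
        try:
            with open(maps_path, errors="replace") as handle:
                paths = deleted_mappings(handle.read())
        except OSError:
            continue  # the process exited or access was denied
        if paths:
            found[int(entry)] = paths
    return found


if __name__ == "__main__":
    for pid, paths in sorted(scan_proc().items()):
        print(pid, ", ".join(paths))
```

Run as root right after the reinstall, any process listed here is still holding the old library open.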

ingo (ingo-steiner) wrote :

I first tried to grep within an xterm, but there I get faulty output regarding gvfs. I suppose it's not needed.
So I booted into a maintenance root shell (without network) and did:

lsof -n | grep deleted -> nothing reported

apt-get install --reinstall libc6
 and afterwards
lsof -n | grep deleted -> nothing reported

Rebooting brings up the 8 orphaned inodes.

May I conclude that the reinstall of the libc6 package performs correctly and that the file-system corruption is caused by the shutdown process?

I am prepared to do more tests, just advise (I am not an expert) and consider local time, I am living east of Greenwich.
Merry Christmas, Ingo

ingo (ingo-steiner) wrote :

And here is the output of

initctl list

after the reinstall of libc6. It is amazing that, even though I selected "root shell without network" in the maintenance system, a lot of services, including the network, are up and running (I used scp to copy the output to my PC).

ingo (ingo-steiner) wrote :

I am unsubscribing from this bug for the time being. It does not make sense to deal with the symptoms until the root of the evil, Bug #672177, is fixed.

Changed in sysvinit (Ubuntu):
importance: Undecided → High
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Clint Byrum (clint-fewbar) wrote :

So, I believe the right way to handle this is to wait a long time for any upstart job that has status of 'stop/killed'.

We can't be finding all of the "services that are slow to shut down" one by one. Authors of upstart jobs will know how long to wait before sending kill -9. Once kill -9 has been sent, the job's state actually changes to post-stop, so sendsigs wouldn't wait any longer anyway, but we should cap it at something longer than 10 seconds. I would suggest 5 minutes.

Anyway, because of this, I don't think we should just fix this in mysql; we should fix it in sysvinit. However, until it's fixed in sysvinit, I'll change mysql's stop on to be 'stop on starting rc RUNLEVEL...'
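
A rough Python sketch of the waiting logic proposed here, not the actual sendsigs change (sendsigs is a shell script): it polls the job list until no job is in `stop/killed` or a cap is reached. It assumes `initctl list` prints lines such as `mysql stop/killed, process 1234`; the function names and the injectable `list_jobs` and `sleep` parameters are inventions for this example.

```python
import subprocess
import time


def killed_jobs(initctl_output):
    """Return the names of jobs whose goal/state is 'stop/killed'."""
    jobs = []
    for line in initctl_output.splitlines():
        fields = line.split()
        # Instance jobs may carry an extra "(instance)" field, so look at
        # every field after the job name.
        if fields and any(f.rstrip(",") == "stop/killed" for f in fields[1:]):
            jobs.append(fields[0])
    return jobs


def run_initctl_list():
    """Return the raw text printed by `initctl list`."""
    result = subprocess.run(["initctl", "list"], capture_output=True,
                            text=True, check=False)
    return result.stdout


def wait_for_killed_jobs(list_jobs=run_initctl_list, cap=300, interval=1,
                         sleep=time.sleep):
    """Poll until no job is stop/killed or `cap` seconds have been waited.

    Returns the jobs still in stop/killed when polling stopped, which is an
    empty list when every job finished in time.
    """
    waited = 0
    remaining = killed_jobs(list_jobs())
    while remaining and waited < cap:
        sleep(interval)
        waited += interval
        remaining = killed_jobs(list_jobs())
    return remaining
```

The 300-second default mirrors the 5 minutes suggested above; each job's own kill timeout still decides when upstart sends SIGKILL.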

Changed in mysql-5.1 (Ubuntu):
assignee: Canonical Foundations Team (canonical-foundations) → Clint Byrum (clint-fewbar)
status: Triaged → In Progress
Changed in mysql-5.5 (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Clint Byrum (clint-fewbar)
Changed in mysql-5.1 (Ubuntu):
status: In Progress → Triaged
Steve Langasek (vorlon) wrote :

I agree that this should be fixed in sysvinit. Is it really appropriate to change mysql-5.5 at all? It should be a straightforward change to sysvinit, and the mysql change should be reverted afterwards.

tags: added: rls-p-tracking
Changed in sysvinit (Ubuntu Precise):
milestone: none → ubuntu-12.04
Clint Byrum (clint-fewbar) wrote :

I think ultimately no, mysql-5.5 shouldn't be changed for this, and agreed, the change should be straightforward. I was not certain whether we were willing to tackle the sysvinit change in precise, but on second thought, of course we should be. So I'll hold back on the change to mysql-5.5 and take a look at sysvinit.

Changed in mysql-5.5 (Ubuntu Precise):
status: In Progress → Invalid
Changed in mysql-5.1 (Ubuntu Precise):
status: Triaged → Invalid
Changed in sysvinit (Ubuntu Precise):
status: Triaged → In Progress
assignee: Canonical Foundations Team (canonical-foundations) → Clint Byrum (clint-fewbar)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.88dsf-13.10ubuntu8

---------------
sysvinit (2.88dsf-13.10ubuntu8) precise; urgency=low

  * d/src/initscripts/etc/init.d/sendsigs: wait up to 300 extra
    seconds for upstart jobs that have been killed. They will be sent
    SIGKILL by upstart when their 'kill timeout' has been reached, so
    we should trust the job's author to give the service a reasonable
    amount of time to shut down. (LP: #688541)
  * also omit pids of stop/killed upstart jobs since we know they've
    been killed already.
  * d/src/initscripts/etc/init.d/umountroot: Check for init.upgraded
    file in /var/run before clearing out /var/run. (LP: #886439)
 -- Clint Byrum <email address hidden> Mon, 12 Dec 2011 16:16:37 -0800

Changed in sysvinit (Ubuntu Precise):
status: In Progress → Fix Released
Changed in mysql-5.1 (Ubuntu Oneiric):
status: New → Invalid
Changed in mysql-5.5 (Ubuntu Oneiric):
status: New → Invalid
Changed in sysvinit (Ubuntu Oneiric):
status: New → In Progress
assignee: nobody → Clint Byrum (clint-fewbar)
Clint Byrum (clint-fewbar) wrote :

Fix is waiting in the oneiric-proposed queue.

description: updated
Changed in sysvinit (Ubuntu Oneiric):
status: In Progress → Fix Committed
Martin Pitt (pitti) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted sysvinit into oneiric-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Alan (8libra) wrote :

Tested the Python "test case" from the bug description using the initscripts in Oneiric proposed. Prior to patch, did see "Killing all remaining processes... fail" as described. After patch, saw "All processes ended within 16 seconds....". According to the test case, this is a successful fix.

Clint Byrum (clint-fewbar) wrote :

Alan, you are my stable release update hero this week. :) The update should land in about 5 days (minimum 7 in -proposed, just in case we missed a major regression).

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.88dsf-13.10ubuntu4.1

---------------
sysvinit (2.88dsf-13.10ubuntu4.1) oneiric-proposed; urgency=low

  * d/src/initscripts/etc/init.d/sendsigs: wait up to 300 extra
    seconds for upstart jobs that have been killed. They will be sent
    SIGKILL by upstart when their 'kill timeout' has been reached, so
    we should trust the job's author to give the service a reasonable
    amount of time to shut down. (LP: #688541)
  * also omit pids of stop/killed upstart jobs since we know they've
    been killed already.
  * d/src/initscripts/etc/init.d/umountroot: Check for init.upgraded
    file in /var/run before clearing out /var/run. (LP: #886439)
 -- Clint Byrum <email address hidden> Mon, 12 Dec 2011 16:08:10 -0800

Changed in sysvinit (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mysql-5.5 - 5.5.20-0ubuntu1

---------------
mysql-5.5 (5.5.20-0ubuntu1) precise; urgency=low

  * New upstream release.
  * d/mysql-server-5.5.mysql.upstart: Fix stop on to make sure mysql is
    fully stopped before shutdown commences. (LP: #688541) Also simplify
    start on as it is redundant.
  * d/control: Depend on upstart version which has apparmor profile load
    script to prevent failure on upgrade from lucid to precise.
    (LP: #907465)
  * d/apparmor-profile: need to allow /run since that is the true path
    of /var/run files. (LP: #917542)
  * d/control: mysql-server-5.5 has files in it that used to be owned
    by libmysqlclient-dev, so it must break/replace it. (LP: #912487)
  * d/rules, d/control: 5.5.20 Fixes segfault on tests with gcc 4.6,
    change compiler back to system default.
  * d/rules: Turn off embedded libedit/readline.(Closes: #659566)
 -- Clint Byrum <email address hidden> Tue, 14 Feb 2012 23:59:22 -0800

Changed in mysql-5.5 (Ubuntu Precise):
status: Invalid → Fix Released