Ubuntu
mysql-dfsg-5.1 package

MySQL upstart stop job does not cleanly shutdown mysql

Bug #620441 reported by Folke Lemaitre on 2010-08-19

This bug affects 1 person

	Status	Importance	Assigned to
upstart	Invalid	Undecided	Unassigned
mysql-dfsg-5.1 (Ubuntu)	Fix Released	High	Clint Byrum
Lucid	Fix Released	High	Clint Byrum
Maverick	Fix Released	High	Clint Byrum

Bug Description

Release: Ubuntu 10.04 LTS
Package Version: 5.1.41-3ubuntu12.6

Stopping a MySQL server can take some time, especially on a busy database. If mysqld gets killed while having open connections, then after the next start you will likely have a lot of tables corrupted.

The new upstart mechanism for starting/stopping mysql unfortunately behaves as follows on a "service mysql stop":
1. First it will send a TERM signal to mysqld which is perfectly fine as mysqld will do the same as if you would issue a mysqladmin shutdown command
2. If after 5 seconds the process is still not terminated, a KILL signal will be send. This is really a very big issue since on busy servers lots of myisam tables will be corrupted afterwards

Expected behaviour:
Send the term signal and wait for a certain amount of time (at least 1 minute). If still not stopped, then simply give an error about this to the user

== SRU REPORT ==

=== IMPACT ===

This has very little impact on a normal running system. It may cause shutdown to take up to 5 minutes, but that is an acceptable trade off as recovering a system with crashed tables can take much, much longer.

=== DEV FIX ===

Raising kill timeout to 300 seconds seems a good trade off between shutdown potentially taking forever, and trying not to corrupt tables.

=== PATCH ===

See merge proposal

=== TEST CASE: ===

This is a race condition, so it is hard to reproduce reliably. However, if you:

1. create a very large innodb table
2. import a lot of data (inserting more MB of data than your drives can write in 5 seconds should work)
3. run a very long select on the data in one thread (SELECT * FROM table WHERE column like '%X%';)
4. DELETE that data in another thread
5. issue 'stop mysql' immediately after the delete returns.
6. check to see that mysqld was sent SIGKILL in /var/log/daemon.log
7. start mysql -- at this point InnoDB recovery will need to be run

Once the fix is applied, the SIGKILL should never be sent.

=== REGRESSION POTENTIAL ===

The only regression potential is mentioned in the IMPACT section, that the shutdown of a server or the mysql service may take up to 5 minutes.

See original description

Tags:

Related branches

lp:~clint-fewbar/ubuntu/maverick/mysql-5.1/increase-killtimeout

Merged into lp:ubuntu/maverick/mysql-5.1

Mathias Gug: Pending requested 2010-09-08

lp:ubuntu/maverick/mysql-5.1

lp:~clint-fewbar/ubuntu/lucid/mysql-dfsg-5.1/increase-killtimeout

Merged into lp:ubuntu/lucid-updates/mysql-dfsg-5.1 at revision 33

Bryce Harrington: Approve (packaging) on 2010-12-01

Ubuntu branches: Pending requested 2010-11-20

lp:ubuntu/lucid-proposed/mysql-dfsg-5.1

Revision history for this message

Scott Moser (smoser) wrote on 2010-08-19:

Thank you for your complete bug report.

It would seem this can be fixed in the upstart job with a line like 'kill timeout 300'. From man 5 init:
  | kill timeout INTERVAL
  | Specifies the interval between sending the job's main process
  | the SIGTERM and SIGKILL signals when stopping the running job.

Changed in mysql-dfsg-5.1 (Ubuntu):
importance:	Undecided → High
status:	New → Triaged

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-08-19:

Scott, I've experienced problems with SIGKILL's corrupting mysql databases before, so the workaround given might be a little misleading. MySQL can actually be stuck waiting on a single query to finish before it can shut down, and some queries might take *hours*.

That said, its a known tradeoff if you're using MyISAM that you may corrupt tables. The workaround is to do as suggested above, and also move to transactional storage such as InnoDB. With InnoDB the database integrity will survive a SIGKILL (though the next startup may take several hours in recovery if the system has large transaction logs).

5 minutes seems an appropriate length of time to wait. However, is there any way to cancel it?

Revision history for this message

Folke Lemaitre (folke-lemaitre) wrote on 2010-08-19:

Actually the SIGKILL is really always a no go unless a sysadmin decides to do this manually. An upstart script should never do this automatically (not even as a feature ;-) ) I can guarantee that in more than 90% of the cases after a SIGKILL you will have corrupted tables. With the default config of auto recover myisam tables and smaller sites having small databases, this is not such an issue (altough eventually you will have dataloss), but in larger setups with tables containing millions of records this is really a nightmare. In large scale setups where transactions are not necessary, it doesn't make sense to use INNODB instead and indeed, with innodb you would have the exact same issue.

Ideally, the upstart mechanism should have support for "kill timeout 0" meaning a KILL will never be send for that specific job. If the user still wants to kill, ideally he should be able to do "stop --force mysql".

Since upstart does not support this mechanism, I suggest to indeed add "kill timeout 900".

With the init.d system the mysql stop was handled a lot better. It simply sent the TERM signal and waited for 20 seconds. If after 20 seconds MySQL was still running, the stop returned with an error saying so and an exit code != 0

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-08-23:

upstart definitely needs a way to disable the SIGKILL functionality. setting

kill timeout 0

Just means instantly kill it, don't even wait for the SIGTERM. Definitely not the desired behavior.

So this is really a feature request for upstart more than anything.

To be clear, upstart should allow disabling sending kill -9 per-service (I understand that at system shutdown it will still need to kill -9).

I do think its reasonable to make the statement that if your data is important, you should put it in InnoDB, or maybe more importantly "not MyISAM". Complex transactionsal integrity is one reason. MyISAM is extremely prone to data loss on crashes, which I think is the more pressing reason to suggest that InnoDB be preferred in most cases. Certainly upstart isn't the only thing that might accidentally kill -9 your mysqld.

With InnoDB you *will not* have the exact same issue. You will not lose any data unless there is a grave bug in MySQL and/or InnoDB's transaction handling (most of which were hammered out years ago). You may, however, lose time, as you must replay transactions from the recovery log upon starting back up.

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2010-08-24:

You can use a "pre-stop" script/exec to send the proper command to stop mysql and to wait for it to shut down.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-08-24:

Scott, thanks for the suggestion, I hadn't thought of anything like that.

Can we abort the stop from pre-stop somehow? Otherwise I'm afraid users will either be left waiting forever, or still sending SIGKILL.

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2010-08-24: Re: [Bug 620441] Re: MySQL upstart stop job does not cleanly shutdown mysql

On Tue, 2010-08-24 at 16:23 +0000, Clint Byrum wrote:

> Can we abort the stop from pre-stop somehow? Otherwise I'm afraid users
> will either be left waiting forever, or still sending SIGKILL.
>
You can call "start" from the pre-stop script, which will return the job
to a running state.

That being said, Upstart will happily wait forever for pre-stop before
sending SIGKILL - so if the database takes a long time to stop, that's
ok

Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-08-24:

On Aug 24, 2010, at 9:50 AM, Scott James Remnant wrote:

> On Tue, 2010-08-24 at 16:23 +0000, Clint Byrum wrote:
>
>> Can we abort the stop from pre-stop somehow? Otherwise I'm afraid users
>> will either be left waiting forever, or still sending SIGKILL.
>>
> You can call "start" from the pre-stop script, which will return the job
> to a running state.
>

This sounds like a solution then... in pseudo code

while true
  send shutdown to mysql
  wait 5 seconds
  if mysql still running
    if runlevel not in (0,1,6)
      # manual stop, not reboot/halt
      if 30 seconds has passed
        report to user that mysql will probably die and restart
        start again # will respawn mysqld when it does die
    else
      log warning about waiting forever to reboot/halt..
  else
    break # service is dead

> That being said, Upstart will happily wait forever for pre-stop before
> sending SIGKILL - so if the database takes a long time to stop, that's
> ok

It would be ok, but I'm not sure I like it.

Would there be an indication anywhere for users that they're waiting on mysql to die?

Revision history for this message

Johan Kiviniemi (ion) wrote on 2010-08-24:

> Would there be an indication anywhere for users that they're waiting on mysql to die?

The ‘stop’ command having not returned yet, and ‘status jobname’ saying ‘stopping’.

If a bug in a service causes a stop request not to stop it, the service should be fixed.

As for the pseudocode, I don’t think respawning is expected behavior when the user has requested a job to stop.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-08-24:

#10

On Aug 24, 2010, at 1:22 PM, Johan Kiviniemi wrote:

>> Would there be an indication anywhere for users that they're waiting
> on mysql to die?
>
> The ‘stop’ command having not returned yet, and ‘status jobname’ saying
> ‘stopping’.
>
> If a bug in a service causes a stop request not to stop it, the service
> should be fixed.
>

I had made some assumptions about the shutdown process until now based on my past experience, but it is detailed here:

http://dev.mysql.com/doc/refman/5.1/en/server-shutdown.html

So the only thing that *should* cause a really long shutdown time is a giant transaction rollback.

> As for the pseudocode, I don’t think respawning is expected behavior
> when the user has requested a job to stop.

Agreed... I'm still worried about restarts during upgrades spinning forever though. I guess as long as the script tells the user "Waiting for mysql to shutdown.. ## threads still running.." every 30 seconds that would probably be enough.

Thierry Carrez (ttx) on 2010-08-26

Changed in mysql-dfsg-5.1 (Ubuntu):
assignee:	nobody → Clint Byrum (clint-fewbar)

papukaija (papukaija) on 2010-08-26

tags:

added: lucid

Thierry Carrez (ttx) on 2010-09-02

tags:

added: server-mrs

Scott James Remnant (Canonical) (canonical-scott) on 2010-09-02

Changed in upstart:
status:	New → Invalid

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-09-02:

#11

In doing research for this fix, I'm having trouble finding a set of best practices for upstart jobs in Ubuntu. If anyone has a link or list of them, please do post here, as I am a bit confused about how to interact with the user. console output seems a hacky way (user may not have console if ssh'ing in for instance). So, logger? Something else?

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-09-08:

#12

Ultimately, I believe what Johan stated previously is correct. If your mysql daemon is not stopping when you tell it to shutdown, then you should be concerned.

I've pushed a branch up now that raises kill timeout to 300 seconds.

5 minutes should be long enough for most databases to finish flushing. Any longer and the system reboot/halt will be delayed unacceptably. I'm thinking about what happens when a UPS dies and the sysadmin tries to shut down the server. kill -9 is still less violent than a total system power off!.

I'm also starting a discussion with our documentation team about recommending InnoDB as the safer table type, and I've opened a new bug report to suggest that InnoDB be the default table_type:

https://bugs.launchpad.net/ubuntu/+source/mysql-5.1/+bug/633364

Clint Byrum (clint-fewbar) on 2010-09-08

Changed in mysql-dfsg-5.1 (Ubuntu Maverick):
status:	Triaged → Confirmed
assignee:	Clint Byrum (clint-fewbar) → nobody

Revision history for this message

Thierry Carrez (ttx) wrote on 2010-09-09:

#13

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-09-15:

#14

This is awaiting sponsorship, please see the merge proposal.

tags:

added: patch

Mathias Gug (mathiaz) on 2010-09-15

Changed in mysql-dfsg-5.1 (Ubuntu Maverick):
assignee:	nobody → Clint Byrum (clint-fewbar)
status:	Confirmed → In Progress

Revision history for this message

Mathias Gug (mathiaz) wrote on 2010-09-15:

#15

This bug should probably be backported to lucid as well. Accepting lucid nomination.

To move the SRU forward, the next step is to prepare a SRU report as outlined in https://wiki.ubuntu.com/StableReleaseUpdates#Procedure.

Changed in mysql-dfsg-5.1 (Ubuntu Maverick):
status:	In Progress → Fix Released
Changed in mysql-dfsg-5.1 (Ubuntu Lucid):
importance:	Undecided → High
status:	New → Incomplete
status:	Incomplete → Confirmed
status:	Confirmed → Triaged

Clint Byrum (clint-fewbar) on 2010-11-20

description:

updated

Revision history for this message

papukaija (papukaija) wrote on 2010-11-20:

#16

Should ubuntu-sru be subscribed to this bug report?

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-11-21:

#17

paukaija, thanks for the reminder, I just subscribed ubuntu-sru

Revision history for this message

Martin Pitt (pitti) wrote on 2010-11-22:

#18

SRU ack, please upload.

Revision history for this message

Bryce Harrington (bryce) wrote on 2010-12-01:

#19

Sponsored upload of SRU patch to lucid-proposed

Changed in mysql-dfsg-5.1 (Ubuntu Lucid):
assignee:	nobody → Clint Byrum (clint-fewbar)
status:	Triaged → Fix Committed

Revision history for this message

Martin Pitt (pitti) wrote on 2010-12-03: Please test proposed package

#20

Accepted mysql-dfsg-5.1 into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags:

added: verification-needed

Revision history for this message

Imre Gergely (cemc) wrote on 2010-12-05:

#21

I tried it with a simplified test case with just DELETEing a lot of data from the table with 1.5mil rows.

The DELETE command took like a minute and a half to return. After it returned, I immediately did 'stop mysql' and saw the KILL message in the daemon.log after ~5 seconds.
After I started mysql again, I got the recovery messages in /var/log/mysql/error.log like this:

InnoDB: Log scan progressed past the checkpoint lsn 0 241337601
101205 23:35:25 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...

With the mysql package from -proposed (5.1.41-3ubuntu12.8) when I issued 'stop mysql' after DELETE, it was just sitting there, nothing in daemon.log, and in mysql/error.log I got

101205 23:51:25 [Note] Event Scheduler: Purging the queue. 0 events
101205 23:51:27 InnoDB: Starting shutdown...
101205 23:51:40 InnoDB: Shutdown completed; log sequence number 0 258830313
101205 23:51:40 [Note] /usr/sbin/mysqld: Shutdown complete

so it was waiting for InnoDB to finish whatever it was doing (that took 13 seconds) and then it stopped. After starting mysql up again there was obviously no recovery needed.

Seems to me the fix is working fine.

Martin Pitt (pitti) on 2010-12-06

tags:

added: verification-done
removed: verification-needed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2010-12-09:

#22

This bug was fixed in the package mysql-dfsg-5.1 - 5.1.41-3ubuntu12.8

---------------
mysql-dfsg-5.1 (5.1.41-3ubuntu12.8) lucid-proposed; urgency=low

* raising kill timeout to 300 to help avoid table corruption (LP: #620441)
-- Clint Byrum <email address hidden> Sat, 20 Nov 2010 12:06:00 -0800

Changed in mysql-dfsg-5.1 (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Revision history for this message

Robert Citek (robert-citek) wrote on 2010-12-16:

#23

We got bitten by this bug and are now checking tables on our DBs. What's interesting is that we made changes to the mysql upstart script but they do not seem to be taking effect. In particular:

1) changes to the kill timeout. Changing the value to 300 had no effect. Even after issuing an "initctl reload-configuration", upstart still appeared to kill mysqld after about 5 seconds. We are still in the process of confirming.

2) commented out respawn. This also had no effect on "mysqladmin stop", which would cleanly stop mysql, but almost immediatly, mysqld would restart. Here's a sample:

$ ( set -x ; sudo mysqladmin status ; sudo mysqladmin shutdown ; sleep 10 ; sudo mysqladmin status )
+ sudo mysqladmin status
Uptime: 92 Threads: 1 Questions: 7882 Slow queries: 1 Opens: 71 Flush tables: 1 Open tables: 64 Queries per second avg: 85.673
+ sudo mysqladmin shutdown
+ sleep 10
+ sudo mysqladmin status
Uptime: 11 Threads: 1 Questions: 919 Slow queries: 1 Opens: 49 Flush tables: 1 Open tables: 42 Queries per second avg: 83.545

This is on Ubuntu Lucid. 10.04 running mysql-server-5.1 (5.1.41-3ubuntu12.6)

Revision history for this message

Robert Citek (robert-citek) wrote on 2010-12-16:

#24

Had some issues getting changes in mysql's upstart script from taking effect. This demonstrates the issue, using the respawn command as the example, and a solution.

$ grep respawn /etc/init/mysql.conf
respawn

$ sudo mysqladmin status ; sudo mysqladmin shutdown ; sudo mysqladmin status ; sleep 10 ; sudo mysqladmin status
Uptime: 29 Threads: 1 Questions: 100 Slow queries: 0 Opens: 99 Flush tables: 1 Open tables: 23 Queries per second avg: 3.448
Uptime: 1 Threads: 1 Questions: 1 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 8 Queries per second avg: 1.0
Uptime: 11 Threads: 1 Questions: 99 Slow queries: 0 Opens: 99 Flush tables: 1 Open tables: 23 Queries per second avg: 9.0

$ sudo sed -i.bak -e 's/respawn/# respawn/' /etc/init/mysql.conf

$ grep respawn /etc/init/mysql.conf
# respawn

$ sudo mysqladmin status ; sudo mysqladmin shutdown ; sudo mysqladmin status ; sleep 10 ; sudo mysqladmin status
Uptime: 5 Threads: 1 Questions: 98 Slow queries: 0 Opens: 99 Flush tables: 1 Open tables: 23 Queries per second avg: 19.600
Uptime: 1 Threads: 1 Questions: 1 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 8 Queries per second avg: 1.0
Uptime: 11 Threads: 1 Questions: 99 Slow queries: 0 Opens: 99 Flush tables: 1 Open tables: 23 Queries per second avg: 9.0

$ sudo mysqladmin shutdown ; sudo service mysql stop
mysql stop/waiting

$ sudo service mysql start
mysql start/running, process 10405

$ sudo mysqladmin status ; sudo mysqladmin shutdown ; sudo mysqladmin status ; sleep 10 ; sudo mysqladmin status
Uptime: 6 Threads: 1 Questions: 98 Slow queries: 0 Opens: 99 Flush tables: 1 Open tables: 23 Queries per second avg: 16.333
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

Had some issues getting changes in mysql's upstart script from taking effect.  This demonstrates the issue, using the respawn command as the example, and a solution.

$ grep respawn /etc/init/mysql.conf 
respawn

$ sudo mysqladmin status ; sudo mysqladmin shutdown ; sudo mysqladmin status ; sleep 10 ; sudo mysqladmin status
Uptime: 29  Threads: 1  Questions: 100  Slow queries: 0  Opens: 99  Flush tables: 1  Open tables: 23  Queries per second avg: 3.448
Uptime: 1  Threads: 1  Questions: 1  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 1.0
Uptime: 11  Threads: 1  Questions: 99  Slow queries: 0  Opens: 99  Flush tables: 1  Open tables: 23  Queries per second avg: 9.0

$ sudo sed -i.bak -e 's/respawn/# respawn/' /etc/init/mysql.conf

$ grep respawn /etc/init/mysql.conf 
# respawn

$ sudo mysqladmin status ; sudo mysqladmin shutdown ; sudo mysqladmin status ; sleep 10 ; sudo mysqladmin status
Uptime: 5  Threads: 1  Questions: 98  Slow queries: 0  Opens: 99  Flush tables: 1  Open tables: 23  Queries per second avg: 19.600
Uptime: 1  Threads: 1  Questions: 1  Slow queries: 0  Opens: 15  Flush tables: 1  Open tables: 8  Queries per second avg: 1.0
Uptime: 11  Threads: 1  Questions: 99  Slow queries: 0  Opens: 99  Flush tables: 1  Open tables: 23  Queries per second avg: 9.0

$ sudo mysqladmin shutdown ; sudo service mysql stop
mysql stop/waiting

$ sudo service mysql start
mysql start/running, process 10405

$ sudo mysqladmin status ; sudo mysqladmin shutdown ; sudo mysqladmin status ; sleep 10 ; sudo mysqladmin status
Uptime: 6  Threads: 1  Questions: 98  Slow queries: 0  Opens: 99  Flush tables: 1  Open tables: 23  Queries per second avg: 16.333
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

Ubuntu Foundations Team Bug Bot (crichton) on 2011-09-19

tags:

added: testcase

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntumysql-dfsg-5.1 package

MySQL upstart stop job does not cleanly shutdown mysql

Bug Description

Related branches

Other bug subscribers

Remote bug watches

Ubuntu
mysql-dfsg-5.1 package