Bug #326768 “mysqld_safe thinks mysqld has crashed when it hasn'...” : Bugs : mysql-dfsg-5.0 package : Debian

Revision history for this message

Charles (charles-bovy) wrote on 2009-03-10:

#1

I can confirm this behavior.

Revision history for this message

Mathias Gug (mathiaz) wrote on 2009-03-30:

#2

Thank you for taking the time to report this bug and helping to make Ubuntu better. When you're running into the issue, does mysqld_safe takes up 100% of the cpu?

Changed in mysql-dfsg-5.0:
status:	New → Incomplete

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-04-01:

#3

Without the fix I mentioned in the original report, mysql gets restarted once. Obviously mysqld_safe uses some CPU to do this (it will do one iteration of the loop at the bottom of the script). mysqld_safe is not looping and restarting mysql multiple times, which is, I think, the behaviour you were asking about.

No, it does not sit at the top of `top`.

It is strange that it only falls through once.

Revision history for this message

MarcRandolph (mrand) wrote on 2009-04-09:

#4

Bug 358173 may be a duplicate of this.

Thanks to the original reporter for proposing a solution.

Changed in mysql-dfsg-5.0 (Ubuntu):
status:	Incomplete → Confirmed

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-04-11:

#5

Yeah - it looks to me like Bug 358173 is the same bug.

I find it very strange that this bug only affects mythbackend (all the possible duplicates seem to reference mythtv). I suspect that it is doing something strange. I reported a mysql bug because I figured that the mysql server shouldn't behave strangely even if the client program does weird things.

Revision history for this message

MarcRandolph (mrand) wrote on 2009-04-14:

#6

Adding mythbuntu. It seems to me that the " & wait" should make things more reliable and predictable rather than less, so I must be missing something. Any danger in removing the " & wait"?

Changed in mythbuntu:
importance:	Undecided → Critical
milestone:	none → jaunty-rc
status:	New → Confirmed

Revision history for this message

ranjo (ranjo-jjxl) wrote on 2009-04-18:

#7

Since I upgraded to Jaunty RC, my mythtv database gets corrupted after almost every new boot. This results into a confused mythtv-backend which fails to record the scheduled recordings. After a "repair table" command everything returns back as normal. I'm trying the " & wait" fix mentioned above, hope to get some good results.

Revision history for this message

barney_1 (szczys) wrote on 2009-04-19:

#8

I believe I am also encountering this bug. Running Mythbuntu 9.04 Beta I find my recording schedule on mythtv is empty after reboot. Running mythtv-setup, removing all encoder cards and adding them again fixes this.

Output of "cat syslog | grep mysqld":

Apr 19 11:54:05 barney mysqld[3497]: 090419 11:54:05 [Note] /usr/sbin/mysqld: Normal shutdown
Apr 19 11:54:05 barney mysqld[3497]:
Apr 19 11:54:07 barney mysqld[3497]: 090419 11:54:07 InnoDB: Starting shutdown...
Apr 19 11:54:08 barney mysqld[3497]: 090419 11:54:08 InnoDB: Shutdown completed; log sequence number 0 150426
Apr 19 11:54:08 barney mysqld[3497]: 090419 11:54:08 [Note] /usr/sbin/mysqld: Shutdown complete
Apr 19 11:54:08 barney mysqld[3497]:
Apr 19 11:54:08 barney mysqld_safe[5352]: ended
Apr 19 11:54:48 barney kernel: [ 11.333672] type=1505 audit(1240160086.411:8): operation="profile_load" name="/usr/sbin/mysqld" name2="default" pid=1969
Apr 19 11:54:49 barney mysqld_safe[2496]: started
Apr 19 11:54:49 barney mysqld[2500]: 090419 11:54:49 InnoDB: Started; log sequence number 0 150426
Apr 19 11:54:50 barney mysqld[2500]: 090419 11:54:50 [Note] /usr/sbin/mysqld: ready for connections.
Apr 19 11:54:50 barney mysqld[2500]: Version: '5.0.75-0ubuntu10' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)
Apr 19 11:55:03 barney mysqld_safe[3605]: Number of processes running now: 1
Apr 19 11:55:03 barney mysqld_safe[3613]: mysqld process hanging, pid 2498 - killed
Apr 19 11:55:03 barney mysqld_safe[3616]: restarted
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: The log sequence number in ibdata files does not match
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: the log sequence number in the ib_logfiles!
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03 InnoDB: Database was not shut down normally!
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Starting crash recovery.
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Reading tablespace information from the .ibd files...
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Restoring possible half-written data pages from the doublewrite
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: buffer...
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03 InnoDB: Started; log sequence number 0 150436
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03 [Note] /usr/sbin/mysqld: ready for connections.
Apr 19 11:55:03 barney mysqld[3620]: Version: '5.0.75-0ubuntu10' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)

I believe I am also encountering this bug.  Running Mythbuntu 9.04 Beta I find my recording schedule on mythtv is empty after reboot.  Running mythtv-setup, removing all encoder cards and adding them again fixes this.

Output of "cat syslog | grep mysqld":

Apr 19 11:54:05 barney mysqld[3497]: 090419 11:54:05 [Note] /usr/sbin/mysqld: Normal shutdown
Apr 19 11:54:05 barney mysqld[3497]: 
Apr 19 11:54:07 barney mysqld[3497]: 090419 11:54:07  InnoDB: Starting shutdown...
Apr 19 11:54:08 barney mysqld[3497]: 090419 11:54:08  InnoDB: Shutdown completed; log sequence number 0 150426
Apr 19 11:54:08 barney mysqld[3497]: 090419 11:54:08 [Note] /usr/sbin/mysqld: Shutdown complete
Apr 19 11:54:08 barney mysqld[3497]: 
Apr 19 11:54:08 barney mysqld_safe[5352]: ended
Apr 19 11:54:48 barney kernel: [   11.333672] type=1505 audit(1240160086.411:8): operation="profile_load" name="/usr/sbin/mysqld" name2="default" pid=1969
Apr 19 11:54:49 barney mysqld_safe[2496]: started
Apr 19 11:54:49 barney mysqld[2500]: 090419 11:54:49  InnoDB: Started; log sequence number 0 150426
Apr 19 11:54:50 barney mysqld[2500]: 090419 11:54:50 [Note] /usr/sbin/mysqld: ready for connections.
Apr 19 11:54:50 barney mysqld[2500]: Version: '5.0.75-0ubuntu10'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)
Apr 19 11:55:03 barney mysqld_safe[3605]: Number of processes running now: 1
Apr 19 11:55:03 barney mysqld_safe[3613]: mysqld process hanging, pid 2498 - killed
Apr 19 11:55:03 barney mysqld_safe[3616]: restarted
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: The log sequence number in ibdata files does not match
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: the log sequence number in the ib_logfiles!
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03  InnoDB: Database was not shut down normally!
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Starting crash recovery.
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Reading tablespace information from the .ibd files...
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Restoring possible half-written data pages from the doublewrite
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: buffer...
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03  InnoDB: Started; log sequence number 0 150436
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03 [Note] /usr/sbin/mysqld: ready for connections.
Apr 19 11:55:03 barney mysqld[3620]: Version: '5.0.75-0ubuntu10'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-04-20:

#9

Barney - could you clarify your comment for me?

You say "Running mythtv-setup, removing all encoder cards and adding them again fixes this." Is this a permanent fix, or does the problem recur if you reboot?

If this is only a fix for that boot, there are easier ones. Anything that causes mythbackend to force a reschedule is a temporary fix (this includes restarting mythbackend, or just marking a new show to be recorded).

Revision history for this message

barney_1 (szczys) wrote on 2009-04-21:

#10

Hi Will,

Thanks, you're right. Restarting mythtv-backend after reboot (sudo /etc/init.d/mythtv-backend restart) does bring the schedule back.

Is this bug what is causing my problem?

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-04-21:

#11

I thought I'd list what I discovered when originally looking at this bug. Much of this is mythtv specific. I'm currently of the opinion that there are issues in both mythtv and mysql, with the mythtv strange behaviour tickling a mysql issue.

On a problematic boot, mysql starts normally. This means that the mysqld_safe script is run. It starts the mysql daemon then waits for the daemon to stop. If the daemon stops normally (the pid file has been removed) then mysqld_safe exits. If the daemon didn't stop normally then mysqld_safe kills any remaining instances of mysqld running, before restarting the daemon.

mythbackend then starts. During startup, mythbackend does a reschedule of its recordings (i.e. it goes through the list of programs to be recorded and the epg and decides which ones to record with which tuner).

At some point around this time, mysqld_safe is detecting that the mysql daemon has stopped. It is then, as noted above, killing all remaining mysql daemon processes, before starting a fresh one. My understanding is that on my default desktop jaunty install I should only ever have one mysql daemon running at once (and a quick ps reveals that is what I currently have). If you look at the logs above, you'll see that mysqld_safe is detecting one mysql daemon running after it has already decided that one has failed. This is what I meant by the bug report title, "mysql thinks it has crashed when it hasn't". This seems to be a problem with a false detection of a crash causing the cleanup mechanism to actually kill the mysqld process.

When the mysqld process is killed in this way during a mythbackend reschedule, mythbackend fails to schedule any recordings. Once the database restarts, mythbackend can still access the database - e.g. you can see the epg, and you see the list of programs you've ask to be recorded. If you do anything that causes mythbackend to re-schedule recordings (e.g. change anything in the list of programs to be recorded - you can even change it back after the reschedule), then mythbackend will connect to the db, reschedule correctly and everything will be fine.

This is a little bit weird in that restarting mythbackend later doesn't cause mysql to restart again. It only seems to restart once. If there was a major problem with mythbackend then I would expect it to show up every time mythbackend is restarted.

It also doesn't seem to be a race condition (which was suggested by the fact that it only happens the first time). If I put a `sleep 10` at the start of the mythbackend init script it doesn't help.

I thought I'd list what I discovered when originally looking at this bug.  Much of this is mythtv specific.  I'm currently of the opinion that there are issues in both mythtv and mysql, with the mythtv strange behaviour tickling a mysql issue.

On a problematic boot, mysql starts normally.  This means that the mysqld_safe script is run.  It starts the mysql daemon then waits for the daemon to stop.  If the daemon stops normally (the pid file has been removed) then mysqld_safe exits.  If the daemon didn't stop normally then mysqld_safe kills any remaining instances of mysqld running, before restarting the daemon.

mythbackend then starts.  During startup, mythbackend does a reschedule of its recordings (i.e. it goes through the list of programs to be recorded and the epg and decides which ones to record with which tuner).

At some point around this time, mysqld_safe is detecting that the mysql daemon has stopped.  It is then, as noted above, killing all remaining mysql daemon processes, before starting a fresh one.  My understanding is that on my default desktop jaunty install I should only ever have one mysql daemon running at once (and a quick ps reveals that is what I currently have).  If you look at the logs above, you'll see that mysqld_safe is detecting one mysql daemon running after it has already decided that one has failed.  This is what I meant by the bug report title, "mysql thinks it has crashed when it hasn't".  This seems to be a problem with a false detection of a crash causing the cleanup mechanism to actually kill the mysqld process.

When the mysqld process is killed in this way during a mythbackend reschedule, mythbackend fails to schedule any recordings.  Once the database restarts, mythbackend can still access the database - e.g. you can see the epg, and you see the list of programs you've ask to be recorded.  If you do anything that causes mythbackend to re-schedule recordings (e.g. change anything in the list of programs to be recorded - you can even change it back after the reschedule), then mythbackend will connect to the db, reschedule correctly and everything will be fine.

This is a little bit weird in that restarting mythbackend later doesn't cause mysql to restart again.  It only seems to restart once.  If there was a major problem with mythbackend then I would expect it to show up every time mythbackend is restarted.

It also doesn't seem to be a race condition (which was suggested by the fact that it only happens the first time).  If I put a `sleep 10` at the start of the mythbackend init script it doesn't help.

summary:

- mysql thinks it has crashed when it hasn't
+ mysqld_safe thinks mysqld has crashed when it hasn't

Revision history for this message

barney_1 (szczys) wrote on 2009-05-01:

#12

I see that Mythbuntu 9.04 has been released and there is no mention of this problem in the release notes (http://www.mythbuntu.org/9.04/Release_notes).

I am still having this problem. After every reboot I have to manually restart the mythtv backend for my recording schedules to be read in from mythconverg (otherwise the list appears as empty and nothing records).

Does this bug exist in the stable release or am I having problems caused by some other issue?

Revision history for this message

Dave Johansen (davejohansen) wrote on 2009-05-01:

#13

This should definitely be added to the release notes or fixed, but it appears that the problem can be resolved by performing the workaround in the original description.

Also, just doing something to interact with the database (like schedule a dummy recording), always worked to fix the problem for me before I applied the workaround.

Revision history for this message

Dewey_Oxberger (jonb-cirris) wrote on 2009-05-01:

#14

On a fresh install of mythbuntu 9.04 I get this problem. The fix described above does not work for me. (It doesn't change the behavior at all as far as I can tell).

My situation is more complex since I use a HDHomeRun tuner (so I need the network working as well). Here is what I've tried:

1) The default boot has mysql coming up from rc2.d at something low like S10 (I can't remember). Then S31mythbackend, then S50NetworkManager. Fails 80% of the time. I boot to a crippled mythfrontend that has scrambled keymappings and a broken theme. If I kill the frontend, restart the backend, and run frontend it works.

2) Don't auto start mythfrontend or mythbackend. Still use S50NetworkManager to bring the network up. Boot. From the desktop manually start mythbackend, then start mythfrontend. Works every time.

3) Loads of combinations of NetworkManager, manually configuring the network, moving mysql around and mythbackend around in rc2.d. Using my SxxWaitForHDHomeRun script. It's hard to find any combination that works.

I wrote a script that does mysql STATUS queries and prints the uptime. Put it all over rc2.d. The server shows a constantly increasing uptime so I don't think it's getting killed. At S31 it shows 5 sec of uptime. Still, mythbackend can't see it. Some queries work, some fail.

Finally had to: a) manually config the network (to give time for the HDHomeRun to be available on the network), b) move mythbackend to S99zMythbackend so it starts up really late. Now it works 90% of the time but it still fails a query or two from time to time (but then they start working again).

Crazy things going on...

Revision history for this message

Mario Limonciello (superm1) wrote on 2009-05-04:

#15

mysql.debdiff Edit (2.7 KiB, text/plain)

I've got a debdiff that i'd like to propose as a solution to this, but I would like to get some wider testing and feedback with it before uploading it to -proposed.

It's attached, and I have a test build on my PPA (version 5.1.30really5.0.75-0ubuntu10.1~ppa1) at https://launchpad.net/~superm1/+archive/ppa

Revision history for this message

barney_1 (szczys) wrote on 2009-05-05:

#16

Thanks for making that build available on your ppa Mario. I just installed it and it seems to have fixed my problems. I can reboot and I don't get the mysqld errors or the behavior of my recorded schedules not being loaded into the mythbackend.

One surprise is that it also seems to fix a playback issues I'd been having. When playing a recording there had been a 2-3 second black screen before the recording started playback. After installing this updated package the recording starts immediately.

Thanks again!

Mario Limonciello (superm1) on 2009-05-05

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status:	New → Confirmed

Mario Limonciello (superm1) on 2009-05-05

description:	updated
Changed in mythbuntu:
milestone:	jaunty-rc → none
status:	Confirmed → Triaged

Revision history for this message

Martin Pitt (pitti) wrote on 2009-05-05:

#17

Thanks, Mario. Assigned to Canonical server team for review.

Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
assignee:	nobody → Canonical Server Team (canonical-server)

Revision history for this message

Dewey_Oxberger (jonb-cirris) wrote on 2009-05-06:

#18

I'm testing the patch. Works Great! Mario really is Super.

I'm a Mythbuntu 9.04 RC install that was fully updated as of yesterday (before the PPA test fix). I restored everything to the "error state" and verified the error still exists. The error state does have the problem. (mythbackend log shows failed mysql queries by the dozen).

Then I updated to the PPA. Zero defects. Works great. No query errors. Boots just fine.

One odd note. I had been tinkering with this so I did a script and "spammed" the script all through rc2.d. Here is the script:

#!/bin/bash
echo -e "Checking MySQL\n"
mysql --user=root --password=blahblah -e STATUS | grep -i uptime

It shows a continuously increasing uptime from boot in both the error state and in the fixed state. Zero difference. "uptime" must not mean what I think it means :)

Let me know if you want me to do a clean install and test just the patch. I'm set up to do that...

Revision history for this message

Martin Pitt (pitti) wrote on 2009-05-06:

#19

Accepted mysql-dfsg-5.0 into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status:	Confirmed → Fix Committed
tags:	added: verification-needed

Revision history for this message

Dewey_Oxberger (jonb-cirris) wrote on 2009-05-06:

#20

Martin,

Is it a good enough test to install over Mario's PPA or would you prefer I revert mysql, reproduce the error, then apply the proposed over the top of it?

Revision history for this message

Dewey_Oxberger (jonb-cirris) wrote on 2009-05-07:

#21

Added jaunty-proposed and did a partial update of mysql. No problems found with mythbuntu. I you want a test on a clean install let me know.

Revision history for this message

Mathias Gug (mathiaz) wrote on 2009-05-07:

#22

Thanks for taking to prepare a debdiff and conduct some testing. However I'm not comfortable with releasing an update that changes the signal handling, especially removing one.

I'd like to understand *why* is mysqld crashing in the first place. It seems that mythbackend is making mysqld crash. Could you enable the general query log (http://dev.mysql.com/doc/refman/5.0/en/query-log.html) and try to figure out which SQL statement makes the daemon crash?

If the proposed debdiff fixes the issue it means that something during the boot process sends a HUP signal to the mysql_safe process. I'd like to figure out what is sending the HUP signal to mysql_safe.

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status:	Fix Committed → Confirmed
tags:	removed: verification-needed

Revision history for this message

Mario Limonciello (superm1) wrote on 2009-05-07: Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

#23

There are two problems at stake here actually:

1) something is sending HUP to mysqld_safe. That's not been determined
yet. Clearly it's some root process that is doing it. It can't be
mythbackend because it doesn't run as root.

2) mysqld_safe is not properly handling the HUP. When the trap goes off,
it's supposed to mearly send a refresh command to mysqld. Here's what's
actually happening:

1) essentially, "mysqld & wait" is running
So mysqld is running in the background, and the bash script is blocked by
the wait command.
2) the trap comes in and causes wait to exit status 128
3) the refresh is issued to mysqld
3) the bash script continues and runs kill -9 mysqld

So, the logic that's INTENDED by the patch isn't working.

On Thu, May 7, 2009 at 14:14, Mathias Gug <email address hidden> wrote:

> Thanks for taking to prepare a debdiff and conduct some testing. However
> I'm not comfortable with releasing an update that changes the signal
> handling, especially removing one.
>
> I'd like to understand *why* is mysqld crashing in the first place. It
> seems that mythbackend is making mysqld crash. Could you enable the
> general query log (http://dev.mysql.com/doc/refman/5.0/en/query-
> log.html <http://dev.mysql.com/doc/refman/5.0/en/query-%0Alog.html>) and
> try to figure out which SQL statement makes the daemon
> crash?
>
> If the proposed debdiff fixes the issue it means that something during
> the boot process sends a HUP signal to the mysql_safe process. I'd like
> to figure out what is sending the HUP signal to mysql_safe.
>
> ** Changed in: mysql-dfsg-5.0 (Ubuntu Jaunty)
> Status: Fix Committed => Confirmed
>
> ** Tags removed: verification-needed
>
> --
> mysqld_safe thinks mysqld has crashed when it hasn't
> https://bugs.launchpad.net/bugs/326768
> You received this bug notification because you are a member of Mythbuntu
> Developers, which is subscribed to Mythbuntu.
>

--
Mario Limonciello
<email address hidden>

Revision history for this message

Steve Langasek (vorlon) wrote on 2009-05-07:

#24

Mathias has said this package shouldn't go to -updates in its current state, so I'm marking it verification-failed for now. Mathias, when you've had time to reevaluate the package and if you conclude it's suitable, please set it back to verification-needed; or if you upload a new version of the package instead, you shouldn't need to do anything further, we'll pick it up with the SRU team scripts.

tags:

added: verification-failed

Revision history for this message

laga (laga) wrote on 2009-05-07:

#25

Mathias,

i can trigger this problem on my laptop which is *not* running mythbackend by simply issuing 'sudo killall -HUP mysqld_safe'. So it's not related to MythTV.

In fact, if you issue 'mysqladmin refresh' as done by mysqld_safe when it catches a SIGHUP, mysqld will *not* crash. See Mario's explanation why the patch is broken.

How to break it:
sudo killall -HUP mysqld_safe

Expected behavior:
* 'mysqladmin refresh' is run
OR
* nothing happens, as intended by upstream

Resulting behavior:
* mysqld is restarted

Revision history for this message

DAP (akadap) wrote on 2009-05-07:

#26

See this thread
http://ubuntuforums.org/showthread.php?p=7233888#post7233888

superm1 suggested I enable the Jaunty Proposed repository to see if it fixed this problem, it did not. I still get this in the log:

May 7 12:43:01 AComputer mysqld_safe[4938]: Number of processes running now: 1
May 7 12:43:01 AComputer mysqld_safe[4946]: mysqld process hanging, pid 3191 - killed
May 7 12:43:01 AComputer mysqld_safe[4949]: restarted

I am running 2 HDHomerun boxes, and an ATI HD Blunder card. I have a script in rc2.d that delays mythbackend until the HDHomerun boxes can be accessed through the network. I have also been experimenting with delaying mythbackend to see if I can avoid the database problem.
I currently have a 3 minute delay in the mythtv-backend script to delay the start of the backend. The msqld_safe problem tracks the start time of the backend. If I delay the backend start time by 3 minutes, the mysqld_safe problem shows up 3 minutes later in the log file.
There is an advantage to the delay however, it appears that if the backend start is delayed long enough, mysqld restarts fast enough that it does not screw up the backend (4 sucessful restarts with this setup so far). Nasty kludge, but it appears to be working.

Revision history for this message

Dewey_Oxberger (jonb-cirris) wrote on 2009-05-08:

#27

Strange. I set everything back to default except I didn't use my SxxWaitForHDHomeRun script. I manually configured the network and had Network Manager shut off. I verified I was getting the problem, then applied the patch. I haven't had a single mysqld process hanging since the patch.

I suppose I should reconnect network manager and see what happens.

As for the 3 minute delay; I noticed adding delays in the script that start mythbackend didn't help but if I didn't launch mythbackend, boot, and manually start, it worked just fine. That kind of behavior really makes me thing some crazy thread race condition.

Revision history for this message

Dewey_Oxberger (jonb-cirris) wrote on 2009-05-08:

#28

Probably no point in me doing this but I spent the evening trying to get the mysqld process hanging with the patch installed and never could. I'm using network manager and the WaitForHDHomeRun script and I'm not seeing any trouble.

I forgot to mention - before the patch I occasionally saw mysqld go 100% on cpu usage. Other people are seeing that as well:

http://bugs.mysql.com/bug.php?id=34312

Revision history for this message

Blackpaw (blackpaw) wrote on 2009-05-09:

#29

I had exactly the same bugs - Mythbuntu 9.04, adding Mario's PPA (no other modifications) fixed it.

Thank you very much Mario!

Revision history for this message

DAP (akadap) wrote on 2009-05-09:

#30

Installed a new update today, Lots of mysqld messages in the log file, none appear to be errors, all appear to occur when mysqld starts, none occurred when mythtv-backend started 3 minutes later.
Seems the bug may be fixed.

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-05-11:

#31

I patched mysqld_safe directly based on Mario's diff (and replace the "& wait" that I had removed). This fixes the problem for me.

I also looked in the logs to see what was happening on boot with the change and I noticed:

May 10 11:02:55 willvo sshd[2981]: Received signal 15; terminating.
May 10 11:02:55 willvo ntpd[3721]: ntpd exiting on signal 15

but both seem to be running. Postfix also reloads its configuration. Is someone (upstart?) telling all daemons to reload their configs for some reason?

Is there an easy way to track down who sends a signal?

It seems there are two problems here - the signal being sent, and mysqld_safe poor response to receiving it. It would be nice to get to the bottom of each of them.

description:

updated

Revision history for this message

Mathias Gug (mathiaz) wrote on 2009-05-11: Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

#32

Hi Will,

On Mon, May 11, 2009 at 01:35:31AM -0000, Will Uther wrote:
> I also looked in the logs to see what was happening on boot with the
> change and I noticed:
>
> May 10 11:02:55 willvo sshd[2981]: Received signal 15; terminating.
> May 10 11:02:55 willvo ntpd[3721]: ntpd exiting on signal 15
>
> but both seem to be running. Postfix also reloads its configuration.
> Is someone (upstart?) telling all daemons to reload their configs for
> some reason?

Thanks for figuring this out. This is probably the root cause of the
issue. I've seen similar behavior with sshd - could you describe your
network configuration? Are you using network-manager or
/etc/network/intrefaces? Are you using a static configuration or rely on
dhcp?

> It seems there are two problems here - the signal being sent, and
> mysqld_safe poor response to receiving it. It would be nice to get to
> the bottom of each of them.
>

Agreed. It seems that the latter has been identified and a workaround is
described in this bug. I think the former should also be well understood
before pushing anything to a *stable* release. A known broken behaviour
with a workaround is better then pushing an update that breaks existing
production systems.

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

Revision history for this message

Mario Limonciello (superm1) wrote on 2009-05-11:

#33

> Agreed. It seems that the latter has been identified and a workaround is
> described in this bug. I think the former should also be well understood
> before pushing anything to a *stable* release. A known broken behaviour
> with a workaround is better then pushing an update that breaks existing
> production systems.
I'm sorry, but I entirely disagree here.

Regardless of what is sending a SIGHUP to mysqld_safe, it should be a supported scenario to allow such signals to be sent to system daemons. It's common for SIGHUP to be used to ask to reload configuration files when the daemon supports it.

The "broken" patch from debian's sole purpose is adding support for catching SIGHUP and a few other signals. It doesn't work properly.

Ignoring the fact that mysqld is getting restarted rather than reloaded, the SIGHUP trap support to issue a refresh would *only* work if you configured /root/my.cnf or had no root mysql password defined in the first place.

How can reverting a portion of it break an existing production system in any way?

Revision history for this message

Mathias Gug (mathiaz) wrote on 2009-05-11:

#34

Download full text (5.5 KiB)

On Mon, May 11, 2009 at 08:33:20PM -0000, Mario Limonciello wrote:
> Regardless of what is sending a SIGHUP to mysqld_safe, it should be a
> supported scenario to allow such signals to be sent to system daemons.
> It's common for SIGHUP to be used to ask to reload configuration files
> when the daemon supports it.

Right. However mysqld doesn't support reloading its configuration files
via SIGHUP since upstream mysqld_safe disables SIGHUP and starts the
mysqld process with the nohup command.

The patch was originally introduced to fix Debian bug 208364 [1].
[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208364

> The "broken" patch from debian's sole purpose is adding support for
> catching SIGHUP and a few other signals. It doesn't work properly.
>

As your analysis in comment 23 [2] shows the bug is introduced when the
wait command exits. According to the bash man page:

  If bash is waiting for a command to complete and receives a signal for
  which a trap has been set, the trap will not be executed until the
  command completes. When bash is waiting for an asynchronous command
  via the wait builtin, the reception of a signal for which a trap has
  been set will cause the wait builtin to return immediately with an
  exit status greater than 128, immediately after which the trap is
  executed.

[2]: https://bugs.launchpad.net/ubuntu/+source/mysql-dfsg-5.0/+bug/326768/comments/23

Here is a trace of the mysqld_safe script run with debugging on (with an
added echo $? to output the return code of the wait command):

+ nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --
pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket =/var/run/mysqld/mysqld.sock
+ logger -p daemon.err -t mysqld_safe -i -t mysqld
+ wait

[.... sudo killall -HUP mysqld_safe from a terminal ....]

+ /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
+ echo 129
129
+ test ! -f /var/run/mysqld/mysqld.pid
+ true
+ test 1 -eq 1
+ ps xaww
+ grep -v grep
+ grep /usr/sbin/mysqld\>
+ grep -c pid-file=/var/run/mysqld/mysqld.pid
+ numofproces=1
+ echo Number of processes running now: 1
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4867]: Number of processes running now: 1
+ I=1
+ test 1 -le 1
+ ps xaww
+ grep /usr/sbin/mysqld\>
+ grep -v grep
+ grep pid-file=/var/run/mysqld/mysqld.pid
+ sed -n $p
+ PROC= 4846 pts/1 S<l+ 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var
/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-lo cking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ break
+ kill -9 4846
+ echo mysqld process hanging, pid 4846 - killed
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4875]: mysqld process hanging, pid 4846 - killed
+ expr 1 + 1
+ I=2
+ test 2 -le 1
+ echo restarted
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4878]: restarted
+ true
+ rm -f /var/run/mysqld/mysqld.sock /var/run/mysqld/mysqld.pid
+ test -z --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ eval nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --...

On Mon, May 11, 2009 at 08:33:20PM -0000, Mario Limonciello wrote:
> Regardless of what is sending a SIGHUP to mysqld_safe, it should be a
> supported scenario to allow such signals to be sent to system daemons.
> It's common for SIGHUP to be used to ask to reload configuration files
> when the daemon supports it.

Right. However mysqld doesn't support reloading its configuration files
via SIGHUP since upstream mysqld_safe disables SIGHUP and starts the
mysqld process with the nohup command.

The patch was originally introduced to fix Debian bug 208364 [1].
[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208364

> The "broken" patch from debian's sole purpose is adding support for
> catching SIGHUP and a few other signals.  It doesn't work properly.
>

As your analysis in comment 23 [2] shows the bug is introduced when the
wait command exits. According to the bash man page:

If bash is waiting for a command to complete and receives a signal for
  which a trap has been set, the trap will not be executed until the
  command completes.  When bash is waiting for an asynchronous command
  via the  wait builtin, the reception of a signal for which a trap has
  been set will cause the wait builtin to return immediately with an
  exit status greater than 128, immediately after which the trap is
  executed.

[2]: https://bugs.launchpad.net/ubuntu/+source/mysql-dfsg-5.0/+bug/326768/comments/23

Here is a trace of the mysqld_safe script run with debugging on (with an
added echo $? to output the return code of the wait command):

+ nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --
pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket =/var/run/mysqld/mysqld.sock
+ logger -p daemon.err -t mysqld_safe -i -t mysqld
+ wait

[.... sudo killall -HUP mysqld_safe from a terminal ....]

+ /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
+ echo 129
129
+ test ! -f /var/run/mysqld/mysqld.pid
+ true
+ test 1 -eq 1
+ ps xaww
+ grep -v grep
+ grep /usr/sbin/mysqld\>
+ grep -c pid-file=/var/run/mysqld/mysqld.pid
+ numofproces=1
+ echo Number of processes running now: 1
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4867]: Number of processes running now: 1
+ I=1
+ test 1 -le 1
+ ps xaww
+ grep /usr/sbin/mysqld\>
+ grep -v grep
+ grep pid-file=/var/run/mysqld/mysqld.pid
+ sed -n $p
+ PROC= 4846 pts/1    S<l+   0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var
/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-lo cking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ break
+ kill -9 4846
+ echo mysqld process hanging, pid 4846 - killed
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4875]: mysqld process hanging, pid 4846 - killed
+ expr 1 + 1
+ I=2
+ test 2 -le 1
+ echo restarted
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4878]: restarted
+ true
+ rm -f /var/run/mysqld/mysqld.sock /var/run/mysqld/mysqld.pid
+ test -z --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ eval nohup /usr/sbin/mysqld  --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock 2>&1 | logger -p daemon.err -t mysqld_safe -i -t mysqld & wait
+ nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ logger -p daemon.err -t mysqld_safe -i -t mysqld
+ wait

So the Debian patch adds another reason why the mysqld_safe script would
stop blocking beside a mysqld crash. The upstream script assumes that
a mysqld crash would be the only reason to stop blocking the execution
of mysqld_safe. Thus the observed behaviour of mysqld being killed and
restarted.

> Ignoring the fact that mysqld is getting restarted rather than reloaded,
> the SIGHUP trap support to issue a refresh would *only* work if you
> configured /root/my.cnf or had no root mysql password defined in the
> first place.
>

Considering that the mysqladmin command is run with the option '--defaults-extra-file=/etc/mysql/debian.cnf' mysqladmin is able to connect to the mysqld process even if there is a root password set since it uses the debian-sys-maint account.  Flush tables shown by the status command is correctly incremented:

$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf status
Uptime: 36  Threads: 1  Questions: 1  Slow queries: 0  Opens: 12  Flush tables: 1  Open tables: 6  Queries per second avg: 0.028
$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf status
Uptime: 42  Threads: 1  Questions: 4  Slow queries: 0  Opens: 12  Flush tables: 2  Open tables: 0  Queries per second avg: 0.095

> How can reverting a portion of it break an existing production system in
> any way?
>

Well - I don't know. Which is a good reason to *not* push an update to a
*stable* release. It seemed to me that the root cause of the faulty
behaviour wasn't fully identified and understood. It may well turn out
that the proposed patch is the correct way to fix the issue. However
releasing an update in a *stable* release requires a full understanding
of the patch and its potential regressions.

For now an investigation of the purpose of Debian patch
38_scripts__mysqld_safe.sh__signals.dpatch with regard to Debian bug
208364 should be conducted to make sure regressions are not introduced
by the proposed debdiff.

-- 
Mathias Gug
Ubuntu Developer  http://www.ubuntu.com

Revision history for this message

Mathias Gug (mathiaz) wrote on 2009-05-12:

#35

mysql-dfsg-5.0_5.1.30really5.0.75-0ubuntu10.2.debdiff Edit (1.8 KiB, text/plain)

Here is another way to fix the issue: add a wait to the trap:

-+trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh' 1 # HUP
++trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh & wait' 1 # HUP

This patch doesn't remove an existing functionality (signals are handled by mysqld_safe).

I've attached a debdiff against ubuntu10 (currently in jaunty). I'll publish a package in my PPA and ask for more testing once done.

Revision history for this message

Mario Limonciello (superm1) wrote on 2009-05-12: Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

#36

Mathias:

Spectacular idea. I just manually tried it on my test system that i've been
using to investigate this bug. It solves the problems entirely.

On Mon, May 11, 2009 at 22:04, Mathias Gug <email address hidden> wrote:

> Here is another way to fix the issue: add a wait to the trap:
>
> -+trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf
> refresh' 1 # HUP
> ++trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf
> refresh & wait' 1 # HUP
>
> This patch doesn't remove an existing functionality (signals are handled
> by mysqld_safe).
>
> I've attached a debdiff against ubuntu10 (currently in jaunty). I'll
> publish a package in my PPA and ask for more testing once done.
>
> ** Attachment added:
> "mysql-dfsg-5.0_5.1.30really5.0.75-0ubuntu10.2.debdiff"
>
> http://launchpadlibrarian.net/26619255/mysql-dfsg-5.0_5.1.30really5.0.75-0ubuntu10.2.debdiff
>
> --
> mysqld_safe thinks mysqld has crashed when it hasn't
> https://bugs.launchpad.net/bugs/326768
> You received this bug notification because you are a member of Mythbuntu
> Developers, which is subscribed to Mythbuntu.
>

--
Mario Limonciello
<email address hidden>

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-05-12: Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

#37

On 12/05/2009, at 2:16 AM, Mathias Gug wrote:

> Thanks for figuring this out. This is probably the root cause of the
> issue. I've seen similar behavior with sshd - could you describe your
> network configuration? Are you using network-manager or
> /etc/network/intrefaces? Are you using a static configuration or
> rely on
> dhcp?

I have a static configuration in /etc/network/interfaces.

The reload seems to happen just before NetworkManager starts up.

Will :-}

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-05-12:

#38

I just tried manually adding Mathias' latest change. That also seems to solve the problem for me. (This was not a full re-install with that patch - just me with vim reverting the old patch and adding that patch. I hope I haven't made any mistakes :)

Revision history for this message

MarcRandolph (mrand) wrote on 2009-05-12:

#39

Determine the process ID of SIGHUP source Edit (2.5 KiB, text/plain)

I'm not sure if I'm late to the party or not here... but on the off chance that it would be helpful, I threw together a program to print out the source process ID for a SIGHUP signal. I have attached the source to this bug. Very simple to use... in this case:

1. mv mysqld_safe mysqld_safe_real
2. gcc sig_src.c -o mysqld_safe
3. Launch programs/services that cause crash
4. Watch stderr output, wherever that happens to be going. This could be modified to log to a file.

Revision history for this message

Will Uther (willu-mailinglists) wrote on 2009-05-12:

#40

I just tried Marc's signal tracker: Here is the output after boot:

Executing (/usr/bin/mysqld_safe_real )
signo: [1], errno: [0], code: [128], pid: [0], uid: [0]

It looks like a code of 0x80 corresponds to a code of SI_KERNEL.

What does the kernel have against mysqld_safe?

Mathias Gug (mathiaz) on 2009-05-12

description:	updated
tags:	removed: verification-failed
Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
assignee:	Canonical Server Team (canonical-server) → Mathias Gug (mathiaz)
status:	Confirmed → In Progress
Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
assignee:	nobody → Mathias Gug (mathiaz)
status:	Confirmed → In Progress

Bug Watch Updater (bug-watch-updater) on 2009-05-12

Changed in mysql-dfsg-5.0 (Debian):
status:	Unknown → New

Revision history for this message

Martin Pitt (pitti) wrote on 2009-05-14:

#41

Accepted mysql-dfsg-5.0 into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status:	In Progress → Fix Committed
tags:	added: verification-needed

Revision history for this message

Dewey_Oxberger (jonb-cirris) wrote on 2009-05-18:

#42

Proposed works for me. No sign of mysql trouble in any of the logs.

Revision history for this message

Bill Meek (llibkeem) wrote on 2009-05-18:

#43

Short answer: Fixed.

I was running on Mythbuntu 8.10 and didn't have these errors.

Fresh load of Mythbuntu 9.04RC, got the 'hanging' messages in syslog and database errors ending with
'MySQL server has gone away' in mythbackend.log. These messages are gone now.

Thank you.

Martin Pitt (pitti) on 2009-06-02

tags:

added: verification-done
removed: verification-needed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2009-06-02:

#44

This bug was fixed in the package mysql-dfsg-5.0 - 5.1.30really5.0.75-0ubuntu10.2

---------------
mysql-dfsg-5.0 (5.1.30really5.0.75-0ubuntu10.2) jaunty-proposed; urgency=low

  * debian/patches/38_scripts__mysqld_safe.sh__signals.dpatch: wait in the
    SIGHUP trap to avoid killing an existing mysqld process when a HUP signal
    is sent to mysqld_safe. (LP: #326768)

-- Mathias Gug <email address hidden> Mon, 11 May 2009 22:41:44 -0400

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status:	Fix Committed → Fix Released

Revision history for this message

Martin Pitt (pitti) wrote on 2009-06-02:

#45

Mathias, please upload fix to Karmic ASAP.

Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
importance:	Undecided → High

Revision history for this message

Stephane Chazelas (stephane-chazelas) wrote on 2009-06-09:

#46

that patch is still wrong.

The first time a HUP is received, we run the code in the trap
and call wait which will wait for both the refresh command and
the mysqld one.

But we won't return from that trap until mysqld dies, and in the
trap the HUP signal is blocked, which means any subsequent
HUP will not be handled.

A better way could be to implement some proper event handling as in:

trap : HUP INT QUIT TERM
while :; do
  action=
  mysqld ... &
  while :; do
    signal=NONE
    wait || signal=$(kill -l "$?")

    case $signal in
      (INT|TERM|QUIT) mysqladmin ... shutdown; exit;;
      (HUP) mysqladmin ... refresh;;
      (NONE) break;; # mysqld died
      (*) unexpected ... die;;
    esac
  esac
done

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status:	Fix Released → Incomplete

Revision history for this message

Mathias Gug (mathiaz) wrote on 2009-06-19:

#47

Thanks to the careful and detailed analysis of Stephane in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=527623, it seems that there are two issues:

==========================================================
SIGHUP, SIGQUIT, SIGINT ignored after package installation

After an apt-get install SIGHUP (as well as SIGQUIT and SIGINT) is ignored by mysqld_safe. This is due to the fact that apt blocks SIGHUP, SIGQUIT and SIGINT which is propagated to dpkg, the maintainer shell scripts, the mysql init shell script and finally mysqld_safe (a shell script). However SIGTERM is not.

What are the consequences? A mysql refresh command cannot be sent to mysqld by sending a SIGHUP to mysqld_safe.

Since SIGTERM is not ignored mysqld_safe can properly shutdown mysqld.

It seems that this issue is not related to the bug described here and new bug should be opened instead.

===============================================================
SIGTERM may be ignored when bash is the default shell (/bin/sh)

If /bin/sh points to bash (not the default configuration), SIGHUP (and other signals) will *always* be ignored by mysqld_safe after the first SIGHUP has been sent to mysqld_safe. This is a change in behavior brought by the new upload.

The consequences are similar to the ones mentionned above with the added case that SIGTERM will also be ignored. While this is more problematic, it should be noted that the mysql init script tries *first* to shutdown mysqld via mysqladmin, and uses SIGTERM on mysqld (and not mysqld_safe) if it fails, and then SIGKILL on mysqld (and not mysqld_safe) if SIGTERM fails.

Before the upload the behavior was that on every SIGHUP (regardless of the shell), mysqld_safe would kill mysqld (via SIGKILL) and restart mysqld. Since mysqld was killed with SIGKILL a proper shutdown was *not* done, which could lead to data corruption.

The proposed upload removes a potential data corruption for mysqld except in the case where bash is used as the default shell (/bin/sh).

In this non-default configuration (bash as the default shell):

  * sending a SIGHUP to mysqld_safe would lead to:
    - before (ubuntu10): mysqld killed with SIGKILL and restarted (potential data corruption)
    - after (ubuntu10.2): refresh the first time, nothing on subsequent signaling.
    This is the bug reported here and the upload provides an improvement.

  * sending a SIGTERM to mysqld_safe would lead to:
    - before (ubuntu10): proper shutdown of mysqld
    - after (ubuntu10.2): proper shutdown of mysqld if no SIGHUP has been sent before, nothing if a SIGHUP has already been sent to mysqld_safe.
    This is a regression.

However it should be noted that the mysql init script doesn't send signals to mysqld_safe when stopping mysql and is run before the sendsigs script during the system shutdown process (rc6).