Debian
mysql-dfsg-5.0 package

Bug #326768
Comment #34

Comment 34 for bug 326768

Revision history for this message

Mathias Gug (mathiaz) wrote on 2009-05-11: Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

#34

On Mon, May 11, 2009 at 08:33:20PM -0000, Mario Limonciello wrote:
> Regardless of what is sending a SIGHUP to mysqld_safe, it should be a
> supported scenario to allow such signals to be sent to system daemons.
> It's common for SIGHUP to be used to ask to reload configuration files
> when the daemon supports it.

Right. However mysqld doesn't support reloading its configuration files
via SIGHUP since upstream mysqld_safe disables SIGHUP and starts the
mysqld process with the nohup command.

The patch was originally introduced to fix Debian bug 208364 [1].
[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208364

> The "broken" patch from debian's sole purpose is adding support for
> catching SIGHUP and a few other signals. It doesn't work properly.
>

As your analysis in comment 23 [2] shows the bug is introduced when the
wait command exits. According to the bash man page:

  If bash is waiting for a command to complete and receives a signal for
  which a trap has been set, the trap will not be executed until the
  command completes. When bash is waiting for an asynchronous command
  via the wait builtin, the reception of a signal for which a trap has
  been set will cause the wait builtin to return immediately with an
  exit status greater than 128, immediately after which the trap is
  executed.

[2]: https://bugs.launchpad.net/ubuntu/+source/mysql-dfsg-5.0/+bug/326768/comments/23

Here is a trace of the mysqld_safe script run with debugging on (with an
added echo $? to output the return code of the wait command):

+ nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --
pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket =/var/run/mysqld/mysqld.sock
+ logger -p daemon.err -t mysqld_safe -i -t mysqld
+ wait

[.... sudo killall -HUP mysqld_safe from a terminal ....]

+ /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
+ echo 129
129
+ test ! -f /var/run/mysqld/mysqld.pid
+ true
+ test 1 -eq 1
+ ps xaww
+ grep -v grep
+ grep /usr/sbin/mysqld\>
+ grep -c pid-file=/var/run/mysqld/mysqld.pid
+ numofproces=1
+ echo Number of processes running now: 1
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4867]: Number of processes running now: 1
+ I=1
+ test 1 -le 1
+ ps xaww
+ grep /usr/sbin/mysqld\>
+ grep -v grep
+ grep pid-file=/var/run/mysqld/mysqld.pid
+ sed -n $p
+ PROC= 4846 pts/1 S<l+ 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var
/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-lo cking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ break
+ kill -9 4846
+ echo mysqld process hanging, pid 4846 - killed
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4875]: mysqld process hanging, pid 4846 - killed
+ expr 1 + 1
+ I=2
+ test 2 -le 1
+ echo restarted
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4878]: restarted
+ true
+ rm -f /var/run/mysqld/mysqld.sock /var/run/mysqld/mysqld.pid
+ test -z --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ eval nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock 2>&1 | logger -p daemon.err -t mysqld_safe -i -t mysqld & wait
+ nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ logger -p daemon.err -t mysqld_safe -i -t mysqld
+ wait

So the Debian patch adds another reason why the mysqld_safe script would
stop blocking beside a mysqld crash. The upstream script assumes that
a mysqld crash would be the only reason to stop blocking the execution
of mysqld_safe. Thus the observed behaviour of mysqld being killed and
restarted.

> Ignoring the fact that mysqld is getting restarted rather than reloaded,
> the SIGHUP trap support to issue a refresh would *only* work if you
> configured /root/my.cnf or had no root mysql password defined in the
> first place.
>

Considering that the mysqladmin command is run with the option '--defaults-extra-file=/etc/mysql/debian.cnf' mysqladmin is able to connect to the mysqld process even if there is a root password set since it uses the debian-sys-maint account. Flush tables shown by the status command is correctly incremented:

$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf status
Uptime: 36 Threads: 1 Questions: 1 Slow queries: 0 Opens: 12 Flush tables: 1 Open tables: 6 Queries per second avg: 0.028
$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf status
Uptime: 42 Threads: 1 Questions: 4 Slow queries: 0 Opens: 12 Flush tables: 2 Open tables: 0 Queries per second avg: 0.095

> How can reverting a portion of it break an existing production system in
> any way?
>

Well - I don't know. Which is a good reason to *not* push an update to a
*stable* release. It seemed to me that the root cause of the faulty
behaviour wasn't fully identified and understood. It may well turn out
that the proposed patch is the correct way to fix the issue. However
releasing an update in a *stable* release requires a full understanding
of the patch and its potential regressions.

For now an investigation of the purpose of Debian patch
38_scripts__mysqld_safe.sh__signals.dpatch with regard to Debian bug
208364 should be conducted to make sure regressions are not introduced
by the proposed debdiff.

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

Right. However mysqld doesn't support reloading its configuration files
via SIGHUP since upstream mysqld_safe disables SIGHUP and starts the
mysqld process with the nohup command.

The patch was originally introduced to fix Debian bug 208364 [1].
[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208364

> The "broken" patch from debian's sole purpose is adding support for
> catching SIGHUP and a few other signals.  It doesn't work properly.
>

As your analysis in comment 23 [2] shows the bug is introduced when the
wait command exits. According to the bash man page:

If bash is waiting for a command to complete and receives a signal for
  which a trap has been set, the trap will not be executed until the
  command completes.  When bash is waiting for an asynchronous command
  via the  wait builtin, the reception of a signal for which a trap has
  been set will cause the wait builtin to return immediately with an
  exit status greater than 128, immediately after which the trap is
  executed.

[2]: https://bugs.launchpad.net/ubuntu/+source/mysql-dfsg-5.0/+bug/326768/comments/23

Here is a trace of the mysqld_safe script run with debugging on (with an
added echo $? to output the return code of the wait command):

[.... sudo killall -HUP mysqld_safe from a terminal ....]

+ /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
+ echo 129
129
+ test ! -f /var/run/mysqld/mysqld.pid
+ true
+ test 1 -eq 1
+ ps xaww
+ grep -v grep
+ grep /usr/sbin/mysqld\>
+ grep -c pid-file=/var/run/mysqld/mysqld.pid
+ numofproces=1
+ echo Number of processes running now: 1
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4867]: Number of processes running now: 1
+ I=1
+ test 1 -le 1
+ ps xaww
+ grep /usr/sbin/mysqld\>
+ grep -v grep
+ grep pid-file=/var/run/mysqld/mysqld.pid
+ sed -n $p
+ PROC= 4846 pts/1    S<l+   0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var
/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-lo cking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ break
+ kill -9 4846
+ echo mysqld process hanging, pid 4846 - killed
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4875]: mysqld process hanging, pid 4846 - killed
+ expr 1 + 1
+ I=2
+ test 2 -le 1
+ echo restarted
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4878]: restarted
+ true
+ rm -f /var/run/mysqld/mysqld.sock /var/run/mysqld/mysqld.pid
+ test -z --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ eval nohup /usr/sbin/mysqld  --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock 2>&1 | logger -p daemon.err -t mysqld_safe -i -t mysqld & wait
+ nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ logger -p daemon.err -t mysqld_safe -i -t mysqld
+ wait

Considering that the mysqladmin command is run with the option '--defaults-extra-file=/etc/mysql/debian.cnf' mysqladmin is able to connect to the mysqld process even if there is a root password set since it uses the debian-sys-maint account.  Flush tables shown by the status command is correctly incremented:

$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf status
Uptime: 36  Threads: 1  Questions: 1  Slow queries: 0  Opens: 12  Flush tables: 1  Open tables: 6  Queries per second avg: 0.028
$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf status
Uptime: 42  Threads: 1  Questions: 4  Slow queries: 0  Opens: 12  Flush tables: 2  Open tables: 0  Queries per second avg: 0.095

> How can reverting a portion of it break an existing production system in
> any way?
>

-- 
Mathias Gug
Ubuntu Developer  http://www.ubuntu.com

Debianmysql-dfsg-5.0 package

Comment 34 for bug 326768

Debian
mysql-dfsg-5.0 package