MySQL persistent queue mishandles re-connection

Bug #1319506 reported by Berend Ozceri
34
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Gearman
New
Undecided
Unassigned

Bug Description

It appears that if the connection between the job server and the MySQL server providing the persistent queue storage is severed (for example, due to the MySQL server being restarted), the job server does not appear to make any re-connection attempts. From that point on, a lot of "mysql_stmt_execute failed: Incorrect arguments to mysqld_stmt_execute" errors are logged. Restarting the job server restores functionality.

Revision history for this message
Berend Ozceri (berend-p) wrote :

The issue may be that while the MySQL queue plugin does this:

mysql_options(queue->con, MYSQL_OPT_RECONNECT, &my_true);

the MySQL documentation documents that:

"...if a reconnection does occur, several aspects of the connection state are reset on the server side and your application will not be notified.

The connection-related state is affected as follows:
...
- Prepared statements are released."

and the MySQL queue plugin may be continuing to use prepared statements across re-connection conditions.

Revision history for this message
Berend Ozceri (berend-p) wrote :

On further inspection, there appears to be an attempt to re-prepare statements if the error returned during statement execution is CR_SERVER_LOST, but based on the log messages from the job server:

  ERROR 2014-05-14 10:06:46.000000 [ proc ] Remove from persistent queue gearman_server_run_command(QUEUE_ERROR) -> libgearman-server/server.cc:792
  ERROR 2014-05-14 10:06:46.000000 [ proc ] mysql_stmt_execute failed: Incorrect arguments to mysqld_stmt_execute -> libgearman-server/plugins/queue/mysql/queue.cc:430

it may be the case that the error being returned is ER_WRONG_ARGUMENTS.

Revision history for this message
Berend Ozceri (berend-p) wrote :

I have been able to recreate this in testing whereby I can get the job server to log either:

ERROR 2014-05-14 20:34:05.000000 [ 1 ] mysql_stmt_execute failed: Incorrect arguments to mysqld_stmt_execute (1210 1210) -> libgearman-server/plugins/queue/mysql/queue.cc:432

or:

ERROR 2014-05-14 20:35:57.000000 [ 1 ] mysql_stmt_execute failed: Unknown prepared statement handler (1) given to mysqld_stmt_execute (1243 1243) -> libgearman-server/plugins/queue/mysql/queue.cc:432

errors when a worker reports the completion of a job after the MySQL server connection is dropped and re-connected automatically. The first error code in the parentheses is from mysql_errno(queue->con) and the second from mysql_stmt_errno(queue->done_stmt).

There may be some sensitivity in this issue that the first thing that happens after a re-connection to the MySQL server be the worker completion of a job as opposed to the addition of a new job from a client.

Berend Ozceri (berend-p)
summary: - MySQL persistent queue doesn't re-connect
+ MySQL persistent queue mishandles re-connection
Revision history for this message
Berend Ozceri (berend-p) wrote :

OK. It looks like what's happening is that if an automatic re-connection occurs and one of the prepared statements (either for the "add" case or for the "done" case) is re-prepared, it is possible for the handle associated with that new prepared statement on the fresh connection to be the same as the the handle for the other prepared statement on the now-disconnected connection. As a result, not knowing that the connection has been re-established, the usage of a stale prepared statement handle causes an "incorrect arguments" error instead of the expected "server lost" error.

My initial recommendations based on these findings are:

1) If executing a statement fails, inspect the "global" error on the connection handle first (e.g. mysql_errno(queue->con)). React to ER_WRONG_ARGUMENTS and ER_UNKNOWN_STMT_HANDLER in the same manner as CR_SERVER_LOST and re-prepare the statements.
2) Prepare both statements at once instead of preparing them separately to avoid inconsistencies in the statement handle usage across re-connections.

Revision history for this message
chjgcn (chjgcn) wrote :

After many tests following your first recommendation, I find out that there is a strange appearance that mysql_stmt_execute fails, and mysql_stmt_errno returns zero! If the statement is not re-prepared in this situation, mysql_stmt_execute will fail at the next time, and mysql_stmt_errno will return 1243. So in the patch file, the condition of statement re-preparing is removed. In other words, no matter what is returned by mysql_stmt_errno, the statement will be re-prepared and then executed in the next loop.

Revision history for this message
chjgcn (chjgcn) wrote :

In the above comment, the patch file contains other bug correction and functionality enhancement, so I post a clean patch file for this bug in this comment.

Revision history for this message
chjgcn (chjgcn) wrote :

The newer patch file related with
    https://bugs.launchpad.net/gearmand/+bug/1319506/comments/5
is in
    https://bugs.launchpad.net/gearmand/+bug/1348865/comments/6
I forgot to add <cstring> header file into
    libgearman/interface/universal.hpp

Revision history for this message
sanderkrause (sanderkrause) wrote :

My project is also affected by this bug. Is there any hope of seeing a fix release for this?

Revision history for this message
fastest963 (fastest963) wrote :

I meant to comment this months ago but we were having this issue in production. We manually applied the patch above and having had an issue.

Revision history for this message
fastest963 (fastest963) wrote :

Sorry, meant haven't* had an issue.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.