Percona Server with XtraDB

attaching to percona-server with gdb disconnects clients

Reported by Greg Hazel on 2011-07-05
This bug affects 2 people
Affects            Importance  Assigned to
Percona Server     High        Laurynas Biveinis
  5.1              High        Laurynas Biveinis
  5.5              High        Laurynas Biveinis

Bug Description

Simply attaching gdb to a running Percona Server mysqld instance (and then detaching immediately) causes Percona Server to disconnect clients. The same happens if you attach gdb and then continue. Clients get "Lost connection to MySQL server during query", but reconnecting works fine and the server seems to be healthy afterwards. Testing with telnet shows the socket being closed when gdb attaches. This bug makes it impossible to use "pstack", the "poor man's profiler", or even Aspersa's "connect" tool without disconnecting clients.

Percona Server 5.5.13 (installed from the yum repo), CentOS 5.6 (running on EC2; also happens with the Amazon AMI), gdb 7.0.1

Interestingly, it does not happen with the original mysqld (5.5.10 from webtatic or 5.5.13 built from source).
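The socket close seen with telnet, and the client's "Lost connection to MySQL server during query" error, are what a peer observes when the server end of a connection shuts it down: the client's next read returns 0 (end of file). The sketch below demonstrates this. To keep the example self-contained, it uses a local AF_UNIX socketpair(2) in place of the real TCP connection. That substitution is an assumption, and the function name is hypothetical.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical demo: what the "client" end sees from recv() after the
// "server" end calls shutdown(SHUT_RDWR). Returns -2 if setup fails.
ssize_t client_view_after_server_shutdown() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -2;
  shutdown(sv[0], SHUT_RDWR);          // server side hangs up
  char c;
  ssize_t n = recv(sv[1], &c, 1, 0);   // 0 = orderly EOF: the peer is gone
  close(sv[0]);
  close(sv[1]);
  return n;
}
```

A client library that gets 0 from a read in the middle of a query has no way to continue, so it reports a lost connection.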

Tags: gdb
Changed in percona-server:
assignee: nobody → Valentine Gostev (longbow)
Changed in percona-server:
importance: Undecided → Low
Alexey Kopytov (akopytov) wrote :

It looks rather serious. Why was it set to low importance?

Changed in percona-server:
importance: Low → High
Valentine Gostev (longbow) wrote :

Reproduced with 5.5.17.

Changed in percona-server:
status: New → In Progress
Valentine Gostev (longbow) wrote :

It looks like gdb issues SIGSTOP and SIGCONT signals to the mysqld process while attaching. PS drops connections, while vanilla MySQL does not.

Another way to reproduce:
kill -STOP $mysqld_pid
kill -CONT $mysqld_pid
PS interrupts the query, while vanilla MySQL continues running it.
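The STOP/CONT reproduction matches documented Linux behavior. According to signal(7), recv() on a socket that has a receive timeout (SO_RCVTIMEO) fails with EINTR after the process is stopped and then continued, even when no signal handler is installed. The sketch below shows this outside mysqld. It assumes Linux. It also assumes, without confirmation from this report, that the server enforces its net read timeouts through socket timeouts when NO_ALARM is defined. The function name errno_after_stop_cont is hypothetical. The 300 ms sleep that lets the child block in recv() first makes the result timing dependent on a heavily loaded machine.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Forks a child that blocks in recv() on a socket with a 5 s SO_RCVTIMEO,
// then stops and continues it (like `kill -STOP` / `kill -CONT`).
// Returns the errno the child's recv() failed with (EINTR expected on Linux),
// 0 if recv() returned normally, or -1 if the experiment could not be set up.
int errno_after_stop_cont() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

  pid_t child = fork();
  if (child < 0) {
    close(sv[0]);
    close(sv[1]);
    return -1;
  }
  if (child == 0) {
    close(sv[0]);
    timeval tv{};
    tv.tv_sec = 5;  // long enough that the timeout itself should not fire first
    if (setsockopt(sv[1], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0)
      _exit(255);
    char c;
    ssize_t n = recv(sv[1], &c, 1, 0);  // nothing is ever written: it blocks
    _exit(n < 0 ? (errno & 0xff) : 0);
  }

  close(sv[1]);
  timespec delay{0, 300 * 1000 * 1000};  // give the child time to enter recv()
  nanosleep(&delay, nullptr);

  int status = 0;
  kill(child, SIGSTOP);
  waitpid(child, &status, WUNTRACED);    // returns once the child is stopped
  if (WIFSTOPPED(status)) {
    kill(child, SIGCONT);
    waitpid(child, &status, 0);          // now wait for it to exit
  }
  close(sv[0]);
  if (!WIFEXITED(status) || WEXITSTATUS(status) == 255) return -1;
  return WEXITSTATUS(status);
}
```

Without SO_RCVTIMEO, the same blocking recv() would be restarted transparently after SIGCONT and keep waiting. That is consistent with vanilla MySQL carrying on with the query.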

Changed in percona-server:
status: In Progress → Confirmed
Takenori Akagi (anonimo) wrote :

Confirmed on 5.5

Changed in percona-server:
status: Confirmed → Triaged

Tested it on both Percona Server 5.5.22 and MySQL 5.5.22, and confirmed it on Percona Server 5.5.22.

The following patch fixes it:
==================================

--- sql/net_serv.cc 2012-05-06 23:20:51.968530130 +0530
+++ /tmp/net_serv.cc 2012-05-06 21:11:01.505038180 +0530
@@ -835,7 +835,7 @@

          DBUG_PRINT("info",("vio_read returned %ld errno: %d",
                             (long) length, vio_errno(net->vio)));
-#if !defined(NO_ALARM) && (!defined(__WIN__) || defined(MYSQL_SERVER))
+#if !defined(__WIN__) || defined(MYSQL_SERVER)
          /*
            We got an error that there was no data on the socket. We now set up
            an alarm to not 'read forever', change the socket to non blocking

====================================================

I tested the patch as well.
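The patch compiles the block under the #if back in for NO_ALARM builds. As its comment says, that block sets up an alarm and tries the read again, so an interrupted read is retried rather than being treated as a fatal error that closes the connection. The sketch below shows the general retry-on-EINTR idea using plain read(2). It is a simplified illustration, not the net_serv.cc code, which goes through vio and thr_alarm helpers. The names read_retrying and max_retries are hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read up to len bytes from fd, retrying up to
// max_retries times when read() is interrupted (EINTR) rather than treating
// the interruption as a fatal error.
ssize_t read_retrying(int fd, void* buf, size_t len, int max_retries) {
  int retries = 0;
  for (;;) {
    ssize_t n = read(fd, buf, len);
    if (n >= 0) return n;  // data, or 0 at end of file
    if (errno == EINTR && retries++ < max_retries)
      continue;            // interrupted (e.g. after SIGSTOP/SIGCONT): try again
    return -1;             // real error or retries exhausted: caller gives up
  }
}
```

Bounding the retries keeps a persistently failing descriptor from spinning forever while still absorbing one-off interruptions.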

Here is how I found it:

1. I ran strace while sending STOP/CONT signals; here is the strace: https://gist.github.com/d5deace9f70fc0514083

   Comparing it with the strace for MySQL 5.5.22, I noticed a shutdown(2) call on the new socket spawned for the connection: https://gist.github.com/d5deace9f70fc0514083#gistcomment-305803

2. Next, I attached gdb to the same process, with the same STOP/CONT signals and a mysql client; here is the gdb backtrace: https://gist.github.com/c4a76c2342c50088d6bd

3. Next, I stepped into do_handle_one_connection and noticed:

      3.1

  for (;;)
  {
    bool rc;
    bool create_user= TRUE;

    rc= thd_prepare_connection(thd);
    if (rc)
    {
      create_user= FALSE;
      goto end_thread;
    }

    while (thd_is_connection_alive(thd))
    {
      mysql_audit_release(thd);
      if (do_command(thd))  // <- returned TRUE here when the signal arrived, ending the loop
        break;
    }

4. So I compared the code of the functions that do_command calls between PS and MySQL, and noticed the difference, !defined(NO_ALARM), which is the change in the patch I posted.

5. I tested it with the change, and it seems to work.
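The loop excerpted in step 3 means that a single failed network read inside do_command() is enough to end the session: do_command() returns TRUE, the loop breaks, and the thread goes on to close the connection. The code below is a heavily simplified, hypothetical model of that control flow. The names model_do_command, model_connection_loop and the read_packet callback are illustrative, not server APIs.

```cpp
#include <functional>

// read_packet stands in for the network read done inside do_command():
// it returns true on success and false on a read error.
using ReadPacket = std::function<bool()>;

// Mirrors do_command(): returns true when the connection should end.
bool model_do_command(const ReadPacket& read_packet) {
  return !read_packet();  // any read error ends the session
}

// Mirrors the inner while loop; returns how many commands ran before exit.
int model_connection_loop(const ReadPacket& read_packet, int max_commands) {
  int handled = 0;
  while (handled < max_commands) {  // stands in for thd_is_connection_alive()
    if (model_do_command(read_packet))
      break;                        // the real code then closes the connection
    ++handled;
  }
  return handled;
}
```

In this model, one EINTR surfacing as a read error is indistinguishable from a dead client, which is why the retry inside the network layer matters.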

Raghu, thanks for your excellent analysis and fix.

Valentine, would it be easy to create a regression test for the test suite based on this? If it's not easy, IMHO we can fix it without the test in this case.

Stewart Smith (stewart) on 2012-09-04
Changed in percona-server:
assignee: Valentine Gostev (longbow) → nobody

Not reproducible on 5.1. The MTR test case for this bug is the same as for bug 1060136:

--source include/not_windows.inc

let $mysqld_pid_file=`SELECT @@GLOBAL.pid_file`;

system kill -STOP `cat $mysqld_pid_file`;
system kill -CONT `cat $mysqld_pid_file`;

# Server gone!
SELECT 2+2;

By code analysis, attaching GDB to mysqld while it is writing to a socket will result in the same issue. This is harder to reproduce, though.

Mario Splivalo (mariosplivalo) wrote :

This is easy to reproduce: issue 'select sleep(90)' in the mysql CLI, and then resize the terminal.

Mario,

This bug concerns the server not being able to handle SIGSTOP/SIGCONT. The terminal resize issue is the client not handling SIGWINCH correctly; that is bug 925343, fixed in the upcoming release.
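On the client side, one common way to handle SIGWINCH is to install the handler with SA_RESTART. With that flag, a blocking read() the client is waiting in is restarted by the kernel rather than failing with EINTR. This is a generic sketch and not necessarily how bug 925343 was fixed. The names window_resized, on_sigwinch and install_sigwinch_handler are hypothetical. Per signal(7), SA_RESTART does not help for sockets with a receive timeout (SO_RCVTIMEO), which fail with EINTR regardless.

```cpp
#include <csignal>

// Hypothetical flag set from the handler; the client would re-query the
// terminal size when it sees it, outside the handler.
volatile sig_atomic_t window_resized = 0;

extern "C" void on_sigwinch(int) { window_resized = 1; }

// Installs the handler with SA_RESTART so that interrupted blocking calls
// that support restarting resume instead of returning EINTR.
bool install_sigwinch_handler() {
  struct sigaction sa {};
  sa.sa_handler = on_sigwinch;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(SIGWINCH, &sa, nullptr) == 0;
}
```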
