First connection after changing wsrep_cluster_address hangs

Bug #1022250 reported by Alex Yurchenko
Affects                     Status        Importance  Assigned to    Milestone
MySQL patches by Codership  Fix Released  Medium      Seppo Jaakola
Percona XtraDB Cluster      (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
    5.5                     Fix Released  Undecided   Unassigned
    5.6                     Fix Released  Undecided   Unassigned

Bug Description

As reported by Patrick Zoblisein:

I've been doing some more testing of the `SET GLOBAL wsrep_cluster_address='<>'` functionality and I'm still able to hang my node, regardless of whether I change the cluster address to another node in the cluster or remove the node from the cluster via gcomm://.

Performing the dynamic change itself works in my lab environment with no issues, as Seppo notes above. The problem appears only when I make a connection to the node after the SET GLOBAL wsrep_cluster_address. Here is a long stretch of output showing no connections (other than my working one) while I bounce back and forth between cluster addresses:

[root@lab1 pxc]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.5.20-beta23.4-log Percona XtraDB Cluster (GPL) 5.5.20-beta23.4, Revision 3724, wsrep_23.4.r3724

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 1 | system user | | NULL | Sleep | 13 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 2 | system user | | NULL | Sleep | 13 | NULL | NULL | 0 | 0 | 1 |
| 3 | event_scheduler | localhost | NULL | Daemon | 13 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name | Value |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://1.2.3.4:1260 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 1 | system user | | NULL | Sleep | 38 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 2 | system user | | NULL | Sleep | 38 | NULL | NULL | 0 | 0 | 1 |
| 3 | event_scheduler | localhost | NULL | Daemon | 38 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 2 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.01 sec)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 3 | event_scheduler | localhost | NULL | Daemon | 67 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 2 |
| 5 | system user | | NULL | Sleep | 4 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 6 | system user | | NULL | Sleep | 4 | NULL | NULL | 0 | 0 | 1 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://1.2.3.4:1260';
Query OK, 0 rows affected (3.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.00 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------+
| Variable_name | Value |
+-----------------------+----------+
| wsrep_cluster_address | gcomm:// |
+-----------------------+----------+
1 row in set (0.01 sec)

mysql> set global wsrep_cluster_address='gcomm://1.2.3.5:1270';
Query OK, 0 rows affected (3.01 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name | Value |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://1.2.3.5:1270 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.00 sec)

Now I try to make another connection, which just sits there:
[root@lab1 workarea]# mysql -u root -p
Enter password:

Flip over to my working connection and I see a new connection stuck in the "login" state:
mysql> show processlist;
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 3 | event_scheduler | localhost | NULL | Daemon | 810 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 2 |
| 13 | system user | | NULL | Sleep | 430 | NULL | NULL | 0 | 0 | 1 |
| 14 | system user | | NULL | Sleep | 430 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 15 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL | 0 | 0 | 1 |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
5 rows in set (0.00 sec)
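One low-risk way to check whether the listener is wedged, without tying up another interactive client, is to test whether the server still sends its protocol greeting after accepting a TCP connection. Below is a minimal sketch in standalone Python (probe_greeting and the stand-in silent server are my own illustration, not part of any MySQL tooling): a healthy MySQL server speaks first, while a connection stuck pre-auth, like Id 15 above, is accepted but never hears anything.

```python
import socket
import threading

def probe_greeting(host, port, timeout=5.0):
    """Connect and wait for the server's initial handshake packet.

    A healthy MySQL server sends its greeting right after accepting the
    TCP connection; a connection stuck pre-auth (the 'login' state in
    the processlist above) is accepted but never receives anything, so
    recv() times out.
    """
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.settimeout(timeout)
        try:
            return bool(s.recv(128))   # True: the server spoke first
        except socket.timeout:
            return False               # accepted but silent: stuck pre-auth

# Demo against a stand-in "stuck" server that accepts and stays silent:
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
held = []  # keep the accepted socket alive so GC does not close it
threading.Thread(target=lambda: held.append(srv.accept()), daemon=True).start()

print(probe_greeting("127.0.0.1", port, timeout=1.0))  # prints False
```

Pointed at the real server's port, this distinguishes "listener down" (connect fails) from the hang seen here (connect succeeds, greeting never arrives).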

NOTE: this does not block all new connections, by the way. Open another window, make another connection, and we're OK.
[root@lab1 ~]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 16
Server version: 5.5.20-beta23.4-log Percona XtraDB Cluster (GPL) 5.5.20-beta23.4 , Revision 3724, wsrep_23.4.r3724

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 3 | event_scheduler | localhost | NULL | Daemon | 1016 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Sleep | 206 | | NULL | 0 | 0 | 2 |
| 13 | system user | | NULL | Sleep | 636 | NULL | NULL | 0 | 0 | 1 |
| 14 | system user | | NULL | Sleep | 636 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 15 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL | 0 | 0 | 1 |
| 16 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
6 rows in set (0.00 sec)

The issue then becomes that I can no longer perform another dynamic change of the cluster address:

mysql> set global wsrep_cluster_address='1.2.3.4:1260';

(wsrep_debug is set to 1 in the cnf file)
120329 23:48:27 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120329 23:48:27 [Note] WSREP: Assign initial position for certification: 150026, protocol version: 2
120329 23:48:27 [Note] WSREP: Synchronized with group, ready for connections
120329 23:48:27 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120330 0:00:03 [Note] WSREP: Stop replication
120330 0:00:03 [Note] WSREP: Provider disconnect
120330 0:00:03 [Note] WSREP: Closing send monitor...
120330 0:00:03 [Note] WSREP: Closed send monitor.
120330 0:00:03 [Note] WSREP: gcomm: terminating thread
120330 0:00:03 [Note] WSREP: gcomm: joining thread
120330 0:00:03 [Note] WSREP: gcomm: closing backend
120330 0:00:03 [Note] WSREP: GMCast::handle_stable_view: view((empty))
120330 0:00:03 [Note] WSREP: Received self-leave message.
120330 0:00:03 [Note] WSREP: gcomm: closed
120330 0:00:03 [Note] WSREP: Flow-control interval: [0, 0]
120330 0:00:03 [Note] WSREP: Received SELF-LEAVE. Closing connection.
120330 0:00:03 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 150026)
120330 0:00:03 [Note] WSREP: RECV thread exiting 0: Success
120330 0:00:03 [Note] WSREP: New cluster view: global state: b2a15cb4-7786-11e1-0800-0a71e5c1d29d:150026, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 1
120330 0:00:03 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120330 0:00:03 [Note] WSREP: recv_thread() joined.
120330 0:00:03 [Note] WSREP: Closing slave action queue.
120330 0:00:03 [Note] WSREP: closing connection 15
120330 0:00:03 [Note] WSREP: closing connection 4
120330 0:00:03 [Note] WSREP: applier thread exiting (code:0)
120330 0:00:03 [Note] WSREP: closing applier 13
120330 0:00:05 [Note] WSREP: SST kill local trx: 15
120330 0:00:05 [Note] WSREP: SST kill local trx: 4
120330 0:00:05 [Note] WSREP: waiting for client connections to close: 5
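With wsrep_debug on, the most telling lines in a log like the one above are the WSREP state transitions (for example, SYNCED -> CLOSED). A small standalone Python filter can pull just those out of an error log; this is my own sketch rather than any Galera tool, and the sample lines are taken from the output above.

```python
import re

# Matches WSREP state-transition lines such as:
#   120330  0:00:03 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 150026)
SHIFT = re.compile(r"WSREP: Shifting (\w+) -> (\w+)")

def state_transitions(log_lines):
    """Yield (from_state, to_state) pairs found in an error-log stream."""
    for line in log_lines:
        m = SHIFT.search(line)
        if m:
            yield m.group(1), m.group(2)

sample = [
    "120330  0:00:03 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 150026)",
    "120330  0:00:03 [Note] WSREP: RECV thread exiting 0: Success",
]
print(list(state_transitions(sample)))  # prints [('SYNCED', 'CLOSED')]
```

Run over the full error log, this gives a compact timeline of where the node got stuck between disconnect and reconnect.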

My previously working connection is now dead:

mysql> show processlist;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...

Shutting down the mysql service hangs and eventually times out; the only option is kill -9.

Is there anything I can do to better troubleshoot the hung user connection?

Thanks
Patrick

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

This concerns only the very first connection. Subsequent connections go through.

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
importance: Undecided → High
milestone: none → 5.5.25-23.7
status: New → Confirmed
summary: - First connection after changeing wsrep_cluster_address hangs
+ First connection after changing wsrep_cluster_address hangs
Changed in codership-mysql:
milestone: 5.5.28-23.7 → 5.5.28-23.8
Changed in percona-xtradb-cluster:
status: New → Confirmed
Changed in codership-mysql:
milestone: 5.5.30-24.8 → 5.5.31-23.7.4
Changed in percona-xtradb-cluster:
milestone: none → 5.5.31-25
Changed in codership-mysql:
milestone: 5.5.31-23.7.5 → 5.5.32-23.7.6
Changed in percona-xtradb-cluster:
milestone: 5.5.33-23.7.6 → future-5.5
Changed in codership-mysql:
milestone: 5.5.33-23.7.6 → 5.5.34-24.9
Changed in codership-mysql:
milestone: 5.5.34-25.9 → 5.5.34-25.10
Changed in codership-mysql:
importance: High → Medium
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

Tried with the latest wsrep-5.5 and wsrep-5.6 builds and could not reproduce the issue anymore.

Revisions were as follows:
5.5.35: http://bazaar.launchpad.net/~codership/codership-mysql/wsrep-5.5/revision/3935
5.6.15: http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/4030

Changed in codership-mysql:
status: Confirmed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1231
