First connection after changing wsrep_cluster_address hangs

Bug #1022250 reported by Alex Yurchenko on 2012-07-08
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Medium
Seppo Jaakola
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Fix Released
Undecided
Unassigned
5.6
Fix Released
Undecided
Unassigned

Bug Description

As reported by Patrick Zoblisein:

I've been doing some more testing of the `SET GLOBAL wsrep_cluster_address='<>;` functionality and I'm still able to hang my node - regardless of whether it's changing the cluster_address to another node in the cluster, or removing the node from the cluster via gcomm://;

Performing the dynamic change in my lab environment works - no issues as Seppo notes above. It's not until I make a connection to the node, "post-set-GLOBAL-wsrep_cluster_address" - here is a lot of output showing no connections (other than my working one) while I bounce back and forth between cluster_address':

[root@lab1 pxc]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.5.20-beta23.4-log Percona XtraDB Cluster (GPL) 5.5.20-beta23.4, Revision 3724, wsrep_23.4.r3724

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 1 | system user | | NULL | Sleep | 13 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 2 | system user | | NULL | Sleep | 13 | NULL | NULL | 0 | 0 | 1 |
| 3 | event_scheduler | localhost | NULL | Daemon | 13 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name | Value |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://1.2.3.4:1260 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 1 | system user | | NULL | Sleep | 38 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 2 | system user | | NULL | Sleep | 38 | NULL | NULL | 0 | 0 | 1 |
| 3 | event_scheduler | localhost | NULL | Daemon | 38 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 2 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.01 sec)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 3 | event_scheduler | localhost | NULL | Daemon | 67 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 2 |
| 5 | system user | | NULL | Sleep | 4 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 6 | system user | | NULL | Sleep | 4 | NULL | NULL | 0 | 0 | 1 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://1.2.3.4:1260';
Query OK, 0 rows affected (3.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.00 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------+
| Variable_name | Value |
+-----------------------+----------+
| wsrep_cluster_address | gcomm:// |
+-----------------------+----------+
1 row in set (0.01 sec)

mysql> set global wsrep_cluster_address='gcomm://1.2.3.5:1270';
Query OK, 0 rows affected (3.01 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name | Value |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://1.2.3.5:1270 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.00 sec)

Now, I try to make another connection - which just sits there:
[root@lab1 workarea]# mysql -u root -p
Enter password:

Flip over to my working connection and I see a new connection stuck in the "login" state:
mysql> show processlist;
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 3 | event_scheduler | localhost | NULL | Daemon | 810 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 2 |
| 13 | system user | | NULL | Sleep | 430 | NULL | NULL | 0 | 0 | 1 |
| 14 | system user | | NULL | Sleep | 430 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 15 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL | 0 | 0 | 1 |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
5 rows in set (0.00 sec)

NOTE - this does not block all new connections by the way - open another window and make another connection and we're OK.
[root@lab1 ~]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 16
Server version: 5.5.20-beta23.4-log Percona XtraDB Cluster (GPL) 5.5.20-beta23.4 , Revision 3724, wsrep_23.4.r3724

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 3 | event_scheduler | localhost | NULL | Daemon | 1016 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Sleep | 206 | | NULL | 0 | 0 | 2 |
| 13 | system user | | NULL | Sleep | 636 | NULL | NULL | 0 | 0 | 1 |
| 14 | system user | | NULL | Sleep | 636 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 15 | unauthenticated user | connecting host | NULL | Connect | NULL | login | NULL | 0 | 0 | 1 |
| 16 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
6 rows in set (0.00 sec)

The issue then becomes not being able to do another dynamic setting of the cluster_address:

mysql> set global wsrep_cluster_address='1.2.3.4:1260';

(wsrep_debug is equal to 1 in the cnf file)
120329 23:48:27 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120329 23:48:27 [Note] WSREP: Assign initial position for certification: 150026, protocol version: 2
120329 23:48:27 [Note] WSREP: Synchronized with group, ready for connections
120329 23:48:27 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120330 0:00:03 [Note] WSREP: Stop replication
120330 0:00:03 [Note] WSREP: Provider disconnect
120330 0:00:03 [Note] WSREP: Closing send monitor...
120330 0:00:03 [Note] WSREP: Closed send monitor.
120330 0:00:03 [Note] WSREP: gcomm: terminating thread
120330 0:00:03 [Note] WSREP: gcomm: joining thread
120330 0:00:03 [Note] WSREP: gcomm: closing backend
120330 0:00:03 [Note] WSREP: GMCast::handle_stable_view: view((empty))
120330 0:00:03 [Note] WSREP: Received self-leave message.
120330 0:00:03 [Note] WSREP: gcomm: closed
120330 0:00:03 [Note] WSREP: Flow-control interval: [0, 0]
120330 0:00:03 [Note] WSREP: Received SELF-LEAVE. Closing connection.
120330 0:00:03 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 150026)
120330 0:00:03 [Note] WSREP: RECV thread exiting 0: Success
120330 0:00:03 [Note] WSREP: New cluster view: global state: b2a15cb4-7786-11e1-0800-0a71e5c1d29d:150026, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 1
120330 0:00:03 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120330 0:00:03 [Note] WSREP: recv_thread() joined.
120330 0:00:03 [Note] WSREP: Closing slave action queue.
120330 0:00:03 [Note] WSREP: closing connection 15
120330 0:00:03 [Note] WSREP: closing connection 4
120330 0:00:03 [Note] WSREP: applier thread exiting (code:0)
120330 0:00:03 [Note] WSREP: closing applier 13
120330 0:00:05 [Note] WSREP: SST kill local trx: 15
120330 0:00:05 [Note] WSREP: SST kill local trx: 4
120330 0:00:05 [Note] WSREP: waiting for client connections to close: 5

My once working connection is no longer:

mysql> show processlist;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...

Performing a shutdown of the mysql service hangs and eventually times out - only option is to kill -9.

Is there anything I can do to better troubleshoot the hung user connection?

Thanks
Patrick

Alex Yurchenko (ayurchen) wrote :

This concerns only the very first connection. Subsequent connections go through.

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
importance: Undecided → High
milestone: none → 5.5.25-23.7
status: New → Confirmed
summary: - First connection after changeing wsrep_cluster_address hangs
+ First connection after changing wsrep_cluster_address hangs
Changed in codership-mysql:
milestone: 5.5.28-23.7 → 5.5.28-23.8
Changed in percona-xtradb-cluster:
status: New → Confirmed
Changed in codership-mysql:
milestone: 5.5.30-24.8 → 5.5.31-23.7.4
Changed in percona-xtradb-cluster:
milestone: none → 5.5.31-25
Changed in codership-mysql:
milestone: 5.5.31-23.7.5 → 5.5.32-23.7.6
Changed in percona-xtradb-cluster:
milestone: 5.5.33-23.7.6 → future-5.5
Changed in codership-mysql:
milestone: 5.5.33-23.7.6 → 5.5.34-24.9
Changed in codership-mysql:
milestone: 5.5.34-25.9 → 5.5.34-25.10
Changed in codership-mysql:
importance: High → Medium
Seppo Jaakola (seppo-jaakola) wrote :

Tried with latest wsrep-5.5 and wsrep-5.6 builds and could not reproduce the issue anymore

Revisions were as follows:
5.5.35: http://bazaar.launchpad.net/~codership/codership-mysql/wsrep-5.5/revision/3935
5.6.15: http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/4030

Changed in codership-mysql:
status: Confirmed → Fix Released

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1231

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers