Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

removing node from a cluster via SET GLOBAL wsrep_cluster_address - hangs node being removed

Bug #931713 reported by Patrick Zoblisein on 2012-02-13

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	MySQL patches by Codership	Fix Released	Low	Seppo Jaakola	MySQL patches by Codership 5.5.20-23.4
	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Fix Released	Undecided	Unassigned

Bug Description

Hi,

Trying to remove a single node from a 3-node cluster via:

SET GLOBAL wsrep_cluster_address='gcomm://'; (have also tested, with the same result - SET GLOBAL wsrep_cluster_address='';)

using Server version: 5.5.17-22.1-log Percona XtraDB Cluster (GPL), Release 22.1, Revision 3683 wsrep_22.3.r3683

Command hangs the server - command never returns - and the server is no longer accessible - I have to use a kill -9 to get the instance usable again.

From the error log from the node being removed:
120213 22:15:09 [Note] WSREP: Stop replication
120213 22:15:09 [Note] WSREP: Closing send monitor...
120213 22:15:09 [Note] WSREP: Closed send monitor.
120213 22:15:09 [Note] WSREP: gcomm: terminating thread
120213 22:15:09 [Note] WSREP: gcomm: joining thread
120213 22:15:09 [Note] WSREP: gcomm: closing backend
120213 22:15:09 [Note] WSREP: evs::proto(e0c81b4c-567e-11e1-0800-85ed415644d4, LEAVING, view_id(REG,5d6a7cdc-565a-11e1-0800-7dba6589ad90,92)) uuid 5d6a7cdc-565a-11e1-0800-7dba6589ad90 missing from install message, assuming partitioned
120213 22:15:09 [Note] WSREP: evs::proto(e0c81b4c-567e-11e1-0800-85ed415644d4, LEAVING, view_id(REG,5d6a7cdc-565a-11e1-0800-7dba6589ad90,92)) uuid 6cd1e65e-567f-11e1-0800-377d6c6676ab missing from install message, assuming partitioned
120213 22:15:09 [Note] WSREP: GMCast::handle_stable_view: view(view_id(NON_PRIM,5d6a7cdc-565a-11e1-0800-7dba6589ad90,92) memb {
        e0c81b4c-567e-11e1-0800-85ed415644d4,
} joined {
} left {
} partitioned {
        5d6a7cdc-565a-11e1-0800-7dba6589ad90,
        6cd1e65e-567f-11e1-0800-377d6c6676ab,
})
120213 22:15:09 [Note] WSREP: New COMPONENT: primary = no, my_idx = 0, memb_num = 1
120213 22:15:09 [Note] WSREP: GMCast::handle_stable_view: view((empty))
120213 22:15:09 [Note] WSREP: gcomm: closed
120213 22:15:09 [Note] WSREP: Flow-control interval: [230, 256]
120213 22:15:09 [Note] WSREP: Received NON-PRIMARY.
120213 22:15:09 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 4046094)
120213 22:15:09 [Note] WSREP: Received self-leave message.
120213 22:15:09 [Note] WSREP: New cluster view: global state: 0ec148da-46c0-11e1-0800-e762c70be5e7:4046094, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 1
120213 22:15:09 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120213 22:15:09 [Note] WSREP: Flow-control interval: [0, 0]
120213 22:15:09 [Note] WSREP: Received SELF-LEAVE. Closing connection.
120213 22:15:09 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 4046094)
120213 22:15:09 [Note] WSREP: RECV thread exiting 0: Success
120213 22:15:09 [Note] WSREP: recv_thread() joined.
120213 22:15:09 [Note] WSREP: Closing slave action queue.
120213 22:15:09 [Note] WSREP: New cluster view: global state: 0ec148da-46c0-11e1-0800-e762c70be5e7:4046094, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 1
120213 22:15:09 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120213 22:15:09 [Note] WSREP: applier thread exiting (code:0)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:09 [Note] WSREP: applier thread exiting (code:5)
120213 22:15:11 [Note] WSREP: rollbacker thread exiting

Contrast this with a valid gcomm address (which works as expected although it's not removing the node from the cluster):

mysql> set global wsrep_cluster_address='gcomm://12.23.34.45:1260';
Query OK, 0 rows affected (3.51 sec)

Thanks
Patrick

Revision history for this message

Seppo Jaakola (seppo-jaakola) wrote on 2012-02-14:

Trying with lp:codership-mysql (wsrep-5.5) I cannot reproduce this issue. Tried with:

SET GLOBAL wsrep_cluster_address='gcomm://';

and then joining node back with:

SET GLOBAL wsrep_cluster_address='gcomm://127.0.0.1:4567';

and it works both ways with current wsrep-5.5 trunk version.

However, command:
SET GLOBAL wsrep_cluster_address='';

...is rejected

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2012-02-14:

Ok, that means that behaviour is different from 5.1 where

SET GLOBAL wsrep_cluster_address='';

passes and disconnects from cluster.

Seppo Jaakola (seppo-jaakola) on 2012-02-14

Changed in codership-mysql:
status:	New → In Progress
assignee:	nobody → Seppo Jaakola (seppo-jaakola)
milestone:	none → 5.5.20-23.4
importance:	Undecided → Low

Revision history for this message

Seppo Jaakola (seppo-jaakola) wrote on 2012-02-15:

A fix pushed in codership-mysql wsrep-5.5 branch to accept empty wsrep_cluster_address: http://bazaar.launchpad.net/~codership/codership-mysql/wsrep-5.5/revision/3708

Changed in codership-mysql:
status:	In Progress → Fix Committed

Alex Yurchenko (ayurchen) on 2012-02-18

Changed in codership-mysql:
status:	Fix Committed → Fix Released

Revision history for this message

Patrick Zoblisein (patrickzoblisein) wrote on 2012-03-29:

Download full text (12.0 KiB)

Hi - I would like to provide some more information related to the behavior originally described here.

I've been doing some more testing of the `SET GLOBAL wsrep_cluster_address='<>;` functionality and I'm still able to hang my node - regardless of whether it's changing the cluster_address to another node in the cluster, or removing the node from the cluster via gcomm://;

Performing the dynamic change in my lab environment works - no issues as Seppo notes above. It's not until I make a connection to the node, "post-set-GLOBAL-wsrep_cluster_address" - here is a lot of output showing no connections (other than my working one) while I bounce back and forth between cluster_address':

[root@lab1 pxc]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.5.20-beta23.4-log Percona XtraDB Cluster (GPL) 5.5.20-beta23.4, Revision 3724, wsrep_23.4.r3724

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| 1 | system user | | NULL | Sleep | 13 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 2 | system user | | NULL | Sleep | 13 | NULL | NULL | 0 | 0 | 1 |
| 3 | event_scheduler | localhost | NULL | Daemon | 13 | Waiting on empty queue | NULL | 0 | 0 | 1 |
| 4 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name | Value |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://1.2.3.4:1260 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

Hi - I would like to provide some more information related to the behavior originally described here.

Performing the dynamic change in my lab environment works - no issues as Seppo notes above.  It's not until I make a connection to the node, "post-set-GLOBAL-wsrep_cluster_address" - here is a lot of output showing no connections (other than my working one) while I bounce back and forth between cluster_address':

[root@lab1 pxc]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.5.20-beta23.4-log Percona XtraDB Cluster (GPL) 5.5.20-beta23.4, Revision 3724, wsrep_23.4.r3724

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User            | Host      | db   | Command | Time | State                  | Info             | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
|  1 | system user     |           | NULL | Sleep   |   13 | wsrep aborter idle     | NULL             |         0 |             0 |         1 |
|  2 | system user     |           | NULL | Sleep   |   13 | NULL                   | NULL             |         0 |             0 |         1 |
|  3 | event_scheduler | localhost | NULL | Daemon  |   13 | Waiting on empty queue | NULL             |         0 |             0 |         1 |
|  4 | root            | localhost | NULL | Query   |    0 | sleeping               | show processlist |         0 |             0 |         1 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name         | Value                      |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://1.2.3.4:1260 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User            | Host      | db   | Command | Time | State                  | Info             | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
|  1 | system user     |           | NULL | Sleep   |   38 | wsrep aborter idle     | NULL             |         0 |             0 |         1 |
|  2 | system user     |           | NULL | Sleep   |   38 | NULL                   | NULL             |         0 |             0 |         1 |
|  3 | event_scheduler | localhost | NULL | Daemon  |   38 | Waiting on empty queue | NULL             |         0 |             0 |         1 |
|  4 | root            | localhost | NULL | Query   |    0 | sleeping               | show processlist |         0 |             0 |         2 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.01 sec)

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User            | Host      | db   | Command | Time | State                  | Info             | Rows_sent | Rows_examined | Rows_read |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
|  3 | event_scheduler | localhost | NULL | Daemon  |   67 | Waiting on empty queue | NULL             |         0 |             0 |         1 |
|  4 | root            | localhost | NULL | Query   |    0 | sleeping               | show processlist |         0 |             0 |         2 |
|  5 | system user     |           | NULL | Sleep   |    4 | wsrep aborter idle     | NULL             |         0 |             0 |         1 |
|  6 | system user     |           | NULL | Sleep   |    4 | NULL                   | NULL             |         0 |             0 |         1 |
+----+-----------------+-----------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://1.2.3.4:1260';
Query OK, 0 rows affected (3.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.00 sec)

mysql> set global wsrep_cluster_address='gcomm://1.2.3.5:1270';
Query OK, 0 rows affected (3.01 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name         | Value                      |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://1.2.3.5:1270 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.00 sec)

Now, I try to make another connection - which just sits there:
[root@lab1 workarea]# mysql -u root -p
Enter password:

Flip over to my working connection and I see a new connection stuck in the "login" state:
mysql> show processlist;
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User                 | Host            | db   | Command | Time | State                  | Info             | Rows_sent | Rows_examined | Rows_read |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
|  3 | event_scheduler      | localhost       | NULL | Daemon  |  810 | Waiting on empty queue | NULL             |         0 |             0 |         1 |
|  4 | root                 | localhost       | NULL | Query   |    0 | sleeping               | show processlist |         0 |             0 |         2 |
| 13 | system user          |                 | NULL | Sleep   |  430 | NULL                   | NULL             |         0 |             0 |         1 |
| 14 | system user          |                 | NULL | Sleep   |  430 | wsrep aborter idle     | NULL             |         0 |             0 |         1 |
| 15 | unauthenticated user | connecting host | NULL | Connect | NULL | login                  | NULL             |         0 |             0 |         1 |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
5 rows in set (0.00 sec)

NOTE - this does not block all new connections by the way - open another window and make another connection and we're OK.
[root@lab1 ~]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 16
Server version: 5.5.20-beta23.4-log Percona XtraDB Cluster (GPL) 5.5.20-beta23.4                                                             , Revision 3724, wsrep_23.4.r3724

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
| Id | User                 | Host            | db   | Command | Time | State                  | Info             | Rows_sent | Rows_examined | Rows_read |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
|  3 | event_scheduler      | localhost       | NULL | Daemon  | 1016 | Waiting on empty queue | NULL             |         0 |             0 |         1 |
|  4 | root                 | localhost       | NULL | Sleep   |  206 |                        | NULL             |         0 |             0 |         2 |
| 13 | system user          |                 | NULL | Sleep   |  636 | NULL                   | NULL             |         0 |             0 |         1 |
| 14 | system user          |                 | NULL | Sleep   |  636 | wsrep aborter idle     | NULL             |         0 |             0 |         1 |
| 15 | unauthenticated user | connecting host | NULL | Connect | NULL | login                  | NULL             |         0 |             0 |         1 |
| 16 | root                 | localhost       | NULL | Query   |    0 | sleeping               | show processlist |         0 |             0 |         1 |
+----+----------------------+-----------------+------+---------+------+------------------------+------------------+-----------+---------------+-----------+
6 rows in set (0.00 sec)

The issue then becomes not being able to do another dynamic setting of the cluster_address:

mysql> set global wsrep_cluster_address='1.2.3.4:1260';

(wsrep_debug is equal to 1 in the cnf file)
120329 23:48:27 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120329 23:48:27 [Note] WSREP: Assign initial position for certification: 150026, protocol version: 2
120329 23:48:27 [Note] WSREP: Synchronized with group, ready for connections
120329 23:48:27 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120330  0:00:03 [Note] WSREP: Stop replication
120330  0:00:03 [Note] WSREP: Provider disconnect
120330  0:00:03 [Note] WSREP: Closing send monitor...
120330  0:00:03 [Note] WSREP: Closed send monitor.
120330  0:00:03 [Note] WSREP: gcomm: terminating thread
120330  0:00:03 [Note] WSREP: gcomm: joining thread
120330  0:00:03 [Note] WSREP: gcomm: closing backend
120330  0:00:03 [Note] WSREP: GMCast::handle_stable_view: view((empty))
120330  0:00:03 [Note] WSREP: Received self-leave message.
120330  0:00:03 [Note] WSREP: gcomm: closed
120330  0:00:03 [Note] WSREP: Flow-control interval: [0, 0]
120330  0:00:03 [Note] WSREP: Received SELF-LEAVE. Closing connection.
120330  0:00:03 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 150026)
120330  0:00:03 [Note] WSREP: RECV thread exiting 0: Success
120330  0:00:03 [Note] WSREP: New cluster view: global state: b2a15cb4-7786-11e1-0800-0a71e5c1d29d:150026, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 1
120330  0:00:03 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120330  0:00:03 [Note] WSREP: recv_thread() joined.
120330  0:00:03 [Note] WSREP: Closing slave action queue.
120330  0:00:03 [Note] WSREP: closing connection 15
120330  0:00:03 [Note] WSREP: closing connection 4
120330  0:00:03 [Note] WSREP: applier thread exiting (code:0)
120330  0:00:03 [Note] WSREP: closing applier 13
120330  0:00:05 [Note] WSREP: SST kill local trx: 15
120330  0:00:05 [Note] WSREP: SST kill local trx: 4
120330  0:00:05 [Note] WSREP: waiting for client connections to close: 5

My once working connection is no longer:

mysql> show processlist;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...

Performing a shutdown of the mysql service hangs and eventually times out - only option is to kill -9.

Is there anything I can do to better troubleshoot the hung user connection?

Thanks
Patrick

Revision history for this message

Henrik Ingo (hingo) wrote on 2012-03-30:

Patrick: I once saw a connection hung in login state like you, although it was not related to changing cluster address (that I know of). This was in the same lab that you are doing, but a pre-0.8 galera OEM vendor. Solution then was to use --skip-name-resolve.

Looking at this bug now, there's probably something deeper going on than just skip-name-resolve, looks buggy. Anyway, just wanted to share.

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2012-03-30:

Hi Patrick,

I'm not sure what you're trying to achieve there changing wsrep_cluster_address randomly without regards to node state. Changing wsrep_cluster_address is intended for rational use, it is a human driven event, not a random disaster, and from your report it looks like you were testing cluster resilience to human intervention. :)

Now, wsrep_cluster_address value in itself is of little interest to us, it is a just string. Far more interesting is what this changing it causes in Galera. Wsrep status variables and error log would tell a lot about what is _really_ going on. You should always check cluster and node state before giving it administration command.

I'm not denying that there could be a bug, however 2.1 will see a lot of changes in this department, so I'd suggest to return to this after 2.1 release.

Revision history for this message

Patrick Zoblisein (patrickzoblisein) wrote on 2012-03-30:

Hi Alex,

Randomly changing the address to show that there could be an issue with running the command. I had noticed it once earlier back in February, reported it as a bug, and was told it wasn't reproducible.

But, on the occasion that I used it to remove a node from a cluster during my testing, it was still hosing my node occasionally.

The random changes in the latest test case was simply to show that, yes, the command works but only as long as you don't get a hung connection after doing it.

If the command isn't be used wily-nily without a thought to cluster status - then maybe it should not be allowed to be dynamic? Only allowable via the cnf file and a node restart?

I will revisit the behavior once 2.1 is made available.

Thanks
Patrick

Revision history for this message

Patrick Zoblisein (patrickzoblisein) wrote on 2012-07-06:

Download full text (16.0 KiB)

Hi - have found some time to revisit this using:

mysql> select @@version;
+-----------------+
| @@version |
+-----------------+
| 5.5.24-23.6-log |
+-----------------+

Dynamically changing the gcomm value via: set global wsrep_cluster_address='gcomm://'; no longer hangs and completes as expected.

Only issue remaining (and I don't know if you want a new bug report or if this comment will suffice), is that when dynamically changing the wsrep_cluster_address, the very next connection to the node is hung as an "unauthenticated user". Only way to clear this is to shutdown the node via 'kill -9'.

Node is also using --skip-name-resolve as suggested by Henrik above.

[zoblisei@db001 ~]$ sudo mysql -u dbadm -p -h127.0.0.1 --port=3307
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 373809
Server version: 5.5.24-23.6-log Percona XtraDB Cluster (GPL) 5.5.24-23.6, Revision 342, wsrep_23.6.r342

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show processlist;
+--------+-------------+-----------------+------+---------+--------+--------------------+------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+--------+-------------+-----------------+------+---------+--------+--------------------+------------------+-----------+---------------+-----------+
| 1 | system user | | NULL | Sleep | 926519 | wsrep aborter idle | NULL | 0 | 0 | 1 |
| 2 | system user | | NULL | Sleep | 230573 | committed 288 | NULL | 0 | 0 | 1 |
| 373809 | dbadm | 127.0.0.1:44454 | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 1 |
+--------+-------------+-----------------+------+---------+--------+--------------------+------------------+-----------+---------------+-----------+
3 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.01 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+-----...

Hi - have found some time to revisit this using:

mysql> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 5.5.24-23.6-log |
+-----------------+

Dynamically changing the gcomm value via:  set global wsrep_cluster_address='gcomm://'; no longer hangs and completes as expected.

Only issue remaining (and I don't know if you want a new bug report or if this comment will suffice), is that when dynamically changing the wsrep_cluster_address, the very next connection to the node is hung as an "unauthenticated user".  Only way to clear this is to shutdown the node via 'kill -9'.

Node is also using --skip-name-resolve as suggested by Henrik above.

[zoblisei@db001 ~]$ sudo mysql -u dbadm -p -h127.0.0.1 --port=3307
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 373809
Server version: 5.5.24-23.6-log Percona XtraDB Cluster (GPL) 5.5.24-23.6, Revision 342, wsrep_23.6.r342

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show processlist;
+--------+-------------+-----------------+------+---------+--------+--------------------+------------------+-----------+---------------+-----------+
| Id     | User        | Host            | db   | Command | Time   | State              | Info             | Rows_sent | Rows_examined | Rows_read |
+--------+-------------+-----------------+------+---------+--------+--------------------+------------------+-----------+---------------+-----------+
|      1 | system user |                 | NULL | Sleep   | 926519 | wsrep aborter idle | NULL             |         0 |             0 |         1 |
|      2 | system user |                 | NULL | Sleep   | 230573 | committed 288      | NULL             |         0 |             0 |         1 |
| 373809 | dbadm       | 127.0.0.1:44454 | NULL | Query   |      0 | sleeping           | show processlist |         0 |             0 |         1 |
+--------+-------------+-----------------+------+---------+--------+--------------------+------------------+-----------+---------------+-----------+
3 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.01 sec)

mysql> show processlist;
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
| Id     | User                 | Host            | db   | Command | Time | State              | Info             | Rows_sent | Rows_examined | Rows_read |
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
| 373809 | dbadm                | 127.0.0.1:44454 | NULL | Query   |    0 | sleeping           | show processlist |         0 |             0 |         2 |
| 373832 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL             |         0 |             0 |         1 |
| 373833 | system user          |                 | NULL | Sleep   |   18 | NULL               | NULL             |         0 |             0 |         1 |
| 373834 | system user          |                 | NULL | Sleep   |   18 | wsrep aborter idle | NULL             |         0 |             0 |         1 |
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
4 rows in set (0.00 sec)

mysql> set global wsrep_cluster_address='gcomm://db003:4307';
Query OK, 0 rows affected (4.01 sec)

mysql> show processlist;
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
| Id     | User                 | Host            | db   | Command | Time | State              | Info             | Rows_sent | Rows_examined | Rows_read |
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
| 373809 | dbadm                | 127.0.0.1:44454 | NULL | Query   |    0 | sleeping           | show processlist |         0 |             0 |         2 |
| 373832 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL             |         0 |             0 |         1 |
| 373857 | system user          |                 | NULL | Sleep   |   49 | wsrep aborter idle | NULL             |         0 |             0 |         1 |
| 373858 | system user          |                 | NULL | Sleep   |   49 | NULL               | NULL             |         0 |             0 |         1 |
| 373856 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL             |         0 |             0 |         1 |
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
5 rows in set (0.00 sec)

mysql> show processlist;
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
| Id     | User                 | Host            | db   | Command | Time | State              | Info             | Rows_sent | Rows_examined | Rows_read |
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
| 373809 | dbadm                | 127.0.0.1:44454 | NULL | Query   |    0 | sleeping           | show processlist |         0 |             0 |         2 |
| 373832 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL             |         0 |             0 |         1 |
| 373857 | system user          |                 | NULL | Sleep   |   79 | wsrep aborter idle | NULL             |         0 |             0 |         1 |
| 373858 | system user          |                 | NULL | Sleep   |   79 | NULL               | NULL             |         0 |             0 |         1 |
| 373856 | unauthenticated user | connecting host | NULL | Connect | NULL | login              | NULL             |         0 |             0 |         1 |
+--------+----------------------+-----------------+------+---------+------+--------------------+------------------+-----------+---------------+-----------+
5 rows in set (0.00 sec)

mysql> show variables like 'wsrep_cluster_address';
+-----------------------+----------------------------------------------------+
| Variable_name         | Value                                              |
+-----------------------+----------------------------------------------------+
| wsrep_cluster_address | gcomm://db003:4307 |
+-----------------------+----------------------------------------------------+
1 row in set (0.01 sec)

mysql> set global wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (2.31 sec)

mysql> set global wsrep_cluster_address='gcomm://db003:4307';
Query OK, 0 rows affected (2.51 sec)

[root@db001 scripts]# sudo -u mysql /db/mysql/3307/scripts/mysql.server restart
Shutting down MySQL (Percona Server).....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................[FAILED].......................
Failed to stop running server, so refusing to try to start.[FAILED]

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2012-07-08:

Confirmed. Submitted lp:1022250.

Vadim Tkachenko (vadim-tk) on 2012-12-11

Changed in percona-xtradb-cluster:
status:	New → Fix Released

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

#10

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1208

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.