Percona XtraDB Cluster - HA scalable solution for MySQL

Setting pc.ignore_quorum or pc.ignore_sb at runtime doesn't work properly

Reported by Frederic Descamps on 2012-07-25
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Galera
Undecided
Teemu Ollakka
Percona XtraDB Cluster
Undecided
Unassigned

Bug Description

It seems that changing pc.ignore_quorum or pc.ignore_sb at runtime works only once and it doesn't show up the change in the variable wsrep_provider_options.

How to reproduce:

2 nodes, galera replication working fine

in my.cnf nothing related to the two wsrep_provider_options above.

on node1:

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3

We can see that pc.ignore_sb = false.

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.01 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)

So here we have the first problem, I changed the setting, no error or warning returned and when I check the value from the variables it's cleary set to false.

So if my setting was changed, if I break the communication between the two nodes, the nodes should still accept queries, if the setting is the one showed in the variables it should fail.

[root@percona2 ~]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.00 sec)

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 17 |
+----------+
1 row in set (0.01 sec)

percona1 mysql> insert into percona values (0,'percona1','baron');
Query OK, 1 row affected (7.60 sec)

So the change was done ! And the wrong value was in wsrep_provider_options

Now let's change it again (meanwhile I removed the firewall rule and restarted the second node):

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.01 sec)

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)

Let's stop again the connection:

[root@percona2 mysql]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 18 |
+----------+
1 row in set (0.01 sec)

percona1 mysql> insert into percona values (0,'percona1','fred');
Query OK, 1 row affected (0.01 sec)

So this time the change wasn't taken in consideration !

Let's try again with the change in my.cnf:

wsrep_provider_options = "pc.ignore_sb = true"

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

We can see that pc.ignore_sb = true;

So if I block the communication between the two peers now it should still answer queries:

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.00 sec)

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.01 sec)

Now restore the communication and add again the second node:

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)

Change the setting during run time:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

No change in wsrep_provider_options.

And stop communication again between nodes:

| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | e20f9da7-d509-11e1-0800-013f68429ec1 |
| wsrep_cluster_status | non-Primary |
| wsrep_connected | ON |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 2.1(r113) |
| wsrep_ready | OFF |

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

Restore the connection between the nodes.

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.01 sec)

Change for the second time pc.ignore_sb:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)

and stop communication.

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

So the second change didn't work.

Last test:

cluster is started with pc.ignore_sb=true:

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.01 sec)

Let's change it... twice:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)

So it should be true, let's stop communication:

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.00 sec)

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

So it's allowed to be changed only once... this is the same for pc.ignore_quorum.

But it's very weird that the values in wsrep_provider_options doesn't reflect the running settings of the cluster.

Alex Yurchenko (ayurchen) wrote :

fix committed in lp:galera/2.x r133

Changed in galera:
assignee: nobody → Teemu Ollakka (teemu-ollakka)
milestone: none → 23.2.2
status: New → Fix Committed
Changed in galera:
status: Fix Committed → Fix Released
Changed in percona-xtradb-cluster:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers