It seems that changing pc.ignore_quorum or pc.ignore_sb at runtime works only once, and the change does not show up in the wsrep_provider_options variable.
How to reproduce:
Two nodes, with Galera replication working fine.
Nothing in my.cnf relates to the two wsrep_provider_options settings above.
On node1:
percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
We can see that pc.ignore_sb = false.
percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.01 sec)
percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)
So here is the first problem: I changed the setting, no error or warning was returned, and yet when I check the variable the value is clearly still false.
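Scanning that long value by eye is error-prone. Below is a minimal sketch of a helper that splits the string into name/value pairs so a single option can be checked. `parse_provider_options` is a hypothetical name (not part of Galera or MySQL), and the sketch assumes option values never contain a ';'.

```python
def parse_provider_options(value):
    """Split a wsrep_provider_options string into a {name: value} dict."""
    options = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty fragments, e.g. after a trailing ';'
        # Split on the first '=' only; the value may be empty.
        name, _, setting = item.partition("=")
        options[name.strip()] = setting.strip()
    return options


# Shortened sample copied from the SHOW GLOBAL VARIABLES output above.
sample = (
    "base_host = 10.0.2.15; gmcast.mcast_addr = ; pc.checksum = true; "
    "pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S"
)
opts = parse_provider_options(sample)
print(opts["pc.ignore_sb"], opts["pc.ignore_quorum"])  # prints: false false
```

Running the same check before and after the SET GLOBAL statement makes it obvious whether the reported value moved.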
If my change was applied, the node should keep accepting queries when I break communication between the two nodes; if the setting is the one shown in the variable, queries should fail.
[root@percona2 ~]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.00 sec)
percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 17 |
+----------+
1 row in set (0.01 sec)
percona1 mysql> insert into percona values (0,'percona1','baron');
Query OK, 1 row affected (7.60 sec)
So the change was applied! And the value shown in wsrep_provider_options was wrong.
Now let's change it back (in the meantime I removed the firewall rule and restarted the second node):
percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)
percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.01 sec)
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)
Let's cut the connection again:
[root@percona2 mysql]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT
percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 18 |
+----------+
1 row in set (0.01 sec)
percona1 mysql> insert into percona values (0,'percona1','fred');
Query OK, 1 row affected (0.01 sec)
So this time the change was not taken into account!
Let's try again with the setting in my.cnf:
wsrep_provider_options = "pc.ignore_sb = true"
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)
percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)
We can see that pc.ignore_sb = true.
So if I block communication between the two peers now, the node should still answer queries:
percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.00 sec)
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.01 sec)
Now restore communication and rejoin the second node:
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)
Change the setting at runtime:
percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)
percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)
No change in wsrep_provider_options.
Stop communication between the nodes again:
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | e20f9da7-d509-11e1-0800-013f68429ec1 |
| wsrep_cluster_status | non-Primary |
| wsrep_connected | ON |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 2.1(r113) |
| wsrep_ready | OFF |
percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command
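The ERROR 1047 above goes together with the wsrep_ready = OFF and wsrep_cluster_status = non-Primary rows. As a rough sketch, the status rows can be read into a dict and checked programmatically. Both function names are hypothetical, and `accepts_queries` rests on the assumption that a node serves queries only while wsrep_ready is ON.

```python
def parse_status_rows(text):
    """Turn '| name | value |' rows from the mysql client into a dict."""
    status = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Keep only two-column wsrep_* rows; borders and headers are skipped.
        if len(cells) == 2 and cells[0].startswith("wsrep_"):
            status[cells[0]] = cells[1]
    return status


def accepts_queries(status):
    """Assumption: the node serves queries only when wsrep_ready is ON."""
    return status.get("wsrep_ready") == "ON"


# Rows copied from the status output above.
rows = """
| wsrep_cluster_size | 1 |
| wsrep_cluster_status | non-Primary |
| wsrep_connected | ON |
| wsrep_ready | OFF |
"""
status = parse_status_rows(rows)
print(status["wsrep_cluster_status"], accepts_queries(status))
```

Here the node is connected but not ready, which matches the rejected SELECT.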
Restore the connection between the nodes.
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)
percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.01 sec)
Change pc.ignore_sb a second time:
percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)
And stop communication.
percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command
So the second change didn't work.
Last test:
The cluster is started with pc.ignore_sb=true:
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)
percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.01 sec)
Let's change it... twice:
percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)
percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)
So it should be true. Let's stop communication:
percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.00 sec)
percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command
So it can only be changed once... the same applies to pc.ignore_quorum.
It is also very odd that the values in wsrep_provider_options don't reflect the settings the cluster is actually running with.
Fix committed in lp:galera/2.x r133.