Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Setting pc.ignore_quorum or pc.ignore_sb at runtime doesn't work properly

Bug #1028813 reported by Frederic Descamps on 2012-07-25

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Galera	Fix Released	Undecided	Teemu Ollakka	Galera 23.2.2
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Fix Released	Undecided	Unassigned

Bug Description

It seems that changing pc.ignore_quorum or pc.ignore_sb at runtime works only once, and the change does not show up in the variable wsrep_provider_options.

How to reproduce:

2 nodes, galera replication working fine

In my.cnf, nothing is set for either of the two provider options above.

on node1:

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3

We can see that pc.ignore_sb = false.
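
Picking one option out of that long string by eye is error-prone. A minimal sketch of a filter that prints a single key is shown here; the function name provider_option is hypothetical, and the sample string is a shortened copy of the output above. Given the bug described in this report, it shows what the variable reports, which is not necessarily the running setting.

```shell
# Hypothetical helper: print the value of one key from a
# wsrep_provider_options string of the form "key = value; key = value".
provider_option() {
    # $1 = option name, $2 = full option string
    printf '%s' "$2" | awk -v key="$1" '
        BEGIN { RS = ";" }                      # one record per "key = value" pair
        {
            eq = index($0, "=")                 # split at the first "="
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t\n]+|[ \t\n]+$/, "", k)  # trim surrounding whitespace
            gsub(/^[ \t\n]+|[ \t\n]+$/, "", v)
            if (k == key) print v
        }'
}

# Shortened sample taken from the output above.
opts='pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S'
provider_option pc.ignore_sb "$opts"    # prints: false
```

On a live node the string could be fetched with something like mysql -N -B -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'" | cut -f2, assuming client credentials are already configured.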

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.01 sec)

Here is the first problem: I changed the setting and no error or warning was returned, yet when I check the variable it is clearly still set to false.

If my setting really was applied, the nodes should keep accepting queries after I break communication between them. If the value shown in the variable is the real one, queries should fail.

[root@percona2 ~]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 17 |
+----------+
1 row in set (0.01 sec)

percona1 mysql> insert into percona values (0,'percona1','baron');
Query OK, 1 row affected (7.60 sec)

So the change was applied, and wsrep_provider_options was showing the wrong value.

Now let's change it again (meanwhile I removed the firewall rule and restarted the second node):

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

Let's block the connection again:

[root@percona2 mysql]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 18 |
+----------+
1 row in set (0.01 sec)

percona1 mysql> insert into percona values (0,'percona1','fred');
Query OK, 1 row affected (0.01 sec)

So this time the change was not taken into account!

Let's try again with the change in my.cnf:

wsrep_provider_options = "pc.ignore_sb = true"

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

We can see that pc.ignore_sb = true.

So if I block the communication between the two peers now it should still answer queries:

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.00 sec)

Now restore the communication and rejoin the second node.

Change the setting at runtime:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

No change in wsrep_provider_options.

Then block communication between the nodes again:

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

Restore the connection between the nodes.
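
The report does not show how connectivity was restored. Assuming the partition was created with the REJECT rule shown earlier, deleting that rule with the same match arguments (iptables -D instead of -A) would be one way. The sketch below only prints the command rather than running it; the function name partition_rule is hypothetical, and the addresses are copied from the rule used earlier.

```shell
# Addresses copied from the iptables rule used earlier in this report.
SRC=192.168.70.2
DST=192.168.70.3

# Print the iptables command for the given action:
#   -A adds the REJECT rule (partition), -D deletes it (restore).
# Nothing is executed here; run the printed command as root on percona2.
partition_rule() {
    printf 'iptables %s INPUT -d %s -s %s -j REJECT\n' "$1" "$DST" "$SRC"
}

partition_rule -D    # command that would restore communication
```

Deleting by rule specification only removes the first matching rule, so if the rule was appended more than once the delete would need to be repeated.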

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.01 sec)

Change pc.ignore_sb a second time:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)

Then block communication:

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

So the second change didn't work.

Last test:

The cluster is started with pc.ignore_sb=true:

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.01 sec)

Let's change it... twice:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)

So it should be true. Let's block communication:

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

So it can be changed only once. The same is true for pc.ignore_quorum.

It is also very odd that the values in wsrep_provider_options don't reflect the running settings of the cluster.


Alex Yurchenko (ayurchen) wrote on 2012-09-29:

fix committed in lp:galera/2.x r133

Changed in galera:
assignee:	nobody → Teemu Ollakka (teemu-ollakka)
milestone:	none → 23.2.2
status:	New → Fix Committed

Alex Yurchenko (ayurchen) on 2012-10-26

Changed in galera:
status:	Fix Committed → Fix Released

Vadim Tkachenko (vadim-tk) on 2012-12-11

Changed in percona-xtradb-cluster:
status:	New → Fix Released


Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-17:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1234
