Setting pc.ignore_quorum or pc.ignore_sb at runtime doesn't work properly

Bug #1028813 reported by Frederic Descamps
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Galera
Fix Released
Undecided
Teemu Ollakka
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Fix Released
Undecided
Unassigned

Bug Description

It seems that changing pc.ignore_quorum or pc.ignore_sb at runtime works only once and it doesn't show up the change in the variable wsrep_provider_options.

How to reproduce:

2 nodes, galera replication working fine

in my.cnf nothing related to the two wsrep_provider_options above.

on node1:

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3

We can see that pc.ignore_sb = false.

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.01 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)

So here we have the first problem, I changed the setting, no error or warning returned and when I check the value from the variables it's cleary set to false.

So if my setting was changed, if I break the communication between the two nodes, the nodes should still accept queries, if the setting is the one showed in the variables it should fail.

[root@percona2 ~]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.00 sec)

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 17 |
+----------+
1 row in set (0.01 sec)

percona1 mysql> insert into percona values (0,'percona1','baron');
Query OK, 1 row affected (7.60 sec)

So the change was done ! And the wrong value was in wsrep_provider_options

Now let's change it again (meanwhile I removed the firewall rule and restarted the second node):

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.0.2.15; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.01 sec)

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)

Let's stop again the connection:

[root@percona2 mysql]# iptables -A INPUT -d 192.168.70.3 -s 192.168.70.2 -j REJECT

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 18 |
+----------+
1 row in set (0.01 sec)

percona1 mysql> insert into percona values (0,'percona1','fred');
Query OK, 1 row affected (0.01 sec)

So this time the change wasn't taken in consideration !

Let's try again with the change in my.cnf:

wsrep_provider_options = "pc.ignore_sb = true"

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

We can see that pc.ignore_sb = true;

So if I block the communication between the two peers now it should still answer queries:

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.00 sec)

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.01 sec)

Now restore the communication and add again the second node:

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)

Change the setting during run time:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.00 sec)

No change in wsrep_provider_options.

And stop communication again between nodes:

| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | e20f9da7-d509-11e1-0800-013f68429ec1 |
| wsrep_cluster_status | non-Primary |
| wsrep_connected | ON |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 2.1(r113) |
| wsrep_ready | OFF |

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

Restore the connection between the nodes.

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.00 sec)

percona1 mysql> select count(*) from percona;
+----------+
| count(*) |
+----------+
| 19 |
+----------+
1 row in set (0.01 sec)

Change for the second time pc.ignore_sb:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)

and stop communication.

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

So the second change didn't work.

Last test:

cluster is started with pc.ignore_sb=true:

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
1 row in set (0.01 sec)

percona1 mysql> show global variables like 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: base_host = 10.0.2.15; base_port = 4567; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT0.3S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.70.2; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT2S; pc.npvo = false; pc.version = 0; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
1 row in set (0.01 sec)

Let's change it... twice:

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=false";
Query OK, 0 rows affected (0.00 sec)

percona1 mysql> set global wsrep_provider_options="pc.ignore_sb=true";
Query OK, 0 rows affected (0.00 sec)

So it should be true, let's stop communication:

percona1 mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
1 row in set (0.00 sec)

percona1 mysql> select count(*) from percona;
ERROR 1047 (08S01): Unknown command

So it's allowed to be changed only once... this is the same for pc.ignore_quorum.

But it's very weird that the values in wsrep_provider_options doesn't reflect the running settings of the cluster.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

fix committed in lp:galera/2.x r133

Changed in galera:
assignee: nobody → Teemu Ollakka (teemu-ollakka)
milestone: none → 23.2.2
status: New → Fix Committed
Changed in galera:
status: Fix Committed → Fix Released
Changed in percona-xtradb-cluster:
status: New → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1234

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.