In a geo-DR setup using garbd, performance degrades with node count when the cross-DC link is down

Bug #1182960 reported by Yves Trudeau
Affects: Galera | Status: Fix Released | Importance: High | Assigned to: Teemu Ollakka
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Status: Fix Released | Importance: Undecided | Assigned to: Unassigned

Bug Description

I built the following setup to test the message relaying capability of garbd:

 PXC           PXC
 DC1 --------- DC2
    \         /
     \       /
       DC3
      garbd

DC1: subnet: 10.3.1.0/24
DC2: subnet: 10.3.2.0/24
DC3: subnet: 10.3.3.0/24

Everything is built in VirtualBox and sits on my local network. Ping time between all the hosts is under 1 ms.

Then, I cut communication between DC1 and DC2 by running the following on the nodes in DC1:

iptables -I INPUT -s 10.3.2.0/24 -j DROP; iptables -I OUTPUT -d 10.3.2.0/24 -j DROP
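(For reference, restoring the link between tests is just a matter of deleting the matching rules; this inverse of the commands above is a sketch and was not part of the original test procedure:

iptables -D INPUT -s 10.3.2.0/24 -j DROP; iptables -D OUTPUT -d 10.3.2.0/24 -j DROP)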

The issue is a strong performance regression as nodes are added:

========================================
1 node on each side + garb, cross dc cut
========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.02 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.01 sec)

This is perfect and works very well. But adding a 2nd node in each DC...

=========================================
2 nodes on each side + garb, cross dc cut
=========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (7.02 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (3.16 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (3.24 sec)

And a 3rd...

=========================================
3 nodes on each side + garb, cross dc cut
=========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (19.59 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (10.68 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (23.42 sec)

So performance degrades with node count. Here's a status output from the inserting node; the statuses on all nodes are similar:

+----------------------------+--------------------------------------------------------------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 477db327-c25d-11e2-0800-c23a22fb15b8 |
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 28 |
| wsrep_replicated | 27 |
| wsrep_replicated_bytes | 4205 |
| wsrep_received | 39 |
| wsrep_received_bytes | 7498 |
| wsrep_local_commits | 23 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 4.500000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 11 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | 10.3.1.43:3306,10.3.1.42:3306,,10.3.1.41:3306,10.3.2.42:3306,10.3.2.41:3306,10.3.2.43:3306 |
| wsrep_cluster_conf_id | 26 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 477db327-c25d-11e2-0800-c23a22fb15b8 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 3 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 2.5(r150) |
| wsrep_ready | ON |
+----------------------------+--------------------------------------------------------------------------------------------+

and the variables:

wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_data_home_dir=/var/lib/mysql
wsrep_slave_threads=8
wsrep_node_name=geo-test-1-1
wsrep_sst_auth=root:pass
wsrep_cluster_name=pxc_geo_test
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S;
evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS
_DEBUG"
wsrep_node_address=10.3.1.41
#wsrep_cluster_address = "gcomm://10.3.3.42:4010"
wsrep_cluster_address = "gcomm://" #First node for now

and finally, on 10.3.3.41, garbd is invoked with:

garbd -a gcomm://10.3.1.41:4010 -o "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS_DEBUG" -g pxc_geo_test -l /tmp/1.out -d
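Once garbd has joined, membership can be verified from any MySQL node; this check is a suggestion rather than part of the original report (garbd shows up as an empty entry in wsrep_incoming_addresses since it runs no mysqld, which is the double comma in the status output above):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_incoming_addresses';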

The timing settings above come from a customer who has the same issue (even worse) on a real WAN.

Yves


Revision history for this message
Marco Schirrmeister (marcoschirrmeister) wrote :

To rule out the garbd code, I did another test and replaced garbd with a real mysql/xtradb node in DC3.
The behaviour was the same: performance gets worse with more than 1 node on one side, DC1 or DC2.

Marco

Revision history for this message
Yves Trudeau (y-trudeau) wrote :

Changing evs.keepalive_period to PT1S and letting evs.inactive_check_period go back to its default results in better times, although they are still way too long:

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (4.71 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (8.54 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (4.50 sec)

This is for 3 nodes on each side + garb.
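A sketch of the adjusted provider options for that run, assuming only the two settings mentioned above changed and everything else stayed as in the original configuration:

wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT1S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS_DEBUG"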

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Hi Yves, could you please post the error logs: one from garbd (syslog) and one from each of the datacenters?

Revision history for this message
Marco Schirrmeister (marcoschirrmeister) wrote :

I'm attaching the logs from my system.
There are no errors, except that it tries to re-connect.

3:30: iptables enabled
then a little querying and some inserts into a test table
3:36: iptables removed

The cluster size also stayed at 9 the whole time. No node dropped out of the cluster.
Below is a status snapshot from one of the nodes.

date; mysql55 mysqlxdb20s -e "show global status like 'wsrep_%';"
Mon May 27 09:34:03 CEST 2013

wsrep_local_state_uuid | 23ce172b-31c5-11e2-0800-b3f49bb7dd1f
wsrep_protocol_version | 4
wsrep_last_committed | 1835663
wsrep_replicated | 3877
wsrep_replicated_bytes | 4082149
wsrep_received | 329716
wsrep_received_bytes | 1064015606
wsrep_local_commits | 3875
wsrep_local_cert_failures | 0
wsrep_local_bf_aborts | 2
wsrep_local_replays | 0
wsrep_local_send_queue | 0
wsrep_local_send_queue_avg | 0.000000
wsrep_local_recv_queue | 0
wsrep_local_recv_queue_avg | 0.000000
wsrep_flow_control_paused | 0.000000
wsrep_flow_control_sent | 0
wsrep_flow_control_recv | 0
wsrep_cert_deps_distance | 220.757988
wsrep_apply_oooe | 0.000000
wsrep_apply_oool | 0.000000
wsrep_apply_window | 0.000000
wsrep_commit_oooe | 0.000000
wsrep_commit_oool | 0.000000
wsrep_commit_window | 0.000000
wsrep_local_state | 4
wsrep_local_state_comment | Synced
wsrep_cert_index_size | 24845
wsrep_causal_reads | 0
wsrep_incoming_addresses | 10.10.1.123:3306,10.20.1.134:3306,10.20.1.135:3306,10.20.1.133:3306,,10.10.1.125:3306,10.20.1.190:3306,10.10.1.122:3306,10.10.1.124:3306
wsrep_cluster_conf_id | 55
wsrep_cluster_size | 9
wsrep_cluster_state_uuid | 23ce172b-31c5-11e2-0800-b3f49bb7dd1f
wsrep_cluster_status | Primary
wsrep_connected | ON
wsrep_local_index | 3
wsrep_provider_name | Galera
wsrep_provider_vendor | Codership Oy <email address hidden>
wsrep_provider_version | 2.5(r150)
wsrep_ready | ON

Changed in galera:
status: New → Confirmed
assignee: nobody → Teemu Ollakka (teemu-ollakka)
milestone: none → 23.2.6
Changed in galera:
importance: Undecided → High
status: Confirmed → In Progress
Changed in galera:
milestone: 23.2.6 → 23.2.7
Revision history for this message
Alex Yurchenko (ayurchen) wrote :
Changed in galera:
status: In Progress → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.5.33-23.7.6
status: New → Fix Committed
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Changed in galera:
status: Fix Committed → Fix Released
Revision history for this message
Marco Schirrmeister (marcoschirrmeister) wrote :

We finally had a chance to upgrade our xtradb production cluster last week.
All our tests on dev, stage and production look good so far.

This is a 9-node cluster: 4 in each of the two main datacenters plus the arbiter in a third datacenter.

Thanks for the fix.
Marco

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1355
