In a geo-DR setup using garbd, performance degrades with node count when the cross-DC link is down

Bug #1182960 reported by Yves Trudeau
Affects: Galera | Status: Fix Released | Importance: High | Assigned to: Teemu Ollakka
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Status: Fix Released | Importance: Undecided | Assigned to: Unassigned

Bug Description

I built the following setup to test the message relaying capability of garbd:

 PXC           PXC
 DC1 --------- DC2
    \         /
     \       /
       DC3
      garbd

DC1: subnet: 10.3.1.0/24
DC2: subnet: 10.3.2.0/24
DC3: subnet: 10.3.3.0/24

Everything is built in VirtualBox and sits on my local network. Ping time between all the hosts is under 1 ms.

Then, I cut communication between DC1 and DC2 by running the following on the nodes in DC1:

iptables -I INPUT -s 10.3.2.0/24 -j DROP; iptables -I OUTPUT -d 10.3.2.0/24 -j DROP
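(For reference, restoring the link between tests is just a matter of deleting the matching rules; this inverse of the commands above is a sketch and was not part of the original test procedure:

iptables -D INPUT -s 10.3.2.0/24 -j DROP; iptables -D OUTPUT -d 10.3.2.0/24 -j DROP)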

The issue is a strong performance regression as nodes are added:

========================================
1 node on each side + garb, cross dc cut
========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.02 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.01 sec)

This is perfect and works very well. But adding a 2nd node in each DC...

=========================================
2 nodes on each side + garb, cross dc cut
=========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (7.02 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (3.16 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (3.24 sec)

And a 3rd...

=========================================
3 nodes on each side + garb, cross dc cut
=========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (19.59 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (10.68 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (23.42 sec)

So performance degrades with node count. Here's a status output from the inserting node; the statuses on all nodes are similar:

+----------------------------+--------------------------------------------------------------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 477db327-c25d-11e2-0800-c23a22fb15b8 |
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 28 |
| wsrep_replicated | 27 |
| wsrep_replicated_bytes | 4205 |
| wsrep_received | 39 |
| wsrep_received_bytes | 7498 |
| wsrep_local_commits | 23 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 4.500000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 11 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | 10.3.1.43:3306,10.3.1.42:3306,,10.3.1.41:3306,10.3.2.42:3306,10.3.2.41:3306,10.3.2.43:3306 |
| wsrep_cluster_conf_id | 26 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 477db327-c25d-11e2-0800-c23a22fb15b8 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 3 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 2.5(r150) |
| wsrep_ready | ON |
+----------------------------+--------------------------------------------------------------------------------------------+

and the variables:

wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_data_home_dir=/var/lib/mysql
wsrep_slave_threads=8
wsrep_node_name=geo-test-1-1
wsrep_sst_auth=root:pass
wsrep_cluster_name=pxc_geo_test
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S;
evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS
_DEBUG"
wsrep_node_address=10.3.1.41
#wsrep_cluster_address = "gcomm://10.3.3.42:4010"
wsrep_cluster_address = "gcomm://" #First node for now

and finally, on 10.3.3.41, garbd is invoked with:

garbd -a gcomm://10.3.1.41:4010 -o "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS_DEBUG" -g pxc_geo_test -l /tmp/1.out -d
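Once garbd has joined, membership can be verified from any MySQL node; this check is a suggestion rather than part of the original report (garbd shows up as an empty entry in wsrep_incoming_addresses since it runs no mysqld, which is the double comma in the status output above):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_incoming_addresses';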

The timing settings above come from a customer who has the same issue (even worse) on a real WAN.

Yves


Revision history for this message
Marco Schirrmeister (marcoschirrmeister) wrote :

To rule out the garbd code, I did another test and replaced garbd with a real mysql/xtradb node in DC3.
The behaviour was the same: performance gets worse with more than 1 node on one side, DC1 or DC2.

Marco

Revision history for this message
Yves Trudeau (y-trudeau) wrote :

Changing evs.keepalive_period to PT1S and letting evs.inactive_check_period go back to its default results in better times, although they are still way too long:

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (4.71 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (8.54 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (4.50 sec)

This is for 3 nodes on each side + garb.
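A sketch of the adjusted provider options for that run, assuming only the two settings mentioned above changed and everything else stayed as in the original configuration:

wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT1S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS_DEBUG"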

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Hi Yves, could you please post the error logs: one from garbd (syslog) and one from each of the datacenters?

Revision history for this message
Marco Schirrmeister (marcoschirrmeister) wrote :

I'm attaching the logs from my system.
There are no errors, except that it tries to re-connect.

3:30: iptables enabled
then a little querying and some inserts into a test table
3:36: iptables removed

The cluster size also stayed at 9 the whole time. No node dropped out of the cluster.
Below is a status snapshot from one of the nodes.

date; mysql55 mysqlxdb20s -e "show global status like 'wsrep_%';"
Mon May 27 09:34:03 CEST 2013

wsrep_local_state_uuid | 23ce172b-31c5-11e2-0800-b3f49bb7dd1f
wsrep_protocol_version | 4
wsrep_last_committed | 1835663
wsrep_replicated | 3877
wsrep_replicated_bytes | 4082149
wsrep_received | 329716
wsrep_received_bytes | 1064015606
wsrep_local_commits | 3875
wsrep_local_cert_failures | 0
wsrep_local_bf_aborts | 2
wsrep_local_replays | 0
wsrep_local_send_queue | 0
wsrep_local_send_queue_avg | 0.000000
wsrep_local_recv_queue | 0
wsrep_local_recv_queue_avg | 0.000000
wsrep_flow_control_paused | 0.000000
wsrep_flow_control_sent | 0
wsrep_flow_control_recv | 0
wsrep_cert_deps_distance | 220.757988
wsrep_apply_oooe | 0.000000
wsrep_apply_oool | 0.000000
wsrep_apply_window | 0.000000
wsrep_commit_oooe | 0.000000
wsrep_commit_oool | 0.000000
wsrep_commit_window | 0.000000
wsrep_local_state | 4
wsrep_local_state_comment | Synced
wsrep_cert_index_size | 24845
wsrep_causal_reads | 0
wsrep_incoming_addresses | 10.10.1.123:3306,10.20.1.134:3306,10.20.1.135:3306,10.20.1.133:3306,,10.10.1.125:3306,10.20.1.190:3306,10.10.1.122:3306,10.10.1.124:3306
wsrep_cluster_conf_id | 55
wsrep_cluster_size | 9
wsrep_cluster_state_uuid | 23ce172b-31c5-11e2-0800-b3f49bb7dd1f
wsrep_cluster_status | Primary
wsrep_connected | ON
wsrep_local_index | 3
wsrep_provider_name | Galera
wsrep_provider_vendor | Codership Oy <email address hidden>
wsrep_provider_version | 2.5(r150)
wsrep_ready | ON

Changed in galera:
status: New → Confirmed
assignee: nobody → Teemu Ollakka (teemu-ollakka)
milestone: none → 23.2.6
Changed in galera:
importance: Undecided → High
status: Confirmed → In Progress
Changed in galera:
milestone: 23.2.6 → 23.2.7
Revision history for this message
Alex Yurchenko (ayurchen) wrote :
Changed in galera:
status: In Progress → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.5.33-23.7.6
status: New → Fix Committed
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Changed in galera:
status: Fix Committed → Fix Released
Revision history for this message
Marco Schirrmeister (marcoschirrmeister) wrote :

We finally had a chance to upgrade our xtradb production cluster last week.
All our tests on dev, stage and production look good so far.

This is a 9-node cluster: 4 in each of the two main datacenters plus the arbiter in a third datacenter.

Thanks for the fix.
Marco

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1355
