Percona XtraDB Cluster - HA scalable solution for MySQL

In a geo-DR setup using garbd, performance degrades with node count when the cross-DC link is down

Reported by Yves Trudeau on 2013-05-22
Affects                   Status         Importance   Assigned to      Milestone
Galera                    Fix Released   High         Teemu Ollakka    23.2.7
Percona XtraDB Cluster    Fix Released   Undecided    Unassigned       5.5.33-23.7.6

Bug Description

I built the following setup to test the message-relaying capability of garbd:

  PXC             PXC
  DC1 ----------- DC2
     \           /
      \         /
          DC3
         garbd

DC1: subnet: 10.3.1.0/24
DC2: subnet: 10.3.2.0/24
DC3: subnet: 10.3.3.0/24

Everything is built in VirtualBox and sits on my local network. Ping time between all the hosts is under 1 ms.

Then I cut communication between DC1 and DC2 by running the following on the nodes in DC1:

iptables -I INPUT -s 10.3.2.0/24 -j DROP; iptables -I OUTPUT -d 10.3.2.0/24 -j DROP
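
For reference, the partition can be lifted again afterwards by deleting the mirrored rules; a sketch, assuming the rules were inserted exactly as above:

iptables -D INPUT -s 10.3.2.0/24 -j DROP; iptables -D OUTPUT -d 10.3.2.0/24 -j DROP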

The issue is a strong performance regression as nodes are added:

=========================================
1 nodes on each side + garb, cross dc cut
=========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.02 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (0.01 sec)

This is perfect and works very well. But adding a 2nd node in each DC...

=========================================
2 nodes on each side + garb, cross dc cut
=========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (7.02 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (3.16 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (3.24 sec)

And a 3rd...

=========================================
3 nodes on each side + garb, cross dc cut
=========================================

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (19.59 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (10.68 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (23.42 sec)
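
For anyone wanting more samples, the same measurement can be scripted instead of typed at the prompt. A minimal sketch, assuming the auto-increment table t above lives in a schema named test and client credentials come from ~/.my.cnf:

for i in $(seq 1 10); do
  # time one single-row insert; %e is wall-clock seconds (GNU time)
  /usr/bin/time -f "%e s" mysql test -e "INSERT INTO t (id) VALUES (NULL);"
done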

So performance degrades with node count. Here's a status output from the node doing the inserts; the status on all nodes is similar:

+----------------------------+--------------------------------------------------------------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 477db327-c25d-11e2-0800-c23a22fb15b8 |
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 28 |
| wsrep_replicated | 27 |
| wsrep_replicated_bytes | 4205 |
| wsrep_received | 39 |
| wsrep_received_bytes | 7498 |
| wsrep_local_commits | 23 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 4.500000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 11 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | 10.3.1.43:3306,10.3.1.42:3306,,10.3.1.41:3306,10.3.2.42:3306,10.3.2.41:3306,10.3.2.43:3306 |
| wsrep_cluster_conf_id | 26 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 477db327-c25d-11e2-0800-c23a22fb15b8 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 3 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <email address hidden> |
| wsrep_provider_version | 2.5(r150) |
| wsrep_ready | ON |
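
For reference, output like this comes from the standard wsrep status query (the same one used in the snapshot further below):

mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%';"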

and the variables:

wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_data_home_dir=/var/lib/mysql
wsrep_slave_threads=8
wsrep_node_name=geo-test-1-1
wsrep_sst_auth=root:pass
wsrep_cluster_name=pxc_geo_test
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S;
evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS
_DEBUG"
wsrep_node_address=10.3.1.41
#wsrep_cluster_address = "gcomm://10.3.3.42:4010"
wsrep_cluster_address = "gcomm://" #First node for now

and finally, on 10.3.3.41, garbd is invoked with:

garbd -a gcomm://10.3.1.41:4010 -o "gmcast.listen_addr=tcp://0.0.0.0:4010; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.send_window = 64; evs.user_send_window = 32; evs.consensus_timeout = PT1M; evs.debug_log_mask=EVS_DEBUG" -g pxc_geo_test -l /tmp/1.out -d
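
Since garbd is daemonized here (-d) and logs to /tmp/1.out (-l), its view of membership changes during the test can be followed by simply tailing that file, for example:

tail -f /tmp/1.out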

The timing settings come from a customer who has the same issue (even worse) on a real WAN.

Yves


To rule out the garbd code, I have done another test and replaced garbd with a real MySQL/XtraDB node in DC3.
The behaviour was the same: performance gets worse with more than 1 node on one side, DC1 or DC2.

Marco

Yves Trudeau (y-trudeau) wrote :

Changing evs.keepalive_period to PT1S and letting evs.inactive_check_period go back to its default results in better times, although still way too long:

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (4.71 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (8.54 sec)

mysql> insert into t (id) values (NULL);
Query OK, 1 row affected (4.50 sec)

This is for 3 nodes on each side + garbd.
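
For anyone tuning these settings, the evs options can in principle also be changed at runtime through wsrep_provider_options; a sketch, with the caveat that not every evs option necessarily takes effect without a restart on this Galera build (in that case it has to go into my.cnf):

mysql -e "SET GLOBAL wsrep_provider_options = 'evs.keepalive_period = PT1S';"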

Alex Yurchenko (ayurchen) wrote :

Hi Yves, could you please post the error logs: one from garbd (syslog) and one from each of the datacenters?

I'm attaching the logs from my system.
There are no errors, except that it tries to re-connect.

3:30 iptables enabled
then some querying and attempts to insert into a test table
3:36 iptables removed

The cluster size also stayed at 9 the whole time; no node dropped out of the cluster.
Below is a status snapshot from one of the nodes.

date; mysql55 mysqlxdb20s -e "show global status like 'wsrep_%';"
Mon May 27 09:34:03 CEST 2013

wsrep_local_state_uuid | 23ce172b-31c5-11e2-0800-b3f49bb7dd1f
wsrep_protocol_version | 4
wsrep_last_committed | 1835663
wsrep_replicated | 3877
wsrep_replicated_bytes | 4082149
wsrep_received | 329716
wsrep_received_bytes | 1064015606
wsrep_local_commits | 3875
wsrep_local_cert_failures | 0
wsrep_local_bf_aborts | 2
wsrep_local_replays | 0
wsrep_local_send_queue | 0
wsrep_local_send_queue_avg | 0.000000
wsrep_local_recv_queue | 0
wsrep_local_recv_queue_avg | 0.000000
wsrep_flow_control_paused | 0.000000
wsrep_flow_control_sent | 0
wsrep_flow_control_recv | 0
wsrep_cert_deps_distance | 220.757988
wsrep_apply_oooe | 0.000000
wsrep_apply_oool | 0.000000
wsrep_apply_window | 0.000000
wsrep_commit_oooe | 0.000000
wsrep_commit_oool | 0.000000
wsrep_commit_window | 0.000000
wsrep_local_state | 4
wsrep_local_state_comment | Synced
wsrep_cert_index_size | 24845
wsrep_causal_reads | 0
wsrep_incoming_addresses | 10.10.1.123:3306,10.20.1.134:3306,10.20.1.135:3306,10.20.1.133:3306,,10.10.1.125:3306,10.20.1.190:3306,10.10.1.122:3306,10.10.1.124:3306
wsrep_cluster_conf_id | 55
wsrep_cluster_size | 9
wsrep_cluster_state_uuid | 23ce172b-31c5-11e2-0800-b3f49bb7dd1f
wsrep_cluster_status | Primary
wsrep_connected | ON
wsrep_local_index | 3
wsrep_provider_name | Galera
wsrep_provider_vendor | Codership Oy <email address hidden>
wsrep_provider_version | 2.5(r150)
wsrep_ready | ON

Changed in galera:
status: New → Confirmed
assignee: nobody → Teemu Ollakka (teemu-ollakka)
milestone: none → 23.2.6
Changed in galera:
importance: Undecided → High
status: Confirmed → In Progress
Changed in galera:
milestone: 23.2.6 → 23.2.7
Changed in galera:
status: In Progress → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.5.33-23.7.6
status: New → Fix Committed
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Changed in galera:
status: Fix Committed → Fix Released

We finally had a chance to upgrade our XtraDB production cluster last week.
All our tests on dev, stage, and production look good so far.

This is a 9-node cluster: 4 in each of the two main datacenters plus the arbiter in a third datacenter.

Thanks for the fix.
Marco
