Poor performance on HDD environments (wsrep_slave_threads, tuning-level)

Bug #1822903 reported by James Troup on 2019-04-02
Affects: OpenStack percona-cluster charm
Status: In Progress
Importance: Critical
Assigned to: Trent Lloyd

Bug Description

A default charm install of a percona cluster is unusably slow unless your install involves no spinning rust whatsoever, and this is not OK.

Trent covers the problem in detail here (Canonical only, sorry):

  https://pastebin.canonical.com/p/nsnjg4qH6h/

In short, I believe we should make the charm implement something like the following logic:

 if ubuntu_release <= xenial:
   if tuning_level == default:
     tuning_level = fast
   if wsrep_slave_threads > 1:
     scream bloody murder into log and status... possibly refuse to start?
 else:
   if wsrep_slave_threads == default:
     wsrep_slave_threads = 48
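A minimal sketch of that logic in Python (the function name `apply_tuning_defaults` and the return shape are illustrative, not the charm's actual API):

```python
# Hypothetical sketch of the proposed defaulting logic; names are
# illustrative, not the percona-cluster charm's real hooks or API.

XENIAL_OR_OLDER = {"trusty", "xenial"}

def apply_tuning_defaults(ubuntu_release, tuning_level, wsrep_slave_threads):
    """Return (tuning_level, wsrep_slave_threads, warning-or-None)."""
    warning = None
    if ubuntu_release in XENIAL_OR_OLDER:
        # Percona 5.6: Galera cannot group-commit, so cut fsyncs instead.
        if tuning_level == "default":
            tuning_level = "fast"
        if wsrep_slave_threads > 1:
            # Parallel apply can trip a foreign key bug on 5.6 (LP #1823850).
            warning = "wsrep_slave_threads > 1 is unsafe on Percona 5.6"
    else:
        # Percona 5.7: parallel apply enables group commit, so default it up.
        if wsrep_slave_threads == 1:
            wsrep_slave_threads = 48
    return tuning_level, wsrep_slave_threads, warning
```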

Changed in charm-percona-cluster:
status: New → Confirmed
importance: Undecided → Critical
Trent Lloyd (lathiat) on 2019-04-05
Changed in charm-percona-cluster:
assignee: nobody → Trent Lloyd (lathiat)
James Page (james-page) on 2019-04-08
Changed in charm-percona-cluster:
status: Confirmed → In Progress
Trent Lloyd (lathiat) wrote:

Trent's write-up on the issue (from https://pastebin.canonical.com/p/nsnjg4qH6h/)
======

For Bionic (Percona 5.7) I found that increasing wsrep-slave-threads to 48 results in a reasonably good performance boost, even on HDD storage. The reason is that this allows multiple queries to execute on the slave servers at the same time, just as they do on the master (where queries from different clients all execute concurrently). When this happens, the server is also able to merge multiple SQL commits occurring within a small timeframe into a single fsync call (an optimization called 'group commit'). This likely greatly reduces the need to set innodb-tuning-level=fast on Bionic.
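In mysqld terms this corresponds to something like the following (a sketch; the exact options the charm renders into its config file may differ):

```ini
# Sketch of the resulting mysqld settings on Bionic (Percona 5.7).
[mysqld]
# Apply replicated writesets with 48 parallel threads; concurrent
# commits can then be merged into a single fsync (group commit).
wsrep_slave_threads = 48
# Full durability retained; on Xenial the charm's innodb-tuning-level=fast
# would instead relax flushing to avoid per-query fsyncs.
innodb_flush_log_at_trx_commit = 1
```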

Unfortunately this is not the case for Xenial: with the Galera backend it seems unable to do any kind of group commit (even though InnoDB itself will do it), and worse, on Xenial the slave threads appear to issue 2 fsync calls for every query (as opposed to 1 on the master). The master can only run as fast as the slaves, so even though this technically only affects the slaves, it holds the master back to the same speed (though the master does not submit quite as many fsyncs to its underlying storage). Thus for Xenial the best option is likely to set innodb-tuning-level=fast, which removes the fsync calls - definitely on HDD-only environments, but maybe even on SSD environments, since the number of IOPS submitted in a busy cloud could grow quite large given they total double the number of queries per second.

Unfortunately wsrep-slave-threads > 1 also occasionally triggers a bug on Xenial (Percona 5.6) that will likely never be fixed, because Percona 5.6 is long end-of-life upstream. In this case you sometimes (maybe once a day in the environment where we tried it) get a foreign key violation, which causes the slave to exit, restart, and clone fresh from another node. In theory this is not catastrophic, since we don't send any queries to the servers running the slave threads, so production queries shouldn't be impacted. However, during the SST process I think the server generating the SST stops responding to queries while the SST is generated (I think? need to double-check that is still true). If both slaves crashed out at the same time, that might cause an outage? Would need to check this further. I have not tested whether this same foreign key error happens on Bionic (Percona 5.7), however it is much more likely to have been fixed there as it is a much newer code base.

Secondly, it seems that for most new cloud deployments we are deploying bcache at least for /var (which includes the Percona containers), which somewhat mitigates the need for tuning-level=fast on Xenial. This may depend slightly on whether the bcache sequential threshold has been tuned. I have not tested this, but if the sequential threshold is not reduced and the server is busy, it's likely a large write to the InnoDB log file could skip the SSD and thus still not get cached.
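For reference, the bcache sequential cutoff can be inspected and lowered via sysfs (a sketch; the bcache0 device name is an assumption for the deployment in question):

```shell
# Show the current cutoff; bcache bypasses the SSD for sequential I/O
# larger than this value (default 4.0M).
cat /sys/block/bcache0/bcache/sequential_cutoff

# Lower it (0 = cache all I/O) so large sequential writes to the InnoDB
# log file still land on the SSD cache.
echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff
```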

As to your question about losing 1 seconds worth of transactions. With the 3-node cluster, the transaction is committed to all 3 nodes before returning to the client. For this reason, w...


summary: - tuning should be smarter
+ Poor performance on HDD environments (wsrep_slave_threads, tuning-level)

Reviewed: https://review.opendev.org/651127
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=0697559b51666197eecc0985611ee563c9a70c6d
Submitter: Zuul
Branch: master

commit 0697559b51666197eecc0985611ee563c9a70c6d
Author: Trent Lloyd <email address hidden>
Date: Tue Apr 9 14:30:34 2019 +0800

    wsrep_slave_threads: default to 48 on bionic

    This improves performance significantly for environments constrained by
    calls to sync(), such as HDDs or lower-end SSDs (or just very busy
    environments running many queries).

    By default, the queries from other nodes are processed by only
    1 thread, which means they will always run slower than on the master,
    and any long-running query will hold up all other queries behind it.

    Additionally, when multiple queries commit at once the server can
    combine them together into a single on-disk sync ('group commit') which
    is not possible otherwise. This optimisation appears to only occur on
    Bionic (Percona 5.7) and not Xenial (Percona 5.6).

    On Bionic, default to 48 threads which experimentally is a good number
    for OpenStack environments without being too crazy high. Galera ensures
    that queries that are dependent on each other are still executed
    sequentially and generally it is not expected to cause replication
    inconsistencies.

    However Percona Cluster 5.6 on Xenial appears to have a bug handling
    foreign key constraints that causes them to be violated (LP #1823850).
    The result is that the slave node crashes out and has to do a full SST
    to recover. The same issue is not present on the master. Thus we leave
    the default wsrep_slave_threads=1 on Xenial to avoid this issue for now
    particularly since Xenial does not appear to be able to use Group Commit
    to optimise the number of sync requests generated by the queries - so
    this option does not really improve performance there anyway.

    Partial-Bug: #1822903
    Change-Id: Ic9cdd6562f30a3e52aa3d26fea53ba7c2bbdc771
