Poor performance on HDD environments (wsrep_slave_threads, tuning-level)

Bug #1822903 reported by James Troup on 2019-04-02
Affects: OpenStack percona-cluster charm
Status: In Progress
Importance: Critical
Assigned to: Trent Lloyd

Bug Description

A default charm install of a percona cluster is unusably slow unless your install involves no spinning rust whatsoever, and this is not OK.

Trent covers the problem in detail here (Canonical only, sorry):

  https://pastebin.canonical.com/p/nsnjg4qH6h/

In short, I believe we should make the charm implement something like the following logic:

 if ubuntu_release <= xenial:
   if tuning_level == default:
     tuning_level = fast
   if wsrep_slave_threads > 1:
     scream bloody murder into log and status... possibly refuse to start?
 else:
   if wsrep_slave_threads == default:
     wsrep_slave_threads = 48
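A minimal sketch of that logic in Python (the function name `apply_tuning_defaults` and the return shape are illustrative, not the charm's actual API):

```python
# Hypothetical sketch of the proposed defaulting logic; names are
# illustrative, not the percona-cluster charm's real hooks or API.

XENIAL_OR_OLDER = {"trusty", "xenial"}

def apply_tuning_defaults(ubuntu_release, tuning_level, wsrep_slave_threads):
    """Return (tuning_level, wsrep_slave_threads, warning-or-None)."""
    warning = None
    if ubuntu_release in XENIAL_OR_OLDER:
        # Percona 5.6: Galera cannot group-commit, so cut fsyncs instead.
        if tuning_level == "default":
            tuning_level = "fast"
        if wsrep_slave_threads > 1:
            # Parallel apply can trip a foreign key bug on 5.6 (LP #1823850).
            warning = "wsrep_slave_threads > 1 is unsafe on Percona 5.6"
    else:
        # Percona 5.7: parallel apply enables group commit, so default it up.
        if wsrep_slave_threads == 1:
            wsrep_slave_threads = 48
    return tuning_level, wsrep_slave_threads, warning
```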

Changed in charm-percona-cluster:
status: New → Confirmed
importance: Undecided → Critical
Trent Lloyd (lathiat) on 2019-04-05
Changed in charm-percona-cluster:
assignee: nobody → Trent Lloyd (lathiat)
James Page (james-page) on 2019-04-08
Changed in charm-percona-cluster:
status: Confirmed → In Progress
Trent Lloyd (lathiat) wrote:

Trent's write-up on the issue (from https://pastebin.canonical.com/p/nsnjg4qH6h/)
======

For Bionic (Percona 5.7) I found that increasing wsrep-slave-threads to 48 results in a reasonably good performance boost, even on HDD storage. The reason is that this allows multiple queries to execute on the slave servers at the same time, just as they do on the master (where queries from different clients all execute concurrently). When this happens, the server is also able to merge multiple SQL commits occurring within a small timeframe into a single fsync call (an optimization called 'group commit'). This likely greatly reduces the need to set innodb-tuning-level=fast on Bionic.
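In mysqld terms this corresponds to something like the following (a sketch; the exact options the charm renders into its config file may differ):

```ini
# Sketch of the resulting mysqld settings on Bionic (Percona 5.7).
[mysqld]
# Apply replicated writesets with 48 parallel threads; concurrent
# commits can then be merged into a single fsync (group commit).
wsrep_slave_threads = 48
# Full durability retained; on Xenial the charm's innodb-tuning-level=fast
# would instead relax flushing to avoid per-query fsyncs.
innodb_flush_log_at_trx_commit = 1
```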

Unfortunately this is not the case for Xenial: with the Galera backend it seems unable to do any kind of group commit (even though InnoDB itself will do it), and worse, on Xenial the slave threads appear to issue 2 fsync calls for every query (as opposed to 1 on the master). The master can only run as fast as the slaves, so even though this technically only affects the slaves, it holds the master back to the same speed (though the master does not submit quite as many fsyncs to its underlying storage). Thus for Xenial the best option is likely to set innodb-tuning-level=fast, which removes the fsync calls - definitely on HDD-only environments, but maybe even on SSD environments, since the number of IOPS submitted in a busy cloud could grow quite large given they total double the number of queries per second.

Unfortunately wsrep-slave-threads > 1 also occasionally triggers a bug on Xenial (Percona 5.6) that will likely never be fixed, because Percona 5.6 is long end-of-life upstream. In this case you sometimes (maybe once a day in the environment where we tried it) get a foreign key violation, which causes the slave to exit, restart, and clone fresh from another node. In theory this is not catastrophic, since we don't send any queries to the servers running the slave threads, so production queries shouldn't be impacted. However, during the SST process I think the server generating the SST stops responding to queries while the SST is generated (I think? need to double-check that is still true). If both slaves crashed out at the same time, that might cause an outage? Would need to check this further. I have not tested whether this same foreign key error happens on Bionic (Percona 5.7), however it is much more likely to have been fixed there as it is a much newer code base.

Secondly, it seems that for most new cloud deployments we are deploying bcache at least for /var (which includes the Percona containers), which somewhat mitigates the need for tuning-level=fast on Xenial. This may depend slightly on whether the bcache sequential threshold has been tuned. I have not tested this, but if the sequential threshold is not reduced and the server is busy, it's likely a large write to the InnoDB log file could skip the SSD and thus still not get cached.
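For reference, the bcache sequential cutoff can be inspected and lowered via sysfs (a sketch; the bcache0 device name is an assumption for the deployment in question):

```shell
# Show the current cutoff; bcache bypasses the SSD for sequential I/O
# larger than this value (default 4.0M).
cat /sys/block/bcache0/bcache/sequential_cutoff

# Lower it (0 = cache all I/O) so large sequential writes to the InnoDB
# log file still land on the SSD cache.
echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff
```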

As to your question about losing 1 seconds worth of transactions. With the 3-node cluster, the transaction is committed to all 3 nodes before returning to the client. For this reason, w...


summary: - tuning should be smarter
+ Poor performance on HDD environments (wsrep_slave_threads, tuning-level)

Reviewed: https://review.opendev.org/651127
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=0697559b51666197eecc0985611ee563c9a70c6d
Submitter: Zuul
Branch: master

commit 0697559b51666197eecc0985611ee563c9a70c6d
Author: Trent Lloyd <email address hidden>
Date: Tue Apr 9 14:30:34 2019 +0800

    wsrep_slave_threads: default to 48 on bionic

    This improves performance significantly for environments constrained by
    calls to sync(), such as HDDs or lower-end SSDs (or just very busy
    environments running many queries).

    By default, the queries from other nodes are processed by only
    1 thread, which means they will always run slower than on the master,
    and any long-running query will hold up all other queries behind it.

    Additionally, when multiple queries commit at once the server can
    combine them together into a single on-disk sync ('group commit') which
    is not possible otherwise. This optimisation appears to only occur on
    Bionic (Percona 5.7) and not Xenial (Percona 5.6).

    On Bionic, default to 48 threads which experimentally is a good number
    for OpenStack environments without being too crazy high. Galera ensures
    that queries that are dependent on each other are still executed
    sequentially and generally it is not expected to cause replication
    inconsistencies.

    However Percona Cluster 5.6 on Xenial appears to have a bug handling
    foreign key constraints that causes them to be violated (LP #1823850).
    The result is that the slave node crashes out and has to do a full SST
    to recover. The same issue is not present on the master. Thus we leave
    the default wsrep_slave_threads=1 on Xenial to avoid this issue for now
    particularly since Xenial does not appear to be able to use Group Commit
    to optimise the number of sync requests generated by the queries - so
    this option does not really improve performance there anyway.

    Partial-Bug: #1822903
    Change-Id: Ic9cdd6562f30a3e52aa3d26fea53ba7c2bbdc771
