provide tuning configuration options (was Huawei RH2288H V3 mysql 3 HA units, ~100% ioutil on idle (ext4 journaling + mysqld))

Bug #1599222 reported by Alvaro Uria
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Percona Cluster Charm | Fix Released | High | James Page |
percona-cluster (Juju Charms Collection) | Invalid | High | Unassigned |

Bug Description

Manufacturer: Huawei RH2288H V3
Ubuntu 14.04.4 LTS
Ubuntu 4.2.0-38.45~14.04.1-generic 4.2.8-ckt10
Juju 1.25.5, OS Charms 16.04; Liberty

percona-cluster deployed on 3 LXC units
lp:charms/trusty/percona-cluster;revno=99

hacluster charm used for VIP management.
lp:charms/trusty/hacluster;revno=56

OpenStack deployed on 33 compute-storage nodes.

"/var/lib/lxc" has been tested on:
1) sda1, together with rootfs (ext4)
2) sda2, separated from rootfs (ext4)
3) sdb1, on a different, non-RAID disk
3.1) sdb1 on ext4
3.2) sdb1 on xfs

In all cases, both iostat and iotop showed high %util on an idle OpenStack environment. If the mysql service is stopped on 2 of the 3 units, only the host still running MySQL shows high I/O utilization.
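For reference, the %util column reported by iostat is the share of wall-clock time during which the device had at least one I/O in flight. A minimal sketch of that calculation, assuming two samples of a device's line from /proc/diskstats taken a known interval apart (field layout as described in the kernel's iostats documentation; the sample lines and helper names are made up for illustration):

```python
def io_ticks_ms(diskstats_line: str) -> int:
    """Return field 10, "milliseconds spent doing I/Os", from one diskstats line."""
    parts = diskstats_line.split()
    # parts[0:3] are major, minor and device name; statistics fields start at
    # parts[3], so field 10 sits at parts[12] (newer kernels append more fields).
    return int(parts[12])


def percent_util(before: str, after: str, interval_ms: int) -> float:
    """Percentage of the sampling interval during which the device was busy."""
    busy_ms = io_ticks_ms(after) - io_ticks_ms(before)
    return 100.0 * busy_ms / interval_ms


if __name__ == "__main__":
    # Hypothetical samples of the same device taken one second apart.
    t0 = "8 0 sda 100 0 800 50 2000 0 16000 9000 0 40000 9050"
    t1 = "8 0 sda 100 0 800 50 2090 0 16720 9900 0 40950 9950"
    print(f"sda %util: {percent_util(t0, t1, 1000):.1f}")
```

With these made-up samples the device is busy for 950 ms of the 1000 ms interval, so it prints 95.0. On RAID controllers or SSDs that serve several requests in parallel, a high %util does not by itself prove the device is saturated.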

We have another environment running 13 compute-storage nodes with the same specifications, and it has a similar issue (high I/O on the drives where mysql runs).

I'll add a few mysql logs. Let me know if you need further details.

Alvaro Uria (aluria) wrote :
Alvaro Uria (aluria) wrote :
Alvaro Uria (aluria) wrote :
JuanJo Ciarlante (jjo) wrote :

We've been able to work around this issue by
tuning some of MySQL's InnoDB parameters, notably:

# main improvement: from ~90% util to ~20%:
innodb_flush_log_at_trx_commit=0

# also buffer UPDATEs (OpenStack does quite a lot of them):
innodb_change_buffering=all

# ~90 IOPS -> 100% util, as observed
innodb_io_capacity=100
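To apply these three settings outside the charm, they would normally live in a [mysqld] section of a MySQL option file. A minimal sketch that renders such a fragment with Python's configparser; the dict and function names are illustrative, and where the file ends up (for example somewhere under /etc/mysql/conf.d/) is an assumption, since the charm manages its own configuration template:

```python
import configparser
import io

# Values from the workaround above; WORKAROUND is an illustrative name.
WORKAROUND = {
    "innodb_flush_log_at_trx_commit": "0",  # main improvement, but weakens durability
    "innodb_change_buffering": "all",       # buffer inserts, deletes and purges
    "innodb_io_capacity": "100",            # disk saturated at roughly 90 IOPS
}


def render_mysqld_fragment(settings: dict) -> str:
    """Render `settings` as a [mysqld] option-file section."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser["mysqld"] = settings
    buf = io.StringIO()
    parser.write(buf)  # emits "name = value" lines; MySQL strips the spaces
    return buf.getvalue()


print(render_mysqld_fragment(WORKAROUND))
```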

Locally branched charm ("cowboy" patch):
http://paste.ubuntu.com/18724829/

IMO this charm should expose a "config-flags" setting
(as the OpenStack charms do).
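A "config-flags" option is typically a comma-separated list of key=value pairs. A minimal parser for that format, assuming exactly that syntax; this is an illustrative sketch with a hypothetical function name, not the parser the OpenStack charms or charm-helpers actually ship:

```python
def parse_config_flags(flags: str) -> dict:
    """Parse 'key=value, key=value' into a dict of strings (hypothetical helper)."""
    result = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed config flag: {item!r}")
        result[key.strip()] = value.strip()
    return result
```

For example, parse_config_flags("innodb_io_capacity=100, innodb_change_buffering=all") returns {"innodb_io_capacity": "100", "innodb_change_buffering": "all"}. Values are kept as strings so that the option file, not the parser, decides how to interpret them.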

Alvaro Uria (aluria) wrote :

I've tested the settings from #4 on different hardware with the same specs. I/O utilization dropped from ~90-100% to <20% (stable at ~6% when idle).

Thank you Juanjo!

Mario Splivalo (mariosplivalo) wrote :

Hi, Alvaro.

Can you please repeat your test, but with innodb_flush_log_at_trx_commit reverted to 1?
When it is set to 0 or 2, there is a chance of data loss in the event of a power failure or controller error, as the data is not synced to disk on every transaction commit.

I would strongly recommend against changing the innodb_flush_log_at_trx_commit.
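To make the tradeoff concrete, here is a small model of what each innodb_flush_log_at_trx_commit value does, based on the MySQL reference manual: with 1 the redo log is written and flushed at every commit; with 2 it is written at commit but flushed about once per second; with 0 it is written and flushed about once per second (the interval is innodb_flush_log_at_timeout, default 1 second). The class and function names are illustrative only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlushPolicy:
    writes_log_at_commit: bool  # redo log handed to the OS at each commit
    fsyncs_log_at_commit: bool  # ... and flushed to disk at each commit


POLICIES = {
    1: FlushPolicy(writes_log_at_commit=True, fsyncs_log_at_commit=True),
    2: FlushPolicy(writes_log_at_commit=True, fsyncs_log_at_commit=False),
    0: FlushPolicy(writes_log_at_commit=False, fsyncs_log_at_commit=False),
}


def may_lose_committed_trx(setting: int, failure: str) -> bool:
    """True if roughly the last second of committed transactions can be lost."""
    policy = POLICIES[setting]
    if failure == "mysqld_crash":
        # The OS still holds anything already written, so only 0 is exposed.
        return not policy.writes_log_at_commit
    if failure == "power_loss":
        # Data only in the OS page cache is gone; only an fsync protects it.
        return not policy.fsyncs_log_at_commit
    raise ValueError(f"unknown failure type: {failure!r}")
```

The model ignores storage that acknowledges writes before they are durable (for example volatile write caches), which can weaken even setting 1.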

JuanJo Ciarlante (jjo) wrote :

@mariosplivalo
> Can you please repeat your test, just revert innodb_flush_log_at_trx_commit, set it to 1.
> [...]

IIRC this setting was the main driver of the I/O bottleneck,
as was kind of expected; worth trying, of course.

> I would strongly recommend against changing the innodb_flush_log_at_trx_commit.

Understood, it's obviously a tradeoff; in this 3×HA deployment, though,
a power failure or controller error on a single unit is covered
by the other running replicas.

James Page (james-page) wrote :

I feel we need something like the mysql charm has:

    tuning-level:
        default: safest
        type: string
        description: Valid values are 'safest', 'fast', and 'unsafe'. If set to safest, all settings are tuned to have maximum safety at the cost of performance. Fast will turn off most controls, but may lose data on crashes. unsafe will turn off all protections.

Exposing raw config-flags is OK, but these feel more like expert options; we want the charm to encapsulate that sort of knowledge in high-level controls rather than in four separate config options or directly injected flags.

James Page (james-page) wrote :

And the corresponding code from the mysql charm:

configs['innodb-flush-log-at-trx-commit'] = 1
configs['sync-binlog'] = 1

if 'InnoDB' in preferred_engines:
    configs['innodb-buffer-pool-size'] = chunk_size
    if configs['tuning-level'] == 'fast':
        configs['innodb-flush-log-at-trx-commit'] = 2
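A self-contained version of the fragment above, so the logic can be exercised on its own. The function name, the validation against the three documented tuning-level values, the copy of the input dict, and the sample inputs are assumptions added for illustration; the mysql charm's real surrounding code is not shown here:

```python
# Hypothetical stand-in for the mysql charm's option handling.
VALID_TUNING_LEVELS = ("safest", "fast", "unsafe")


def apply_tuning(options: dict, preferred_engines: list, chunk_size: str) -> dict:
    """Return a copy of `options` with the settings from the quoted fragment."""
    configs = dict(options)
    level = configs.get("tuning-level", "safest")
    if level not in VALID_TUNING_LEVELS:
        raise ValueError(f"tuning-level must be one of {VALID_TUNING_LEVELS}, got {level!r}")

    # Safest defaults: flush the redo log and sync the binlog on every commit.
    configs["innodb-flush-log-at-trx-commit"] = 1
    configs["sync-binlog"] = 1

    if "InnoDB" in preferred_engines:
        configs["innodb-buffer-pool-size"] = chunk_size
        if level == "fast":
            # Write the log at commit, but fsync it only about once per second.
            configs["innodb-flush-log-at-trx-commit"] = 2
    return configs


print(apply_tuning({"tuning-level": "fast"}, ["InnoDB"], "1G"))
```

Note that the quoted fragment only changes anything for 'fast'; whatever the mysql charm does for 'unsafe' lives outside the snippet and is not modelled here.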

summary: - Huawei RH2288H V3 mysql 3 HA units, ~100% ioutil on idle (ext4
- journaling + mysqld)
+ provide tuning configuration options (was Huawei RH2288H V3 mysql 3 HA
+ units, ~100% ioutil on idle (ext4 journaling + mysqld))
Changed in percona-cluster (Juju Charms Collection):
status: New → Triaged
importance: Undecided → Wishlist
James Page (james-page)
Changed in percona-cluster (Juju Charms Collection):
importance: Wishlist → High
tags: added: performance
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.openstack.org/425078
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=ae533965d6fc594b438b0b9cf9681bd425fb19cf
Submitter: Jenkins
Branch: master

commit ae533965d6fc594b438b0b9cf9681bd425fb19cf
Author: James Page <email address hidden>
Date: Wed Jan 25 09:25:23 2017 +0000

    Disable hostname resolution

    MySQL automatically attempts to resolve hostnames to IP addresses,
    mapping to ACL entries for users; this adds overhead for each
    connection, and when DNS is wonky in some way can cause access
    issues.

    Use 'skip-name-resolve' to ensure that only IP addresses are
    used when checking ACLs.

    Change-Id: Idf55ddc3090da97a96dd0bbc30fc845f65d1692c
    Partial-Bug: 1599222
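With skip-name-resolve, the server no longer turns the client address into a hostname, so grants whose host part is a hostname stop matching and only IP-based host values remain useful. A simplified Python model of that matching; the helper names are hypothetical, and MySQL's real rules (ordering of grant rows by specificity, special handling of localhost, IPv6 details) are not reproduced:

```python
import ipaddress
import re


def _like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern ('%' = any run, '_' = one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")


def host_matches(client_ip: str, acl_host: str) -> bool:
    """Simplified check of a grant's host part against a client IP, no DNS."""
    if "/" in acl_host:
        # 'network/netmask' form, e.g. '10.0.0.0/255.255.255.0'.
        network = ipaddress.ip_network(acl_host, strict=False)
        return ipaddress.ip_address(client_ip) in network
    # Literal IPs and patterns such as '10.0.0.%' are compared as text;
    # a hostname such as 'db.example.com' can never equal an IP string.
    return _like_to_regex(acl_host).match(client_ip) is not None
```

The point of the sketch is the last case: once name resolution is off, a grant for 'db.example.com' is never matched by an incoming connection, which is why grants need IP-based host values.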

James Page (james-page)
Changed in charm-percona-cluster:
importance: Undecided → High
status: New → Triaged
Changed in percona-cluster (Juju Charms Collection):
status: Triaged → Invalid
James Page (james-page)
Changed in charm-percona-cluster:
milestone: none → 17.05
James Page (james-page)
Changed in charm-percona-cluster:
status: Triaged → In Progress
assignee: nobody → James Page (james-page)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-percona-cluster (master)

Fix proposed to branch: master
Review: https://review.openstack.org/440333

James Page (james-page) wrote :

FWIW, innodb_change_buffering defaults to all.

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.openstack.org/440333
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=bda27479a45b69eff58a46a73813b1cc331bba8e
Submitter: Jenkins
Branch: master

commit bda27479a45b69eff58a46a73813b1cc331bba8e
Author: James Page <email address hidden>
Date: Thu Mar 2 10:34:31 2017 +0000

    Add tuning-level configuration option

    In line with the mysql charm, add a tuning-level configuration option
    that allows end users to change the configuration profile for PXC.

    This option supports three values:

       safest (default): use configuration options with best data
                         integrity guarantees.
       fast: compromise some data integrity guarantees
                         to improve performance.
       unsafe: pretty much throw away all data integrity
                         guarantees to maximise performance.

    In clustered deployments, 'fast' and 'unsafe' may be appropriate to
    use but should be considered carefully before reconfiguring away
    from the default 'safest' option.

    Right now, this option tweaks the innodb_flush_log_at_trx_commit
    value for PXC:

       safest (default): 1
       fast: 2
       unsafe: 0

    but should be used for other tuning optimizations that come along
    in the future.

    Also add direct configuration options for:

       innodb-change-buffering
       innodb-io-capacity

    to allow end users to tweak other performance optimizations that
    we can't yet do automatically using charm options.

    This commit also includes a resync of charm-helpers, which includes
    the fix to flush privileges after adding grants to resolve bug
    1513239.

    Change-Id: I7c31e3bfbb825ae7091913e678dd1b7893892d1d
    Closes-Bug: 1599222
    Closes-Bug: 1513239
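The tuning-level mapping in this commit message, together with the two direct options, could be modelled roughly as follows. FLUSH_BY_LEVEL mirrors the commit message; the helper name and the validation rules (the accepted innodb_change_buffering values and the server's minimum of 100 for innodb_io_capacity, per the MySQL 5.6/5.7 reference manual) are assumptions for illustration, not the charm's actual code:

```python
FLUSH_BY_LEVEL = {"safest": 1, "fast": 2, "unsafe": 0}

# Values the server accepts for innodb_change_buffering.
CHANGE_BUFFERING_VALUES = {"none", "inserts", "deletes", "changes", "purges", "all"}

# Lowest value the server accepts for innodb_io_capacity.
MIN_IO_CAPACITY = 100


def mysqld_settings(tuning_level="safest", change_buffering=None, io_capacity=None):
    """Translate charm-style options into [mysqld] settings (hypothetical helper)."""
    if tuning_level not in FLUSH_BY_LEVEL:
        raise ValueError(f"unknown tuning-level {tuning_level!r}")
    settings = {"innodb_flush_log_at_trx_commit": FLUSH_BY_LEVEL[tuning_level]}

    # Options left unset are omitted so the server default applies
    # (for innodb_change_buffering that default is already 'all').
    if change_buffering is not None:
        if change_buffering not in CHANGE_BUFFERING_VALUES:
            raise ValueError(f"invalid innodb-change-buffering {change_buffering!r}")
        settings["innodb_change_buffering"] = change_buffering
    if io_capacity is not None:
        if int(io_capacity) < MIN_IO_CAPACITY:
            raise ValueError(f"innodb-io-capacity must be >= {MIN_IO_CAPACITY}")
        settings["innodb_io_capacity"] = int(io_capacity)
    return settings
```

For example, mysqld_settings("fast", io_capacity=100) returns {"innodb_flush_log_at_trx_commit": 2, "innodb_io_capacity": 100}. Rejecting bad values at the charm level gives the operator an immediate error instead of a mysqld that refuses to start or silently clamps the value.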

Changed in charm-percona-cluster:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-percona-cluster:
milestone: 17.05 → 17.08
James Page (james-page)
Changed in charm-percona-cluster:
status: Fix Committed → Fix Released