performance regression from fix for lp:1433432

Bug #1631309 reported by Axel Schwenke on 2016-10-07
This bug affects 2 people
Affects: Percona Server (moved; status tracked in 5.7)
5.6: Fix Released — assigned to Nickolay Ihalainen
5.7: Fix Released — assigned to Nickolay Ihalainen

Bug Description

The fix for lp:1433432 in Percona Server 5.6 introduces a severe performance regression. Sysbench OLTP read-only throughput drops by 50% (single thread) up to 95% (40 threads on a 16-core machine).

To reproduce it, the active set of the database must be slightly bigger than the buffer pool and the workload must access rows in a fairly random fashion.

Here are some numbers. Command line:
sysbench-0.4.12 --test=oltp --oltp-read-only --oltp-range-size=1000 --oltp-dist-type=uniform --oltp-table-size=1000000 --mysql-socket=/tmp/mysqld.sock.sysbench --mysql-user=root --num-threads=40 --max-time=30 run

5.6.27: 660 tps
5.6.28: 36 tps

This doesn't change when using multiple tables. Performance comes back to normal when the buffer pool is increased. I think this performance regression is too big and too likely to be hit by a real-world user to be ignored. It is quite common that the active set of a database does not fit into the buffer pool.

Suggestion: roll back commit 6532572a783ea5a025561eda9b1a9350c73405a6

What is the buffer pool size in your experiment?

tags: added: regression xtradb
Axel Schwenke (ahel) wrote :

Everything is on defaults (the my.cnf is effectively empty). So the buffer pool is 128M.

Have your tests been performed with XtraDB in MariaDB or in Percona Server? (I wonder if there could be a difference in defaults). Could you please check innodb_buffer_pool_instances, innodb_lru_scan_depth values? (or just paste SELECT * FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES)

Axel Schwenke (ahel) wrote :

This has indeed been found in MariaDB first. But I verified the regression in Percona Server (in order to make sure it came from XtraDB upstream). As a matter of fact, the regression is heavier in Percona Server (95%) than in MariaDB (90%). This might indeed be caused by different defaults, but not by innodb_lru_scan_depth: it's set to 1024 in both MariaDB and Percona Server.

The numbers I quoted above are for Percona Server, built from source tarballs and with nothing in my.cnf except socket and datadir location.

Rolling back the said commit restores performance in both cases.

Axel Schwenke (ahel) wrote :

PS: innodb_buffer_pool_instances defaults to 8 in MariaDB and PS. A different number of instances doesn't change the regression; the numbers are the same for 1, 2, 4 or 8 partitions. I tested that specifically, as I expected this to be a mutex contention issue.

tags: added: performance
Nickolay Ihalainen (ihanick) wrote :

I'm able to reproduce this issue with both 5.6 and 5.7:
single SATA HDD, 8-core AMD CPU.
Settings are default for mysql-test-run except for:
--innodb_log_file_size=50331648 --innodb-buffer-pool-size=134217728
Similar results could be obtained with 8MB buffer pool.
I've used 8 sysbench threads (results with 40 are similar).

Version - Transactions per second
5.5-unmodified: 212

5.6-unmodified: 4
5.6-rollback : 68
5.6-us-sleep : 140

5.7-unmodified: 46
5.7-rollback : 69
5.7-us-sleep : 239

As you can see just a rollback will not fix the issue completely.
The original page cleaner thread operates with millisecond sleep times and can only increase/decrease the delay between LRU cleanups in 50ms steps. For modern disks that is an unusually long delay (even a SATA HDD can have ~8ms write latency).
Thus it seems reasonable to operate with microsecond sleep times (especially for SSDs) and 50-100 microsecond increments/decrements.
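The tuning described above can be sketched as follows. This is a simplified model of the idea, not the actual InnoDB page cleaner code; the struct, the free-list feedback thresholds, and the step size are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Simplified model of the proposed microsecond-granularity sleep tuning.
// Hypothetical names; the real logic lives in the page cleaner thread.
struct CleanerSleep {
    int64_t sleep_us = 1000000;            // start at 1s, as when idle
    static constexpr int64_t kStepUs = 50; // 50us step instead of 50ms
    static constexpr int64_t kMaxUs = 1000000;

    // Called once per cleaner iteration: shrink the sleep when the free
    // list runs low, grow it back when there is plenty of headroom.
    void tune(uint64_t free_len, uint64_t max_free_len) {
        if (free_len < max_free_len / 5)
            sleep_us = std::max<int64_t>(0, sleep_us - kStepUs);
        else if (free_len > max_free_len / 2)
            sleep_us = std::min(kMaxUs, sleep_us + kStepUs);
    }
};
```

With a 50us step the cleaner can ramp from a 1s sleep down to continuous flushing in well under a second of iterations, instead of the 50ms steps of the original code.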

What kind of load should be affected by this patch?
The data set is bigger than the buffer pool, so there are no (or very few) free pages at any time.
The load is read-only or almost read-only, because with an insert-only load free pages can be obtained by flushing dirty ones (the same case as in lp:1433432).

Patches for us page cleaner sleep could be found here:

Test conditions and raw results:

Changed in percona-server:
status: New → Confirmed

I believe a microsecond sleep makes sense in this scenario, but the tuning step probably needs to be in the millisecond range for big buffer pools, or for threads in the middle of the tuning range, as microsecond-sized steps will adapt too slowly with 1-second loop iterations.

Nickolay Ihalainen (ihanick) wrote :

The performance is still bad with just a 1ms sleep time:
Benchmark for 5.6: Fix bug 1631309 Use 1ms page cleaner sleep, rows: 1001624, BP: 128.00000000M
prepare time: 0m19.102s
    transactions: (34.87 per sec.) (12.52 per sec.) (9.13 per sec.)

But this trick significantly improves performance for the rollback variant (it becomes close to the us-sleep variant):
increase the sleep time by 1ms, and use a 2x smaller value when decreasing.

Benchmark for 5.6: Fix bug 1631309 Use 1ms page cleaner sleep, rollback, rows: 1001624, BP: 128.00000000M
prepare time: 0m19.095s
    transactions: (437.99 per sec.) (455.60 per sec.) (466.09 per sec.)

Just a rollback:
Benchmark for 5.6: rollback 1433432, rows: 1001624, BP: 128.00000000M
prepare time: 0m19.581s
    transactions: (69.87 per sec.) (69.39 per sec.) (69.37 per sec.)
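The "increase by 1ms, halve on decrease" rule tried above is essentially additive-increase/multiplicative-decrease. A minimal sketch of that rule, with hypothetical names (the real patch changes the page cleaner loop itself):

```cpp
#include <algorithm>
#include <cstdint>

// Asymmetric sleep tuning: grow the sleep slowly (+1ms additive) when
// free pages are plentiful, shrink it fast (/2 multiplicative) when the
// free list drains. Names and the 1s cap are assumptions.
int64_t tune_sleep_us(int64_t sleep_us, bool free_list_low) {
    constexpr int64_t kIncrementUs = 1000;  // 1ms additive increase
    constexpr int64_t kMaxUs = 1000000;     // cap at 1s
    if (free_list_low)
        return sleep_us / 2;                // react quickly to pressure
    return std::min(kMaxUs, sleep_us + kIncrementUs);
}
```

The multiplicative decrease is what lets the cleaner escape a long sleep in a handful of iterations once the workload turns page-hungry, which matches the large jump in the rollback+1ms numbers above.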

us sleep time patch without one cpu core limit (by task set), using 6 cores:
Benchmark for 5.6: Fix bug 1631309 Use us page cleaner sleep, rows: 500812, BP: 64.00000000M
prepare time: 0m9.430s
    transactions: (384.81 per sec.) (416.51 per sec.) (402.94 per sec.)

Updated benchmark results file.

I mean, we should have microsecond tuning, but my intuition is that the tuning step should be increased to the millisecond range when the setting supports it: probably when the sleep time is in the middle of the 0-1 second range and the buffer pool is large.

Nickolay Ihalainen (ihanick) wrote :

1. The us sleep alone is not able to resolve the issue; it acts just like polling with a decreased increment
(5.6: 500us sleep, no limit in benchmark results).
2. Polling works well for both read-only and insert-only loads for the 1% case.
3. The 50ms increment is too big for 6-20% free list filling (5.6: rollback+1ms+nodiv limit in benchmark results).
4. A 1ms increment is able to resolve both the read-only and the insert-only load problems.
5. lp:1433432 was fixed incorrectly:
A 5MB buffer pool has 62 free pages and 256 database pages.
Polling is enabled at free_len <= max_free_len / 100; with the default srv_LRU_scan_depth = 1024 this means free_len < 10.
Polling is disabled at free_len > max_free_len / 5, i.e. free_len > 204, which can never be satisfied with only 62 free pages.

Thus the fix should be:

    ulint lru_len = UT_LIST_GET_LEN(buf_pool_from_array(0)->LRU);
    ulint max_free_len =
        ut_min(lru_len / 2, srv_LRU_scan_depth) * srv_buf_pool_instances;

(or it could be the full buffer pool size in bytes).

This is enough to fix both issues:
a) spinning for small buffer pools;
b) the migration from 50ms down to spinning takes too long under a heavy read-only load with a data set > buffer pool, for any buffer pool size (e.g. 3-5% for MySQL 5.7 with a 32GB buffer pool and 20 benchmark threads); for a big buffer pool we can suggest a bigger LRU scan depth instead.
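Plugging the 5MB buffer pool numbers from the comment above into both formulas shows the difference. This is a standalone sketch comparing the two expressions, not the InnoDB source (a single buffer pool instance is assumed):

```cpp
#include <algorithm>

// Original fix: max_free_len ignores the actual buffer pool size.
unsigned old_max_free_len(unsigned scan_depth, unsigned instances) {
    return scan_depth * instances;
}

// Proposed fix: clamp by half the LRU list length, so small pools
// get a proportionally small threshold.
unsigned new_max_free_len(unsigned lru_len, unsigned scan_depth,
                          unsigned instances) {
    return std::min(lru_len / 2, scan_depth) * instances;
}
```

With lru_len = 256 the old formula gives max_free_len = 1024, so polling only disables at free_len > 204, impossible when the pool has just 62 free pages, and the cleaner spins forever. The new formula gives 128, so polling turns off as soon as free_len > 25.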

Nickolay Ihalainen (ihanick) wrote :

Changed to just:

    ulint lru_len = UT_LIST_GET_LEN(buf_pool_from_array(0)->LRU);
    ulint max_free_len = ut_min(lru_len, srv_LRU_scan_depth) *

Percona now uses JIRA for bug reports so this bug report is migrated to:
