Percona Server moved to https://jira.percona.com/projects/PS

performance regression from fix for lp:1433432

Bug #1631309 reported by Axel Schwenke on 2016-10-07

This bug affects 2 people

	Status	Importance	Assigned to	Milestone
Percona Server moved to https://jira.percona.com/projects/PS	Status tracked in 5.7
5.5	Invalid	Undecided	Unassigned
5.6	Fix Released	Medium	Nickolay Ihalainen	Percona Server moved to https://jira.percona.com/projects/PS 5.6.36-82.0
5.7	Fix Released	Medium	Nickolay Ihalainen	Percona Server moved to https://jira.percona.com/projects/PS 5.7.18-14

Bug Description

The fix for lp:1433432 in Percona Server 5.6 introduces a severe performance regression. Sysbench OLTP read-only throughput drops by 50% (single thread) up to 95% (40 threads on a 16-core machine).

In order to see that, the active set of the database must be slightly bigger than the buffer pool and the workload must access rows in a rather random fashion.

Here are some numbers. Command line:
sysbench-0.4.12 --test=oltp --oltp-read-only --oltp-range-size=1000 --oltp-dist-type=uniform --oltp-table-size=1000000 --mysql-socket=/tmp/mysqld.sock.sysbench --mysql-user=root --num-threads=40 --max-time=30 run

5.6.27: 660 tps
5.6.28: 36 tps

This doesn't change when using multiple tables. Performance comes back to normal when the buffer pool is increased. I think this performance regression is too big and too likely to being hit by a real-world user to be ignored. It is quite common that the active set of a database does not fit into the buffer pool.

Suggestion: roll back commit 6532572a783ea5a025561eda9b1a9350c73405a6

Tags:

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2016-10-07:

What is the buffer pool size in your experiment?

tags:

added: regression xtradb

Revision history for this message

Axel Schwenke (ahel) wrote on 2016-10-07:

Everything is on defaults (the my.cnf is effectively empty). So the buffer pool is 128M.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2016-10-07:

Have your tests been performed with XtraDB in MariaDB or in Percona Server? (I wonder if there could be a difference in defaults). Could you please check innodb_buffer_pool_instances, innodb_lru_scan_depth values? (or just paste SELECT * FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES)

Revision history for this message

Axel Schwenke (ahel) wrote on 2016-10-07:

This has indeed been found in MariaDB first. But I verified the regression in Percona Server (in order to make sure this came from XtraDB upstream). As a matter of fact the regression is heavier in Percona Server (95%) as in MariaDB (90%). This might indeed be caused by different defaults, however not innodb_lru_scan_depth. It's set to 1024 in both MariaDB and Percona Server.

The numbers I quoted above are for Percona Server, built from source tarballs and with nothing in my.cnf except socket and datadir location.

Rolling back the said commit restores performance in both cases.

Revision history for this message

Axel Schwenke (ahel) wrote on 2016-10-07:

PS: innodb_buffer_pool_instances. Defaults to 8 in MariaDB and PS. Different number of instances doesn't change the regression. Numbers are the same for 1, 2, 4 or 8 partitions. I tested that specifically as I expected this to be a mutex contention issue.

Laurynas Biveinis (laurynas-biveinis) on 2016-10-10

tags:

added: performance

Revision history for this message

Nickolay Ihalainen (ihanick) wrote on 2016-11-26:

Hi,
I'm able to reproduce this issue with both 5.6 and 5.7:
single SATA HDD, 8 core amd cpu.
Settings are default for mysql-test-run except for:
--innodb_log_file_size=50331648 --innodb-buffer-pool-size=134217728
Similar results could be obtained with 8MB buffer pool.
I've used 8 sysbench threads (results with 40 are similar).

Version - Transactions per second
5.5-unmodified: 212

5.6-unmodified: 4
5.6-rollback : 68
5.6-us-sleep : 140

5.7-unmodified: 46
5.7-rollback : 69
5.7-us-sleep : 239

As you can see just a rollback will not fix the issue completely.
Original page cleaner thread operates with millisecond sleep time and can increase/decrease delay between LRU cleanup with 50ms step. For modern disks it's unusual delay (even sata hdd could have 8ms write latency).
Thus it looks reasonable to operate with microsecond sleep time (especially for ssd) and 50-100 microsecond increments/decrements.

What kind of load should be affected by this patch?
Data set is bigger than buffer pool, there is no / small amount of free pages all the time.
Load is almost read-only or read-only, because in case of insert-only load free pages could be obtained by flushing dirty ones (same case as in 1433432).

Patches for us page cleaner sleep could be found here:
https://github.com/ihanick/percona-server/tree/5.6-ps-bug1631309
https://github.com/ihanick/percona-server/tree/5.7-ps-bug1631309

Test conditions and raw results:
https://gist.github.com/ihanick/601d8c67580724c10e900f2daf1639ec

Changed in percona-server:
status:	New → Confirmed

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2016-11-28:

I believe microsecond sleep makes sense in this scenario, but probably the tuning step needs to be in milisecond range for big buffer pools / threads being in the middle of tuning range - as microsecond tuning will be too slow to adapt with 1 second loop iterations

Revision history for this message

Nickolay Ihalainen (ihanick) wrote on 2016-11-28:

The performance is still bad with just 1ms sleep time:
Benchmark for 5.6: Fix bug 1631309 Use 1ms page cleaner sleep, rows: 1001624, BP: 128.00000000M
prepare time: 0m19.102s
transactions: (34.87 per sec.) (12.52 per sec.) (9.13 per sec.)

But this trick significantly improves performance for rollback variant (it's became close to us sleep variant)
https://github.com/ihanick/percona-server/tree/5.6-ps-bug1631309-rollback-1ms
Increase sleep time by 1ms, Use 2x smaller value during decrease.

Benchmark for 5.6: Fix bug 1631309 Use 1ms page cleaner sleep, rollback, rows: 1001624, BP: 128.00000000M
prepare time: 0m19.095s
transactions: (437.99 per sec.) (455.60 per sec.) (466.09 per sec.)

Just a rollback:
Benchmark for 5.6: rollback 1433432, rows: 1001624, BP: 128.00000000M
prepare time: 0m19.581s
transactions: (69.87 per sec.) (69.39 per sec.) (69.37 per sec.)

us sleep time patch without one cpu core limit (by task set), using 6 cores:
Benchmark for 5.6: Fix bug 1631309 Use us page cleaner sleep, rows: 500812, BP: 64.00000000M
prepare time: 0m9.430s
transactions: (384.81 per sec.) (416.51 per sec.) (402.94 per sec.)

Updated benchmark results file.
https://gist.github.com/ihanick/601d8c67580724c10e900f2daf1639ec#file-benchmark_results-txt

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2016-11-29:

I mean, we should have microsecond tuning, but my intuition is that the tuning step should be increased to millisecond range when setting supports it - probably when the sleep time in the middle of the 0-1 second range, and with a large buffer pool.

Revision history for this message

Nickolay Ihalainen (ihanick) wrote on 2016-11-30:

#10

https://gist.github.com/ihanick/601d8c67580724c10e900f2daf1639ec#file-benchmark-results2-txt

1. us sleep is not able to resolve the issue, it acts just like a polling and decreased increment
(5.6: 500us sleep, no limit in benchmark results)
2. polling works well for both read-only and insert-only load for for 1% case
3. 50ms increment is too big for 6-20% free list filling (5.6: rollback+1ms+nodiv limit in benchmark results)
4. 1ms increment is able to resolve both read-only and insert-only load problems.
5. lp:1433432 was fixed incorrectly:
5MB buffer pool has 62 free pages and 256 database pages.
Polling enabled at free_len <= max_free_len / 100, by default srv_LRU_scan_depth = 1024 ===> free_len < 10
Polling disabled at
free_len > max_free_len / 5 => free_len > 204

Thus the fix should be:
ulint lru_len = UT_LIST_GET_LEN(buf_pool_from_array(0)->LRU);
ulint max_free_len = ut_min(lru_len / 2, srv_LRU_scan_depth) * srv_buf_pool_instances;
(or it could be full buffer pool size in bytes).

Conclusions:
https://github.com/ihanick/percona-server/tree/5.6-ps-bug1631309
it's enough to fix both issues:
a) spinning for small buffer pools
b) migration from 50ms to spinning is too long for heavy read-only load with data set > buffer pool for any buffer pool size (e.g. 3-5% for mysql 5.7 and 32GB buffer pool for 20 benchmark threads) ( but for big buffer pool we can suggest to use bigger lru scan depth instead).

Revision history for this message

Nickolay Ihalainen (ihanick) wrote on 2016-11-30:

#11

Changed to just:
ulint lru_len = UT_LIST_GET_LEN(buf_pool_from_array(0)->LRU);
ulint max_free_len = ut_min(lru_len, srv_LRU_scan_depth) *

Revision history for this message

Nickolay Ihalainen (ihanick) wrote on 2017-03-29:

#12

5.6: https://github.com/percona/percona-server/pull/1219
5.7: https://github.com/percona/percona-server/pull/1220

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

#13

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1753

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.