InnoDB performance drop in 5.7 because of the lru_manager

Bug #1690399 reported by jocelyn fournier on 2017-05-12
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I'm investigating a performance regression in InnoDB between 5.6 & 5.7.
I noticed a lot of time is spent in os_thread_sleep called from buf_lru_manager_sleep_if_needed().
Is there any way to avoid this? (My configuration uses innodb_buffer_pool_instances=24, so I assume it creates 24 LRU manager threads as well.)

Poor man's profiler result:
     89 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,cond=0x1f74bc0),mutex=<optimized,out>,,at,handle_connection,pfs_spawn_thread,start_thread,clone,??
     24 nanosleep,os_thread_sleep,buf_lru_manager_sleep_if_needed,out>),start_thread,clone,??
     23 pthread_cond_wait@@GLIBC_2.3.2,wait,reset_sig_count=<optimized,srv_worker_thread,start_thread,clone,??
     23 pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,pop_jobs_item,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
      1 test_quick_select,mysql_update,Sql_cmd_update::try_single_table_update,Sql_cmd_update::execute,mysql_execute_command,mysql_parse,Query_log_event::do_apply_event,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
      1 ??,sigwaitinfo,timer_notify_thread_func,start_thread,clone,??
[...]
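The buf_lru_manager_sleep_if_needed() time seen above comes from an adaptive sleep. A simplified, hypothetical sketch of such a heuristic is below; the real buf_lru_manager_adapt_sleep_time() in Percona Server differs in its exact thresholds and inputs:

```cpp
// Hypothetical, simplified sketch of an adaptive LRU manager sleep:
// sleep less when the free list is running low, back off when
// flushing is keeping up. Not the actual Percona Server code.
#include <algorithm>
#include <cstddef>

// Returns the next LRU manager sleep time in milliseconds.
unsigned adapt_sleep_time(unsigned cur_sleep_ms, std::size_t free_len,
                          std::size_t max_free_len,
                          unsigned max_sleep_ms = 1000) {
    if (free_len < max_free_len / 100) {
        return 0;                 // nearly out of free pages: don't sleep
    }
    if (free_len < max_free_len / 5) {
        return cur_sleep_ms / 2;  // running low: wake up twice as often
    }
    // plenty of free pages: gradually back off toward max_sleep_ms
    return std::min(cur_sleep_ms + 50, max_sleep_ms);
}
```

Under a heuristic like this, an idle or well-provisioned buffer pool instance converges on the maximum sleep, which is why the threads show up parked in nanosleep in the profile.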

Thanks!
  Jocelyn

jocelyn fournier (joce) wrote :

Why not use an event wait that would trigger a thread wakeup once the buffer pool free list drops below some defined threshold (perhaps auto-adjusting that threshold), instead of auto-adjusting lru_sleep_time in buf_lru_manager_adapt_sleep_time() as is done currently?
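The suggestion above could be sketched roughly as follows. This is a minimal illustration using std::condition_variable, with hypothetical names; it is not Percona Server code, and InnoDB uses its own event primitives rather than the C++ standard library:

```cpp
// Sketch of an event-driven LRU manager wakeup (hypothetical): page
// allocators signal when the free list shrinks below a threshold, so
// the LRU manager can react immediately instead of polling on a sleep.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

class FreeListMonitor {
public:
    FreeListMonitor(std::size_t threshold, std::size_t free_len)
        : threshold_(threshold), free_len_(free_len) {}

    // Called by page allocators: account for a taken page and wake the
    // LRU manager if the free list dropped below the threshold.
    void on_page_taken() {
        std::lock_guard<std::mutex> lk(mu_);
        if (free_len_ > 0 && --free_len_ < threshold_)
            cv_.notify_one();
    }

    // Called by the LRU manager: block until the threshold is crossed
    // or the fallback sleep expires. Returns true if demand was signaled.
    bool wait_for_demand(std::chrono::milliseconds max_sleep) {
        std::unique_lock<std::mutex> lk(mu_);
        return cv_.wait_for(lk, max_sleep,
                            [this] { return free_len_ < threshold_; });
    }

    // Called after an LRU flush batch returns pages to the free list.
    void refill(std::size_t n) {
        std::lock_guard<std::mutex> lk(mu_);
        free_len_ += n;
    }

    std::size_t free_len() const {
        std::lock_guard<std::mutex> lk(mu_);
        return free_len_;
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t free_len_;
};
```

The timeout keeps the fallback behavior of the current sleep-based loop, so the event wait only adds earlier wakeups under memory pressure.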

jocelyn fournier (joce) wrote :

A few relevant variables in my case:

innodb_log_file_size=32G
innodb_empty_free_list_algorithm=backoff
innodb_buffer_pool_size=270G
innodb_buffer_pool_instances=24
innodb_lru_scan_depth=1024

The Percona Server version was 5.7.17; I'm currently testing 5.7.18 with the improved LRU manager.

Thanks for your bug report. I am not sure the sleep itself is a problem here: the key for the LRU threads is to flush the right amount of pages at the right time, and whether the "right time" is reached by a sleep or by an event wait should be secondary to the choice of heuristics. That said, an event wait might allow implementing better heuristics than a sleep does.

How does 5.7.18 testing look?

tags: added: lru-flusher
tags: added: performance
jocelyn fournier (joce) wrote :

Hi Laurynas!

Unfortunately, 5.7.18 doesn't really change much in my case.

I see. Back to the original issue: the mapping between LRU managers and buffer pool instances is 1:1 by design, so the only way to reduce the number of LRU managers is to reduce the buffer pool instance count.

Mistuned LRU flushing can manifest in different ways. The more serious one is a lack of free pages, which shows up in PMP output as stack traces involving buf_LRU_get_free_block, and your PMP does not show that. Another would be too-aggressive LRU flushing, but that does not seem likely here either: each flush batch is capped by innodb_lru_scan_depth per ~11GB buffer pool instance, which does not seem excessively high.

Thus, I don't see immediate evidence yet that your performance drop is directly related to LRU flushing. Could you provide further details about the drop itself?
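To spell out the arithmetic behind the "~11GB" figure, using the settings posted above (innodb_page_size is assumed to be the default 16 KiB; it was not stated in the report):

```cpp
// Back-of-the-envelope check of the per-instance buffer pool size and
// the LRU flush batch cap implied by the reporter's settings.
#include <cstddef>

// Per-instance buffer pool size in GiB.
double per_instance_gib(double pool_gib, int instances) {
    return pool_gib / instances;
}

// Upper bound on one LRU flush batch per instance, in MiB.
double max_lru_batch_mib(int scan_depth_pages, int page_kib) {
    return scan_depth_pages * page_kib / 1024.0;
}

// With innodb_buffer_pool_size=270G and innodb_buffer_pool_instances=24,
// per_instance_gib(270, 24) gives 11.25 GiB per instance. With
// innodb_lru_scan_depth=1024 and 16 KiB pages, max_lru_batch_mib(1024, 16)
// gives a 16 MiB cap per instance per flushing iteration.
```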

Changed in percona-server:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona Server because there has been no activity for 60 days.]

Changed in percona-server:
status: Incomplete → Expired

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3701
