InnoDB performance drop in 5.7 because of the lru_manager

Bug #1690399 reported by jocelyn fournier
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

I'm investigating a performance regression in InnoDB between 5.6 & 5.7.
I noticed a lot of time is spent in os_thread_sleep called from buf_lru_manager_sleep_if_needed().
Is there any way to avoid this? (My configuration uses innodb_buffer_pool_instances=24, so I assume it creates 24 lru_manager threads as well?)

Poor man's profiler result:
     89 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,cond=0x1f74bc0),mutex=<optimized,out>,,at,handle_connection,pfs_spawn_thread,start_thread,clone,??
     24 nanosleep,os_thread_sleep,buf_lru_manager_sleep_if_needed,out>),start_thread,clone,??
     23 pthread_cond_wait@@GLIBC_2.3.2,wait,reset_sig_count=<optimized,srv_worker_thread,start_thread,clone,??
     23 pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,pop_jobs_item,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
      1 test_quick_select,mysql_update,Sql_cmd_update::try_single_table_update,Sql_cmd_update::execute,mysql_execute_command,mysql_parse,Query_log_event::do_apply_event,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
      1 ??,sigwaitinfo,timer_notify_thread_func,start_thread,clone,??
[...]
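
For context on why these threads show up under nanosleep: each buffer pool instance gets its own LRU manager thread that polls on a timed sleep. The following is a simplified, self-contained C++ illustration of that pattern only, not the actual Percona Server source; the names free_pages, flush_lru_tail and lru_manager_loop, and the adaptation rule, are hypothetical stand-ins for the real buf_lru_manager_* code.

// Simplified illustration of a sleep-driven LRU manager thread.
// NOT the actual Percona Server code; names and heuristics are made up.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

std::atomic<bool>        shutdown_requested{false};
std::atomic<std::size_t> free_pages{2048};   // stand-in for the instance's free list length

// Stand-in for the real LRU tail flushing: pretend we freed some pages.
void flush_lru_tail(std::size_t n) { free_pages += n / 2; }

void lru_manager_loop(std::size_t lru_scan_depth) {
    auto sleep_time = std::chrono::milliseconds(1000);

    while (!shutdown_requested.load()) {
        // Adapt the sleep period to free-list pressure, roughly the role of
        // buf_lru_manager_adapt_sleep_time(): poll faster when the free list
        // runs short, back off towards 1s when it is comfortably full.
        if (free_pages.load() < lru_scan_depth / 4)
            sleep_time = std::chrono::milliseconds(50);
        else if (sleep_time < std::chrono::milliseconds(1000))
            sleep_time += std::chrono::milliseconds(50);

        flush_lru_tail(lru_scan_depth);

        // This is the call that shows up as nanosleep/os_thread_sleep in the
        // profile above: with innodb_buffer_pool_instances=24 there are 24
        // such threads, each spending most of its wall-clock time here.
        std::this_thread::sleep_for(sleep_time);
    }
}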

Thanks!
  Jocelyn

Revision history for this message
jocelyn fournier (joce) wrote :

Why not use an event wait that would trigger a thread wakeup once the buffer pool free list drops below some defined threshold (and perhaps auto-adjust that threshold), instead of auto-adjusting lru_sleep_time in buf_lru_manager_adapt_sleep_time() as is done currently?
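
A minimal sketch of that alternative, assuming a condition variable that the page-allocation path signals whenever the free list drops below a watermark. This is not a patch against the actual code; buf_pool_instance, note_free_list_shrunk, free_list_cv and the other names here are hypothetical.

// Sketch of an event-driven LRU manager: instead of a timed sleep, the
// thread blocks on a condition variable that is signalled when the free
// list of its buffer pool instance drops below a threshold.
// NOT actual Percona Server code; all names here are hypothetical.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

struct buf_pool_instance {
    std::mutex              mtx;
    std::condition_variable free_list_cv;
    std::size_t             free_len      = 0;
    std::size_t             low_watermark = 256;   // could itself be auto-tuned
    bool                    shutdown      = false;
};

// Called from the page-allocation path after it takes pages off the free list.
void note_free_list_shrunk(buf_pool_instance& bp, std::size_t new_len) {
    std::lock_guard<std::mutex> lock(bp.mtx);
    bp.free_len = new_len;
    if (bp.free_len < bp.low_watermark)
        bp.free_list_cv.notify_one();          // wake the LRU manager immediately
}

void flush_lru_tail(buf_pool_instance&, std::size_t /*n*/) { /* stand-in */ }

void lru_manager_loop(buf_pool_instance& bp, std::size_t lru_scan_depth) {
    std::unique_lock<std::mutex> lock(bp.mtx);
    while (!bp.shutdown) {
        // Block until woken by the allocator (with a periodic timeout as a
        // safety net), rather than polling on a fixed or adaptive sleep.
        bp.free_list_cv.wait_for(lock, std::chrono::seconds(1), [&] {
            return bp.shutdown || bp.free_len < bp.low_watermark;
        });
        if (bp.shutdown) break;

        lock.unlock();
        flush_lru_tail(bp, lru_scan_depth);    // refill the free list
        lock.lock();
    }
}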

Revision history for this message
jocelyn fournier (joce) wrote :

A few relevant variables used in my case:

innodb_log_file_size=32G
innodb_empty_free_list_algorithm=backoff
innodb_buffer_pool_size=270G
innodb_buffer_pool_instances=24
innodb_lru_scan_depth=1024

Percona version was 5.7.17, I'm currently testing 5.7.18 with the improved LRU manager.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Thanks for your bug report. I am not sure that the sleep itself is a problem here: the key for the LRU threads is to flush the right number of pages at the right time, and whether the "right time" is reached by a sleep or by an event wait should be secondary to the choice of heuristics. But perhaps an event wait would allow implementing better heuristics than a sleep does.

How does 5.7.18 testing look?

tags: added: lru-flusher
tags: added: performance
Revision history for this message
jocelyn fournier (joce) wrote :

Hi Laurynas!

Unfortunately, 5.7.18 doesn't really change much in my case.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

I see. Back to the original issue: the mapping between LRU manager threads and buffer pool instances is 1:1 by design, so the only way to reduce the number of LRU manager threads is to reduce the buffer pool instance count.

Then, mistuned LRU flushing could manifest in different ways; the more serious one is a lack of free pages. That shows up in PMP as stack traces involving buf_LRU_get_free_block, and your PMP output does not show it. Another would be too-aggressive LRU flushing, but that does not seem likely here either: it is capped by innodb_lru_scan_depth for each ~11GB buffer pool instance, which does not seem excessively high.
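
For rough scale, assuming the default 16KB InnoDB page size (back-of-the-envelope figures, not measurements from this system):

    270G / 24 instances ≈ 11.25G per buffer pool instance
    innodb_lru_scan_depth = 1024 pages, i.e. 1024 x 16KB = 16MB per instance per LRU flushing pass
    24 instances x 16MB ≈ 384MB as a total upper bound per pass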

Thus, I don't see immediate evidence that your performance drop is directly related to LRU flushing yet. Perhaps you can provide further details about the drop itself?

Changed in percona-server:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Percona Server because there has been no activity for 60 days.]

Changed in percona-server:
status: Incomplete → Expired
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-3701
