InnoDB performance drop in 5.7 because of the lru_manager

Bug #1690399 reported by jocelyn fournier on 2017-05-12
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I'm investigating a performance regression in InnoDB between 5.6 & 5.7.
I noticed a lot of time is spent in os_thread_sleep called from buf_lru_manager_sleep_if_needed().
Is there any way to avoid this? (My configuration uses innodb_buffer_pool_instances=24, so I assume it creates 24 LRU manager threads as well.)

Poor man's profiler result:
     89 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,cond=0x1f74bc0),mutex=<optimized,out>,,at,handle_connection,pfs_spawn_thread,start_thread,clone,??
     24 nanosleep,os_thread_sleep,buf_lru_manager_sleep_if_needed,out>),start_thread,clone,??
     23 pthread_cond_wait@@GLIBC_2.3.2,wait,reset_sig_count=<optimized,srv_worker_thread,start_thread,clone,??
     23 pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,pop_jobs_item,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
      1 test_quick_select,mysql_update,Sql_cmd_update::try_single_table_update,Sql_cmd_update::execute,mysql_execute_command,mysql_parse,Query_log_event::do_apply_event,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread,clone,??
      1 ??,sigwaitinfo,timer_notify_thread_func,start_thread,clone,??
[...]
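The buf_lru_manager_sleep_if_needed() time seen above comes from an adaptive sleep. A simplified, hypothetical sketch of such a heuristic is below; the real buf_lru_manager_adapt_sleep_time() in Percona Server differs in its exact thresholds and inputs:

```cpp
// Hypothetical, simplified sketch of an adaptive LRU manager sleep:
// sleep less when the free list is running low, back off when
// flushing is keeping up. Not the actual Percona Server code.
#include <algorithm>
#include <cstddef>

// Returns the next LRU manager sleep time in milliseconds.
unsigned adapt_sleep_time(unsigned cur_sleep_ms, std::size_t free_len,
                          std::size_t max_free_len,
                          unsigned max_sleep_ms = 1000) {
    if (free_len < max_free_len / 100) {
        return 0;                 // nearly out of free pages: don't sleep
    }
    if (free_len < max_free_len / 5) {
        return cur_sleep_ms / 2;  // running low: wake up twice as often
    }
    // plenty of free pages: gradually back off toward max_sleep_ms
    return std::min(cur_sleep_ms + 50, max_sleep_ms);
}
```

Under a heuristic like this, an idle or well-provisioned buffer pool instance converges on the maximum sleep, which is why the threads show up parked in nanosleep in the profile.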

Thanks!
  Jocelyn

jocelyn fournier (joce) wrote :

Why not use an event wait that would trigger a thread wakeup once the buffer pool free list drops below some defined threshold (perhaps auto-adjusting that threshold), instead of auto-adjusting lru_sleep_time in buf_lru_manager_adapt_sleep_time() as is done currently?
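The suggestion above could be sketched roughly as follows. This is a minimal illustration using std::condition_variable, with hypothetical names; it is not Percona Server code, and InnoDB uses its own event primitives rather than the C++ standard library:

```cpp
// Sketch of an event-driven LRU manager wakeup (hypothetical): page
// allocators signal when the free list shrinks below a threshold, so
// the LRU manager can react immediately instead of polling on a sleep.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

class FreeListMonitor {
public:
    FreeListMonitor(std::size_t threshold, std::size_t free_len)
        : threshold_(threshold), free_len_(free_len) {}

    // Called by page allocators: account for a taken page and wake the
    // LRU manager if the free list dropped below the threshold.
    void on_page_taken() {
        std::lock_guard<std::mutex> lk(mu_);
        if (free_len_ > 0 && --free_len_ < threshold_)
            cv_.notify_one();
    }

    // Called by the LRU manager: block until the threshold is crossed
    // or the fallback sleep expires. Returns true if demand was signaled.
    bool wait_for_demand(std::chrono::milliseconds max_sleep) {
        std::unique_lock<std::mutex> lk(mu_);
        return cv_.wait_for(lk, max_sleep,
                            [this] { return free_len_ < threshold_; });
    }

    // Called after an LRU flush batch returns pages to the free list.
    void refill(std::size_t n) {
        std::lock_guard<std::mutex> lk(mu_);
        free_len_ += n;
    }

    std::size_t free_len() const {
        std::lock_guard<std::mutex> lk(mu_);
        return free_len_;
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t free_len_;
};
```

The timeout keeps the fallback behavior of the current sleep-based loop, so the event wait only adds earlier wakeups under memory pressure.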

jocelyn fournier (joce) wrote :

A few relevant variables in my case:

innodb_log_file_size=32G
innodb_empty_free_list_algorithm=backoff
innodb_buffer_pool_size=270G
innodb_buffer_pool_instances=24
innodb_lru_scan_depth=1024

The Percona Server version was 5.7.17; I'm currently testing 5.7.18 with the improved LRU manager.

Thanks for your bug report. I am not sure the sleep itself is a problem here: the key for the LRU threads is to flush the right amount of pages at the right time, and whether the "right time" is reached by a sleep or by an event wait should be secondary to the choice of heuristics. That said, an event wait might allow implementing better heuristics than a sleep does.

How does 5.7.18 testing look?

tags: added: lru-flusher
tags: added: performance
jocelyn fournier (joce) wrote :

Hi Laurynas!

Unfortunately, 5.7.18 doesn't really change much in my case.

I see. Back to the original issue: the mapping between LRU managers and buffer pool instances is 1:1 by design, so the only way to reduce the number of LRU managers is to reduce the buffer pool instance count.

Mistuned LRU flushing can manifest in different ways. The more serious one is a lack of free pages, which shows up in PMP output as stack traces involving buf_LRU_get_free_block, and your PMP does not show that. Another would be too-aggressive LRU flushing, but that does not seem likely here either: each flush batch is capped by innodb_lru_scan_depth per ~11GB buffer pool instance, which does not seem excessively high.

Thus, I don't see immediate evidence yet that your performance drop is directly related to LRU flushing. Could you provide further details about the drop itself?
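To spell out the arithmetic behind the "~11GB" figure, using the settings posted above (innodb_page_size is assumed to be the default 16 KiB; it was not stated in the report):

```cpp
// Back-of-the-envelope check of the per-instance buffer pool size and
// the LRU flush batch cap implied by the reporter's settings.
#include <cstddef>

// Per-instance buffer pool size in GiB.
double per_instance_gib(double pool_gib, int instances) {
    return pool_gib / instances;
}

// Upper bound on one LRU flush batch per instance, in MiB.
double max_lru_batch_mib(int scan_depth_pages, int page_kib) {
    return scan_depth_pages * page_kib / 1024.0;
}

// With innodb_buffer_pool_size=270G and innodb_buffer_pool_instances=24,
// per_instance_gib(270, 24) gives 11.25 GiB per instance. With
// innodb_lru_scan_depth=1024 and 16 KiB pages, max_lru_batch_mib(1024, 16)
// gives a 16 MiB cap per instance per flushing iteration.
```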

Changed in percona-server:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona Server because there has been no activity for 60 days.]

Changed in percona-server:
status: Incomplete → Expired

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3701
