Page cleaner should perform LRU flushing regardless of server activity
[3 Oct 6:29] Laurynas Biveinis
Description:
The main cleaner loop has:
last_activity = srv_get_
/* Flush pages from end of LRU if required */
n_flushed = buf_flush_
I am still working to understand what "server activity" means. So far it looks like an ad-hoc collection of decisions of the form "if a row was updated, then the server is active". These decisions are made at a high (mostly handler) level, whereas the cleaner works at the low level of checkpoint age and free list length. Thus, for example, purge needs free pages for its work, but the cleaner thread will not provide them if there is no other workload, because it will consider the server idle. Purge will then have to do single page flushes. The same probably applies to other background operations that need pages but don't bump server activity (ibuf merge?).
How to repeat:
Code analysis. We have seen (again, credit to Alexey Stroganov) Sysbench performance instabilities that we attribute to either a false dependency of the cleaner on server activity (this bug) or server activity not being bumped where it perhaps should be (that will be reported after we complete our analysis).
Suggested fix:
Decouple cleaner LRU flushing from server activity by performing it unconditionally in the cleaner loop. LRU flushing already has all the information it needs for flushing decisions: the free list length. If the server is truly idle, the free list will be full and no LRU flush will happen. If free pages are being consumed, the cleaner LRU flush can and should provide them regardless of the server activity state.
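
To illustrate the difference, here is a minimal stand-alone sketch. It is a toy model, not InnoDB code: PoolModel, lru_flush, cleaner_iteration_gated and cleaner_iteration_unconditional are hypothetical names invented for this example, and the "flush" only adjusts a counter. It assumes the LRU flush decision depends only on the free list length, as argued above.

```cpp
#include <cstddef>

// Hypothetical, heavily simplified buffer pool model (not InnoDB's structures).
struct PoolModel {
    std::size_t free_pages;   // current free list length
    std::size_t free_target;  // free list length the cleaner tries to maintain
};

// Toy LRU flush pass: refill the free list up to the target.
// Returns the number of pages made free; 0 if the free list is already full.
std::size_t lru_flush(PoolModel& pool) {
    if (pool.free_pages >= pool.free_target) {
        return 0;
    }
    const std::size_t n = pool.free_target - pool.free_pages;
    pool.free_pages += n;
    return n;
}

// Behaviour described in this report: the LRU flush runs only when the
// high-level "server activity" indicator says the server is active.
std::size_t cleaner_iteration_gated(PoolModel& pool, bool server_active) {
    return server_active ? lru_flush(pool) : 0;
}

// Suggested behaviour: the LRU flush runs on every iteration and decides
// for itself, from the free list length alone, whether there is work to do.
std::size_t cleaner_iteration_unconditional(PoolModel& pool) {
    return lru_flush(pool);
}
```

In this model, a background consumer such as purge that drains the free list while the activity indicator stays unchanged gets no pages from the gated variant, whereas the unconditional variant refills the list. With a full free list the unconditional variant does nothing, so a truly idle server pays no extra flushing cost.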