Add hard timeouts to page cleaner flushes
Percona Server | Status tracked in 5.7
[27 Sep 15:51] Laurynas Biveinis
Currently the page cleaner thread is designed as follows, if I understand correctly:
- in a loop:
  - do LRU tail flushing, limited by the maximum LRU scan depth (innodb_lru_scan_depth);
  - do flush list flushing, limited by the I/O capacity (innodb_io_capacity);
  - sleep for whatever remains of the 1 second iteration.
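A rough C++ sketch of the loop above can make its timing behaviour concrete. It is not InnoDB code: `CleanerHooks`, `page_cleaner_iteration` and the callbacks are hypothetical stand-ins for the real flushing routines, and time comes from an injected clock so the arithmetic can be checked without doing any I/O.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;

// Hypothetical hooks standing in for InnoDB internals (names are illustrative).
struct CleanerHooks {
  std::function<void()> flush_lru_tail;  // bounded by the max LRU scan depth
  std::function<void()> flush_list;      // bounded by I/O capacity
  std::function<void(Clock::duration)> sleep_for;
  std::function<Clock::time_point()> now;
};

// One iteration of the loop as described; returns how long the cleaner slept.
Clock::duration page_cleaner_iteration(const CleanerHooks& h) {
  assert(h.flush_lru_tail && h.flush_list && h.sleep_for && h.now);
  const auto period = std::chrono::seconds(1);
  const auto start = h.now();

  h.flush_lru_tail();  // step 1: LRU tail flushing
  h.flush_list();      // step 2: flush list flushing, only after step 1 ends

  // Step 3: sleep for whatever is left of the 1 s period. If flushing took
  // longer than that, there is no sleep and the next iteration starts at once.
  const auto elapsed = h.now() - start;
  Clock::duration remaining = Clock::duration::zero();
  if (elapsed < period) {
    remaining = std::chrono::duration_cast<Clock::duration>(period - elapsed);
    h.sleep_for(remaining);
  }
  return remaining;
}
```

With 300 ms of LRU flushing and 200 ms of flush list flushing the cleaner sleeps 500 ms; if LRU flushing alone takes 1.5 s, the sleep is zero and flush list flushing has waited the whole time.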
Proper tuning (correct me if I'm wrong) should then aim to minimize the sleep time and keep storage utilization as close to capacity as possible.
But the flushing time is determined not only by the I/O variable settings, but also by how quickly the cleaner can get access to shared resources such as the buffer pool mutexes. This means that a page cleaner iteration can take longer than 1 second even with properly tuned I/O settings. Under very heavy loads (sysbench, I/O-bound, 512 threads on 32 cores) one page cleaner iteration might take minutes. This is bad for several reasons:
- if LRU flushing takes that long, flush list flushing is not happening, and the query threads go to sync preflush;
- if flush list flushing takes that long, LRU tail flushing is not happening, and the query threads go to single page LRU flushes;
- moreover, the page cleaner and adaptive flushing heuristics are designed to be updated roughly every second, and will probably not get a very accurate picture of what's going on if they are called once every several minutes.
How to repeat:
Benchmarks, code analysis
A partial solution would be to add a hard timeout: limit LRU tail flushing to 1 second, checked after each batch, and limit flush list flushing to 1 second too (the exact value of the constant is debatable). It seems more important to re-check the heuristics and to alternate periodically between the two kinds of flushing than to let either one run to completion.
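A minimal sketch of the proposed hard timeout, assuming each kind of flushing can be driven one batch at a time through a callback that reports whether work remains. `BatchFn`, `flush_with_timeout` and `page_cleaner_iteration_with_timeouts` are hypothetical names, not InnoDB functions, and the 1 second limit is only the placeholder value suggested above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

using Clock = std::chrono::steady_clock;
using NowFn = std::function<Clock::time_point()>;
// Hypothetical: flushes one batch and returns true if more work remains.
using BatchFn = std::function<bool()>;

// Runs batches until the work is done or the time limit has passed,
// checking the limit after each batch. Returns the number of batches run.
int flush_with_timeout(const BatchFn& flush_batch, const NowFn& now,
                       Clock::duration limit) {
  assert(flush_batch && now);
  const auto deadline = now() + limit;
  int batches = 0;
  bool more = true;
  while (more) {
    more = flush_batch();
    ++batches;
    if (now() >= deadline) break;  // hard timeout: stop even if work remains
  }
  return batches;
}

// One cleaner iteration: each kind of flushing gets at most ~1 s, so neither
// can keep the other from running for minutes. Returns {LRU, flush list} batches.
std::pair<int, int> page_cleaner_iteration_with_timeouts(
    const BatchFn& lru_batch, const BatchFn& flush_list_batch, const NowFn& now) {
  const Clock::duration limit = std::chrono::seconds(1);  // value is debatable
  const int lru = flush_with_timeout(lru_batch, now, limit);
  const int fl = flush_with_timeout(flush_list_batch, now, limit);
  return {lru, fl};
}
```

Because the deadline is checked only between batches, one slow batch can still overrun the limit; the timeout bounds how many batches run, not how long a single batch takes. For example, with 300 ms LRU batches that never run out of work, LRU flushing stops after 4 batches (1.2 s) and flush list flushing then gets its turn.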
A hard timeout is not a complete solution for the case of multiple buffer pool instances, because with a timeout, some instances may receive no flushing at all. This will be reported separately.
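A small simulation can show how a timeout starves some buffer pool instances. It rests on an assumption that may not match InnoDB exactly: the cleaner visits instances in a fixed order 0..N-1 and all of them share one deadline. `flush_instances_with_timeout` and the per-instance callback are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Visits buffer pool instances 0..n-1 in a fixed order under one shared
// deadline; returns which instances received any flushing at all.
std::vector<bool> flush_instances_with_timeout(
    std::size_t n_instances,
    const std::function<void(std::size_t)>& flush_instance,  // hypothetical
    const std::function<Clock::time_point()>& now, Clock::duration limit) {
  assert(flush_instance && now);
  const auto deadline = now() + limit;
  std::vector<bool> flushed(n_instances, false);
  for (std::size_t i = 0; i < n_instances; ++i) {
    if (now() >= deadline) break;  // every remaining instance is skipped
    flush_instance(i);
    flushed[i] = true;
  }
  return flushed;
}
```

With 8 instances that take 400 ms each and a 1 s limit, only instances 0 to 2 are flushed; instances 3 to 7 get nothing in that iteration, and with a fixed visiting order and the same timing they would be skipped again in the next one.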