ib_lru_dump causes periodic stalls

Bug #1050615 reported by Will Gunty
This bug affects 2 people
Affects                  Status        Importance  Assigned to              Milestone
Percona Server           New           Undecided   George Ormond Lorch III
  (moved to https://jira.percona.com/projects/PS)
5.1                      Fix Released  Undecided   George Ormond Lorch III
5.5                      New           Undecided   George Ormond Lorch III

Bug Description

With ib_lru_dump enabled, we saw periodic stalls on our database servers. This would result in 1-2 seconds of reduced throughput and increased threads_running.

Disabling the periodic dump and triggering it manually does reproduce the stall. The stall would make sense, as the process likely locks the buffer pool while it reads the list of pages in it. The servers we see this on are running with 244 GB buffer pools (at the default 16 KB page size that is roughly 16 million pages to walk), so the stall may be a function of the buffer pool size as well.

Gavin Towey (gtowey) wrote :

For reference, here's one example of monitoring during the LRU dump:

[mysql-(none) 1]>select * from information_schema.XTRADB_ADMIN_COMMAND /*!XTRA_LRU_DUMP*/;
+------------------------------+
| result_message               |
+------------------------------+
| XTRA_LRU_DUMP was succeeded. |
+------------------------------+

While outputting stats each second:

qps: 19496 | threads_connected: 1624 | threads_running: 6
qps: 15260 | threads_connected: 1358 | threads_running: 126
qps: 8266 | threads_connected: 1548 | threads_running: 688
qps: 5543 | threads_connected: 1573 | threads_running: 1113
qps: 10026 | threads_connected: 1172 | threads_running: 396
qps: 22155 | threads_connected: 1473 | threads_running: 8

Note that these servers are configured with a 244 GB buffer pool and are running about 20k QPS with 4k connections per second, so things pile up very fast.

During these stalls, innodb status often shows transactions in the following state:

---TRANSACTION 14B4408B89, ACTIVE (PREPARED) 6 sec
mysql tables in use 1, locked 1
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 7483185426, OS thread handle 0x72ec4940, query id 64781648921 10.12.17.93 box_live query end

These are commits that have been waiting 6 seconds to finish.

Gavin Towey (gtowey) wrote :

From the code in storage/innobase/buf/buf0lru.c in Percona 5.5.21-1:

/********************************************************************//**
Dump the LRU page list to the specific file. */
#define LRU_DUMP_FILE "ib_lru_dump"

UNIV_INTERN
ibool
buf_LRU_file_dump(void)
/*===================*/
<snip>
        for (i = 0; i < srv_buf_pool_instances; i++) {
                buf_pool_t* buf_pool;

                buf_pool = buf_pool_from_array(i);

                mutex_enter(&buf_pool->LRU_list_mutex);
                bpage = UT_LIST_GET_LAST(buf_pool->LRU);

As you can see, it holds the LRU_list_mutex while it does the dump. This is almost certainly what is causing the stalls we're seeing, especially when you factor in the locking behavior between the dump and other queries.

Our solution has simply been to set innodb_buffer_pool_restore_at_startup=0, disabling the feature altogether. It's actually not that critical for us to use anymore. I don't think there's really a "fix" for this issue, but it's definitely a good lesson, and a case other users should be aware of.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

A 244 GB buffer pool is quite big indeed. There is already a bug to remediate this further so that no I/O is done while holding the mutex: lp:686534

Laurynas Biveinis (laurynas-biveinis) wrote :

This issue is tracked as bug 686534, and the fix for 5.5 is in the process of being ported from 5.1. The essence of the fix is to release the LRU_list_mutex on every iteration, before doing I/O.
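
To illustrate the idea, here is a minimal toy sketch in C of that locking pattern, not the actual Percona patch: it uses pthreads and a plain array standing in for the buffer pool LRU list (the names dump_lru, page_id and lru_mutex are invented for this example). Each record is copied while the mutex is held, and the write to the dump file happens only after the mutex has been released.

/*
 * Toy sketch of "release the list mutex on every iteration, before doing I/O".
 * NOT the actual patch: a plain array and a pthread mutex stand in for the
 * buffer pool LRU list and LRU_list_mutex.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define N_PAGES 1024                    /* stand-in for the LRU list length */

struct page_id {
        uint32_t        space;          /* tablespace id */
        uint32_t        offset;         /* page number within the tablespace */
};

static struct page_id lru[N_PAGES];     /* stand-in for the LRU list */
static pthread_mutex_t lru_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Dump the list to a file without holding the mutex across any file I/O. */
static int
dump_lru(const char* path)
{
        FILE*   f = fopen(path, "wb");
        size_t  i;

        if (f == NULL) {
                return(-1);
        }

        for (i = 0; i < N_PAGES; i++) {
                struct page_id rec;

                pthread_mutex_lock(&lru_mutex);
                rec = lru[i];           /* copy the record under the mutex */
                pthread_mutex_unlock(&lru_mutex);

                /* The slow part (the write) runs with the mutex released,
                so other threads are free to use the list in the meantime. */
                if (fwrite(&rec, sizeof(rec), 1, f) != 1) {
                        fclose(f);
                        return(-1);
                }
        }

        return(fclose(f));
}

int
main(void)
{
        return(dump_lru("ib_lru_dump_toy"));
}

The real code also has to cope with the LRU list changing between iterations, which this toy sidesteps by indexing a fixed array; the point is only that the write never runs while the mutex is held.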

I am closing this bug as a duplicate of 686534. Please let us know if you have any further questions, thanks.
