innodb_blocking_buffer_pool_restore cause startup return fail

Bug #1072538 reported by Jacky Leung
34
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Tomislav Plavcic
5.1
Fix Released
High
Tomislav Plavcic
5.5
Fix Released
High
Tomislav Plavcic
5.6
Fix Released
High
Tomislav Plavcic

Bug Description

This happen when I set innodb_blocking_buffer_pool_restore as ON. Because I am using script to monitoring system start up (like reboot), if any process return non-zero codes then it will report alert to admin. But this is actually a false alert, as the system is able to continue to run fine without any errors.

I have attached the log below to show what is happening. First section of log is to show service mysql start return OK when i have innodb_blocking_buffer_pool_restore as OFF.

Second section is happening when innodb_blocking_buffer_pool_restore as ON

System:
* Ubuntu 12.04

DB:
* Server version: 5.5.28-29.1-log Percona Server (GPL), Release 29.1

root@db-product-master:/mnt/mysql/mysql-data# service mysql start
 * Starting MySQL (Percona Server) database server mysqld [ OK ]
 * Checking for corrupt, not cleanly closed and upgrade needing tables.

121029 16:26:28 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
121029 16:26:32 mysqld_safe Starting mysqld daemon with databases from /mnt/mysql/mysql-data
121029 16:26:32 [Warning] options --log-slow-admin-statements, --log-queries-not-using-indexes and --log-slow-slave-statements have no effect if --log_slow_queries is not set
121029 16:26:32 [Note] Plugin 'FEDERATED' is disabled.
121029 16:26:32 InnoDB: The InnoDB memory heap is disabled
121029 16:26:32 InnoDB: Mutexes and rw_locks use GCC atomic builtins
121029 16:26:32 InnoDB: Compressed tables use zlib 1.2.3
121029 16:26:32 InnoDB: Using Linux native AIO
121029 16:26:32 InnoDB: Initializing buffer pool, size = 58.0G
121029 16:26:38 InnoDB: Completed initialization of buffer pool
121029 16:26:38 InnoDB: highest supported file format is Barracuda.
121029 16:26:43 InnoDB: Restoring buffer pool pages from ib_lru_dump
121029 16:26:43 InnoDB: Waiting for the background threads to start
121029 16:26:44 Percona XtraDB (http://www.percona.com) 1.1.8-rel29.1 started; log sequence number 12512420996260
121029 16:26:45 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
121029 16:26:45 [Note] - '0.0.0.0' resolves to '0.0.0.0';
121029 16:26:45 [Note] Server socket created on IP: '0.0.0.0'.
121029 16:26:45 [Note] Event Scheduler: Loaded 0 events
121029 16:26:45 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.28-29.1-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release 29.1
121029 16:27:08 [Note] /usr/sbin/mysqld: Normal shutdown

121029 16:27:08 [Note] Event Scheduler: Purging the queue. 0 events
121029 16:27:10 InnoDB: Starting shutdown...
 InnoDB: stopped loading lru pages because of server shutdown
121029 16:27:10 InnoDB: Completed reading buffer pool pages (requested: 41839, read: 41417)
121029 16:27:18 InnoDB: Shutdown completed; log sequence number 12512420996260
121029 16:27:18 [Note] /usr/sbin/mysqld: Shutdown complete

root@db-product-master:/mnt/mysql/mysql-data# service mysql start
 * Starting MySQL (Percona Server) database server mysqld [fail]

121029 16:27:19 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
121029 16:27:45 mysqld_safe Starting mysqld daemon with databases from /mnt/mysql/mysql-data
121029 16:27:45 [Warning] options --log-slow-admin-statements, --log-queries-not-using-indexes and --log-slow-slave-statements have no effect if --log_slow_queries is not set
121029 16:27:45 [Note] Plugin 'FEDERATED' is disabled.
121029 16:27:45 InnoDB: The InnoDB memory heap is disabled
121029 16:27:45 InnoDB: Mutexes and rw_locks use GCC atomic builtins
121029 16:27:45 InnoDB: Compressed tables use zlib 1.2.3
121029 16:27:45 InnoDB: Using Linux native AIO
121029 16:27:45 InnoDB: Initializing buffer pool, size = 58.0G
121029 16:27:51 InnoDB: Completed initialization of buffer pool
121029 16:27:51 InnoDB: highest supported file format is Barracuda.
121029 16:27:56 InnoDB: Restoring buffer pool pages from ib_lru_dump

Tags: pkg

Related branches

Revision history for this message
Jacky Leung (jacky-5) wrote :

I was looking into the init.d script for more details, and I think this part is the problem.

case "${1:-''}" in
  'start')
        sanity_checks;
        # Start daemon
        log_daemon_msg "Starting MySQL (Percona Server) database server" "mysqld"
        if mysqld_status check_alive nowarn; then
           log_progress_msg "already running"
           log_end_msg 0
        else
            "${PERCONA_PREFIX}"/bin/mysqld_safe > /dev/null 2>&1 &
            # 6s was reported in #352070 to be too few when using ndbcluster
            for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
                sleep 1
                if mysqld_status check_alive nowarn ; then break; fi
                log_progress_msg "."
            done
            if mysqld_status check_alive warn; then
                log_end_msg 0
                # Now start mysqlcheck or whatever the admin wants.
                output=$(/etc/mysql/debian-start)
                [ -n "$output" ] && log_action_msg "$output"
            else
                log_end_msg 1
                log_failure_msg "Please take a look at the syslog"
            fi
        fi
        ;;

Because mysqld_status check_alive is not able to connect the DB the whole time (it took 15mins to load the LRU), so it throw error at that last else statement.

Is it possible to fix this false alert problem?

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I see two ways to fix this:

a) Respond to ping (and nothing more) while the blocking is in progress, not sure if this is possible.

b) Block the ping (block the connection) till the server is up. This will work in above circumstance but may not when the process itself is being timed, more like -- timeout 30 mysqladmin ping -- which will be killed anyway if the ping blocks, but should work with existing init scripts.

Revision history for this message
Alexey Kopytov (akopytov) wrote :

I think we just should modify the init script to:

1. Wait for the server to become available forever (rather than just for the hard-coded timeout of 14 seconds)
2. If startup takes longer than a predefine threshold, print a warning, something along the lines "Waited for more than N seconds, it might be because buffer pool restore is in progress and innodb_blocking_buffer_pool_restore is enabled".

I don't see a more elegant solution. It's a conflict of 2 correctly working features.

Revision history for this message
Stewart Smith (stewart) wrote :

I think we should take approach that if blocking buffer pool restore has been selected, we just wait forever, or at least a formula (maybe buffer pool size / 100MB seconds)

Revision history for this message
Alexey Kopytov (akopytov) wrote :

But detecting if blocking buffer pool restore is enabled would require a .cnf parser in the init script that would take sections and comments into account. Probably my_print_defaults might be used for that.

On the other hand, a hard-coded timeout value does not only apply to blocking buffer pool restore. Whatever value is chosen, there's always possibility that a server startup can take more than that (recovery and log file creation come to mind).

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

MySQL 5.6 buffer pool dump/restore appears to be async-only, thus this bug does not apply there.

Revision history for this message
Alexey Kopytov (akopytov) wrote :

So we may need to implement a blocking buffer pool restore in 5.6. Created https://blueprints.launchpad.net/percona-server/+spec/blocking-buffer-pool-restore. Which, if implemented, will again make this bug applicable to PS 5.6.

tags: added: pkg
Alexey Bychko (abychko)
tags: removed: pkg
tags: added: pkg
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Bug 1328262 describes another case where the default server startup timeout is not enough.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-602

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.