The server can't start when innodb_force_recovery and innodb_purge_threads are both set

Bug #923820 reported by yinfeng
This bug affects 4 people
Affects          Status     Importance  Assigned to  Milestone
Percona Server (moved to https://jira.percona.com/projects/PS); status tracked in 5.7
  5.1            Won't Fix  Undecided   Unassigned
  5.5            New        Low         Unassigned
  5.6            Invalid    Undecided   Unassigned
  5.7            Invalid    Undecided   Unassigned

Bug Description

Percona Server 5.5.18

How to repeat:
innodb_force_recovery >= 2
innodb_purge_threads = 1

The error log then fills with repeated messages like:

InnoDB: Waiting for the background threads to start

A simple fix is to force innodb_purge_threads=0 when the server starts with innodb_force_recovery >= 2 and innodb_purge_threads = 1.
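
The simple fix proposed above could look roughly like the following. This is a minimal sketch, not the actual patch: `effective_purge_threads` is a hypothetical helper that does not exist in InnoDB, and the log text is invented for illustration. The only fact taken from the report is that the problem appears at innodb_force_recovery >= 2, which corresponds to SRV_FORCE_NO_BACKGROUND.

```c
#include <stdio.h>

/* Assumed to mirror the InnoDB constant: from innodb_force_recovery = 2
   upwards the background (master and purge) work is disabled. */
#define SRV_FORCE_NO_BACKGROUND 2UL

/* Hypothetical helper, not an InnoDB function: returns the number of
   purge threads that startup should actually use. */
static unsigned long
effective_purge_threads(unsigned long force_recovery,
                        unsigned long n_purge_threads)
{
        if (force_recovery >= SRV_FORCE_NO_BACKGROUND
            && n_purge_threads > 0) {
                /* Illustrative message only; the wording is made up. */
                fprintf(stderr,
                        " InnoDB: innodb_force_recovery=%lu, "
                        "ignoring innodb_purge_threads=%lu\n",
                        force_recovery, n_purge_threads);
                return 0;
        }

        return n_purge_threads;
}
```

With this shape, startup code would call the helper once before creating any threads, so no purge thread is created and nothing later has to wait for one.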

yinfeng (yinfeng-zwx) wrote :
yinfeng (yinfeng-zwx)
Changed in percona-server:
assignee: nobody → yinfeng (yinfeng-zwx)
assignee: yinfeng (yinfeng-zwx) → nobody
Stewart Smith (stewart)
Changed in percona-server:
importance: Undecided → Medium
yinfeng (yinfeng-zwx) wrote :

Another fix:
innobase_start_or_create_for_mysql checks whether the purge thread has started only when srv_force_recovery < SRV_FORCE_NO_BACKGROUND.

Could anyone help me review this small change?

Version: Percona Server 5.5.18

Index: storage/innobase/srv/srv0start.c
===================================================================
--- storage/innobase/srv/srv0start.c (revision 999)
+++ storage/innobase/srv/srv0start.c (working copy)
@@ -2028,7 +2028,8 @@
                if (srv_thread_has_reserved_slot(SRV_MASTER) == ULINT_UNDEFINED
                    || (srv_n_purge_threads == 1
                        && srv_thread_has_reserved_slot(SRV_WORKER)
-                           == ULINT_UNDEFINED)) {
+                           == ULINT_UNDEFINED)
+                   && srv_force_recovery<SRV_FORCE_NO_BACKGROUND) {

                        ut_print_timestamp(stderr);
                        fprintf(stderr, " InnoDB:

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I confirm this (from previous occurrences); the workaround is indeed to set the number of purge threads to 0.

It has also been mentioned here: http://bugs.mysql.com/bug.php?id=61104

=========================
[22 Jan 5:44] Shane Bester

On a side note:

When starting with innodb_force_recovery, you should set innodb_purge_threads=0 to avoid a looping during startup.

Bug 13616287 - STARTUP HANGS WITH "WAITING FOR THE BACKGROUND THREADS TO START"
========

Regarding the fix:

There is already a condition to check for it.

        /* Check for shutdown and whether we should do purge at all. */
        if (srv_force_recovery >= SRV_FORCE_NO_BACKGROUND
            || srv_shutdown_state != 0
            || srv_fast_shutdown) {

                break;
        }

in srv_purge_thread.

However, there is a condition before it:

        if (trx_sys->rseg_history_len < srv_purge_batch_size
            || (n_total_purged == 0
                && retries >= TRX_SYS_N_RSEGS)) {

                mutex_enter(&kernel_mutex);

                srv_suspend_thread(slot);

                mutex_exit(&kernel_mutex);

                os_event_wait(slot->event);

                retries = 0;
        }

During startup this condition can be true (rseg_history_len is low and does not exceed the purge batch size), so the thread is suspended before it reaches the force-recovery check.

The fix can simply be:
==========================

bzr diff storage/innobase/srv/srv0srv.c
=== modified file 'Percona-Server/storage/innobase/srv/srv0srv.c'
--- Percona-Server/storage/innobase/srv/srv0srv.c 2012-05-10 07:49:14 +0000
+++ Percona-Server/storage/innobase/srv/srv0srv.c 2012-09-05 17:30:06 +0000
@@ -3926,6 +3926,14 @@
                ulint n_pages_purged = 0;
                ulint cur_time;

+               /* Check for shutdown and whether we should do purge at all. */
+               if (srv_force_recovery >= SRV_FORCE_NO_BACKGROUND
+                   || srv_shutdown_state != 0
+                   || srv_fast_shutdown) {
+
+                       break;
+               }
+
                /* If there are very few records to purge or the last
                purge didn't purge any records then wait for activity.
                We peek at the history len without holding any mutex
@@ -3946,14 +3954,6 @@
                        retries = 0;
                }

-               /* Check for shutdown and whether we should do purge at all. */
-               if (srv_force_recovery >= SRV_FORCE_NO_BACKGROUND
-                   || srv_shutdown_state != 0
-                   || srv_fast_shutdown) {
-
-                       break;
-               }
-
                if (n_total_purged == 0 && retries <= TRX_SYS_N_RSEGS) {
                        ++retries;
                } else if (n_total_purged > 0) {

Changed in percona-server:
status: New → Confirmed
tags: added: contribution
Hui Liu (hickey) wrote :

Hi Raghavendra, your suggested patch is not correct. I tried it and the behavior is still the same.

The cause is clear: srv_purge_thread exits because srv_force_recovery >= SRV_FORCE_NO_BACKGROUND breaks out of its while loop. Meanwhile, innobase_start_or_create_for_mysql waits for all SRV_WORKER threads (here, the purge thread) to start. That wait never succeeds, because the purge thread's reserved slot has in_use set to FALSE (cleared when srv_purge_thread exits).

The solution should be easy: do NOT wait for the exited purge thread in innobase_start_or_create_for_mysql when srv_purge_thread is going to exit during startup.
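
The hang described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: `struct startup_model`, `must_wait_original`, `must_wait_patched` and `poll_until_started` are hypothetical names, not InnoDB code, and each thread slot is reduced to a single boolean. The two conditions follow the unpatched and patched `if` from the srv0start.c diff earlier in this report, and the poll count is bounded so the model reports the hang instead of looping forever.

```c
#include <stdbool.h>

/* Assumed to mirror the InnoDB constant (innodb_force_recovery = 2). */
#define SRV_FORCE_NO_BACKGROUND 2UL

/* Simplified stand-in for the state the startup code inspects. */
struct startup_model {
        unsigned long force_recovery;   /* innodb_force_recovery */
        unsigned long n_purge_threads;  /* innodb_purge_threads */
        bool master_slot_in_use;        /* master thread's slot */
        bool worker_slot_in_use;        /* purge thread's slot */
};

/* Unpatched condition: wait while either slot is not reserved. */
static bool
must_wait_original(const struct startup_model *m)
{
        return !m->master_slot_in_use
            || (m->n_purge_threads == 1 && !m->worker_slot_in_use);
}

/* Patched condition: ignore the purge thread's slot when force
   recovery means the purge thread exits on its own. */
static bool
must_wait_patched(const struct startup_model *m)
{
        return !m->master_slot_in_use
            || (m->n_purge_threads == 1
                && !m->worker_slot_in_use
                && m->force_recovery < SRV_FORCE_NO_BACKGROUND);
}

/* Polls at most max_polls times. Returns the number of polls that had
   to wait, or -1 if the condition never cleared (the startup hang). */
static int
poll_until_started(const struct startup_model *m,
                   bool (*must_wait)(const struct startup_model *),
                   int max_polls)
{
        int i;

        for (i = 0; i < max_polls; i++) {
                if (!must_wait(m)) {
                        return i;
                }
        }

        return -1;
}
```

For a model with force_recovery = 2, n_purge_threads = 1, a reserved master slot and a released worker slot, the original condition never clears, while the patched one clears on the first poll.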

Hui Liu (hickey) wrote :

By the way, I discussed this issue with yinfeng a long time ago (June this year), and we resolved it with the later patch provided. It runs well in our production environment, and there have been no more complaints from DBAs about it.

Please assign this ticket to us if you think the fix is necessary for Percona Server.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Thanks. Yes, you are right: with the patch in #3 it would again end up in the same wait loop.

I also took a quick look at the 5.6 code (which also supports multiple purge threads):

.........
        if (srv_force_recovery < SRV_FORCE_NO_BACKGROUND) {

                os_thread_create(
                        srv_purge_coordinator_thread,
                        NULL, thread_ids + 5 + SRV_MAX_N_IO_THREADS);

                ut_a(UT_ARR_SIZE(thread_ids)
                     > 5 + srv_n_purge_threads + SRV_MAX_N_IO_THREADS);

                /* We've already created the purge coordinator thread above. */
                for (i = 1; i < srv_n_purge_threads; ++i) {
                        os_thread_create(
                                srv_worker_thread, NULL,
                                thread_ids + 5 + i + SRV_MAX_N_IO_THREADS);
                }

                srv_start_wait_for_purge_to_start();
.......

There, thread creation has been moved under the SRV_FORCE_NO_BACKGROUND condition, so the thread is waited upon only if it was created in the first place. (With your patch a purge thread will still be created, and it may either be running in a suspended state or have exited. The latter is fine, but the former is unnecessary.)

I didn't see any separate wait on the master thread in the 5.6 code. In 5.5 it may be required, and it won't cause any problems, since the master thread suspends itself on SRV_FORCE_NO_BACKGROUND, unlike the purge thread, which exits.

Hui Liu (hickey) wrote :

5.6 has a better solution, judging from your code analysis: instead of waiting for a purge thread that was created, 5.6 avoids creating it, and waits only when srv_force_recovery < SRV_FORCE_NO_BACKGROUND.

For patch #2, my thinking is that even if the purge thread is suspended, the waiter (innobase_start_or_create_for_mysql) would not block waiting for the purge thread to be ready, because srv_thread_has_reserved_slot would return the slot number rather than ULINT_UNDEFINED. So it's not a risk :)
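
The distinction between a suspended thread and an exited one can be illustrated with a small model. This is a sketch under assumptions, not the InnoDB source: `struct model_slot` and `model_has_reserved_slot` are hypothetical names, and the lookup is assumed to test only an in_use flag and the thread type, which is what the comment above implies about srv_thread_has_reserved_slot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match InnoDB's "no such slot" marker: all bits set. */
#define ULINT_UNDEFINED ((unsigned long) -1)

enum model_thread_type { MODEL_MASTER, MODEL_WORKER };

/* Hypothetical, simplified thread slot. */
struct model_slot {
        enum model_thread_type type;
        bool in_use;     /* cleared when the thread exits */
        bool suspended;  /* set while the thread waits for work */
};

/* Returns the index of the first reserved slot of the given type, or
   ULINT_UNDEFINED if there is none. The suspended flag is deliberately
   not consulted: a suspended thread still owns its slot. */
static unsigned long
model_has_reserved_slot(const struct model_slot *slots, size_t n_slots,
                        enum model_thread_type type)
{
        size_t i;

        for (i = 0; i < n_slots; i++) {
                if (slots[i].in_use && slots[i].type == type) {
                        return (unsigned long) i;
                }
        }

        return ULINT_UNDEFINED;
}
```

In this model a suspended purge thread still yields a valid slot number, so a waiter built on the lookup proceeds. Only a thread that has exited and released its slot produces ULINT_UNDEFINED.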

tags: added: upstream
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-2726
