[ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached.

Bug #1221608 reported by Roel Van de Paar on 2013-09-06
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Percona Server | | High | Roel Van de Paar |
5.1 | | Undecided | Unassigned |
5.5 | | Undecided | Unassigned |
5.6 | | High | Roel Van de Paar |

Bug Description

2013-09-06 02:59:35 27330 [Note] /ssd/Percona-Server-5.6.12-rc60.4-414.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.6.13-rc60.4-log' socket: '/ssd//873186/current1_6/tmp/master.sock' port: 13100 Percona Server with XtraDB (GPL), Release rc60.4, Revision 414
2013-09-06 03:02:41 27330 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_pool_max_threads' parameter can help in this situation.
 If 'extra_port' parameter is set, you can still connect to the database with superuser account (it must be TCP connection using extra_port as TCP port) and troubleshoot the situation. A likely cause of pool blocks are clients that lock resources for long time. 'show processlist' or 'show engine innodb status' can give additional hints.
2013-09-06 03:02:41 27330 [Note] Threadpool has been blocked for 30 seconds

2013-09-06 03:39:48 27330 [Note] /ssd/Percona-Server-5.6.12-rc60.4-414.Linux.x86_64/bin/mysqld: Normal shutdown

Also see bug 1206565
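
The error message in the log above suggests connecting over TCP to 'extra_port' with a superuser account and then running 'show processlist' or 'show engine innodb status'. A minimal sketch of that step follows. The port 13101 and the root account are placeholders, not values from this report, and the helper name extra_port_cmd is hypothetical. It only prints the client command so it can be inspected before being run.

```shell
# Assumed values: neither the extra_port number nor the admin account
# appears in the report; override them through the environment.
EXTRA_PORT="${EXTRA_PORT:-13101}"
ADMIN_USER="${ADMIN_USER:-root}"

# Hypothetical helper: print the mysql client command for the diagnostics
# named in the error message.
extra_port_cmd() {
    # --protocol=TCP forces a TCP connection, which the message says is required.
    printf 'mysql --protocol=TCP --host=127.0.0.1 --port=%s --user=%s --password -e %s\n' \
        "$EXTRA_PORT" "$ADMIN_USER" "'SHOW PROCESSLIST; SHOW ENGINE INNODB STATUS\\G'"
}

# Print the command; nothing is executed against a server here.
extra_port_cmd
```

This only helps if the server was started with 'extra_port' set, as the message itself notes.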

Roel Van de Paar (roel11) wrote :
no longer affects: percona-xtradb-cluster
Changed in percona-server:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Sergei Glushchenko (sergei.glushchenko)
Roel Van de Paar (roel11) wrote :

The run used 15 threads; the only thread pool (TP) option set was --mysqld=--thread_handling=pool-of-threads

tags: added: tp

I encountered the same issue while using PXB for SST: PXB could not take a backup because the thread pool was saturated. The fix (or workaround) was to introduce an extra_port parameter to SST, which then connected through the extra_port in PS via PXB, the rationale being that extra_port is meant for maintenance tasks.

tags: added: pxc
Roel Van de Paar (roel11) wrote :

qablock: this is causing mysqld to lock up (see bug 1222694, which is assumed to be a duplicate of this one)

tags: added: qablock
Roel Van de Paar (roel11) wrote :

I think we located the problem, and it looks like it was related to the RQG yy (SQL) grammar:

PAST: SET GLOBAL thread_pool_max_threads = zero_to_thousand
NOW: SET GLOBAL thread_pool_max_threads = hundred_to_thousand

zero_to_thousand:
        0 | 1 | 2 | 10 | 100 | 150 | 200 | 250 | 300 | 400 | 500 | 600 | 650 | 700 | 800 | 900 | 999 | 1000 ;

hundred_to_thousand:
        100 | 150 | 200 | 250 | 300 | 400 | 500 | 600 | 650 | 700 | 800 | 900 | 999 | 1000 ;

If the issue is seen again, it's different (i.e. 100 should be plenty, with a maximum of --threads=25 in RQG)

However, even if this is the cause, the problem remains that mysqld completely freezes/locks up and there is nothing one can do with it. The last message shown is "Threadpool has been blocked for 30 seconds", and then it just sits there (with the one possible exception of shutdown working correctly, as per the log above, but this would need re-verification).

If this is the way that threadpool locks up the server, then maybe the error message should at the very least be repeated regularly, so it is clearer what is happening. Or maybe another timeout of some sort would be an idea?

There is another oddity: as per chats with Laurynas, when 'thread apply all bt' is executed in gdb against a server hanging like this, only a single thread is shown. This raises an important question: is the server still actually doing something or not (i.e. are the "other" live threads still live)? Can the development team look into adding an MTR testcase for this? It should be relatively easy with a low thread_pool_max_threads setting plus a larger set of executing threads, matched with DEBUG_SYNC if required.

Because that last question would be a critical one if the answer is yes (i.e. the server is not processing anything in the locked-up state), I will mark this bug as critical and 56qual until we can prove otherwise.
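
A rough sketch of the reproduction idea above (a low thread_pool_max_threads plus a larger set of executing threads), written as a shell dry run rather than an MTR testcase. It swaps DEBUG_SYNC for SELECT SLEEP(60), which is a substitution and not what the comment proposes. The values 2 and 8 and the helper name saturation_cmds are illustrative assumptions. Whether this actually triggers the lockup is untested.

```shell
# Illustrative values only; neither number comes from the report.
MAX_THREADS="${MAX_THREADS:-2}"
N_CLIENTS="${N_CLIENTS:-8}"

# Hypothetical helper: print the client commands for a saturation attempt.
# Each SLEEP() keeps one connection busy, so with more clients than
# MAX_THREADS the pool should, in principle, reach its cap.
saturation_cmds() {
    echo "mysql -e 'SET GLOBAL thread_pool_max_threads = $MAX_THREADS'"
    i=1
    while [ "$i" -le "$N_CLIENTS" ]; do
        echo "mysql -e 'SELECT SLEEP(60)' &"
        i=$((i + 1))
    done
    echo "wait"
}

# Dry run: print the commands without contacting a server.
saturation_cmds
```

Piping the output to sh against a disposable test server would execute it; connection options such as socket or port are left out and would need to be added.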

tags: added: 56qual
Roel Van de Paar (roel11) wrote :

As soon as I can confirm that mysqld instances are not getting locked up with the 100-1000 setting, I can at least remove qablock from this bug, but we'll need to leave critical/56qual as per the last comment above.

Roel Van de Paar (roel11) wrote :

RQG related patch was discussed here: bug 1222694

Roel Van de Paar (roel11) wrote :

The bug referenced in comment #8 also has example output of 'thread apply all bt'

Roel Van de Paar (roel11) wrote :

grep "Threadpool has been blocked" vardir1_*/log/master.err
 > No more of these in last run, so 100-1000 seems to avoid locks

Remaining item: question in #6
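
The grep check above can be turned into a per-log count, which makes it easier to see which run directories still hit the block. A minimal sketch, with count_blocked as a hypothetical helper name:

```shell
# Print "<file>: <count>" for each error log given as an argument.
count_blocked() {
    for f in "$@"; do
        # grep -c prints the number of matching lines. It exits non-zero
        # when that number is 0, which is not an error here.
        n=$(grep -c 'Threadpool has been blocked' "$f" || true)
        printf '%s: %s\n' "$f" "$n"
    done
}

# Example call, using the same glob as the grep above:
#   count_blocked vardir1_*/log/master.err
```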

Roel Van de Paar (roel11) wrote :

Removing 56qual as this is a beta feature, and the workaround (or better "how to use this") is clear: increase the GLOBAL thread_pool_max_threads value

tags: removed: 56qual qablock
tags: added: i35551

We should reproduce it with a reasonable maximum number of threads and an additional admin connection enabled to see why threads are locked. Until then it looks like a misconfiguration issue to me.

aradapilot (aradapilot) wrote :

If the feature says it will halt additional threads from being created, but in fact takes down the whole server, that doesn't seem like a misconfiguration issue. I mean, we can just increase the cap, until that one gets hit and it takes the system down again - but we also lose the value of that config option altogether, which is a useful variable with thread pools.

IMHO we should still confirm the issue in GDB with a low number of threads configured, to verify that this is indeed a misconfiguration. Even with a low number of allowed threads, they should complete their work and allow new connections. Moreover, we still have the issue of "thread apply all bt" showing only a single thread. Thus it is possible that we are seeing a genuine bug here.

Roel -

Does the issue of gdb showing only a single stacktrace still occur?
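
One way to answer this from a saved backtrace is to count the per-thread headers in the gdb output. A minimal sketch, assuming the dump was saved to a file. The file name bt.txt and the helper name count_bt_threads are hypothetical.

```shell
# A dump could be captured with something like the following, assuming a
# single mysqld process on the host:
#   gdb -p "$(pidof mysqld)" -batch -ex 'thread apply all bt' > bt.txt

# Count the threads listed in a saved "thread apply all bt" dump.
count_bt_threads() {
    # gdb starts each thread's backtrace with a header line such as
    # "Thread 3 (Thread 0x7f... (LWP 1234)):".
    grep -c '^Thread [0-9][0-9]* (' "$1" || true
}
```

A result of 1 for a server that had many client connections would match the single-stacktrace symptom described earlier in this report.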

Launchpad Janitor (janitor) wrote :

[Expired for Percona Server 5.5 because there has been no activity for 60 days.]
