Thread pool: new threads are created too slowly even with tuning

Bug #909774 reported by Elena Stepanova
Affects: MariaDB
Status: Fix Released
Importance: Undecided
Assigned to: Vladislav Vaintroub

Bug Description

If there is at least one active thread in a group, it takes a long time for the pool to create a new one.

In the provided test case (1), two threads are running a SELECT on a big table, while a third one sends a very fast query that does not depend on the table at all (e.g. SELECT NULL FROM DUAL). The timer in the test case starts right before that query and ends right after it, so the time reported by MTR relates to this query only.

In the traditional case, without the thread pool, the query takes a few milliseconds (1-3 ms in my tests).
With thread-pool-size=1 it takes 1.685 seconds.

I'm using thread-pool-size=1 to make the measurements more obvious; in fact the issue scales, it only requires more client threads to reproduce it with a bigger pool size. For example, with thread-pool-size=4 and 10 threads querying the big table, it again takes about 1.7 seconds. Even with the minimum value of thread-pool-stall-limit=60 it still takes 200+ ms, which is 100-200 times more than with one-thread-per-connection.

If you run the same test with count=10 (the number of threads querying the table), thread-pool-size=1 and the default thread-pool-stall-limit, you can see in the process list output that threads are created roughly one per second, and it takes ~10 seconds to process the SELECT that we are timing.
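For reference, a minimal sketch of the server options behind these measurements (assuming a Unix build of MariaDB where the pool is enabled via thread-handling; the values are just the ones discussed above):

mysqld --thread-handling=pool-of-threads \
       --thread-pool-size=1 \
       --thread-pool-stall-limit=60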

There are various ways such a delay could impact real-life scenarios. To name one, assume you have a job (or simply user inflow) that connects to the database every 0.5 seconds. Each connection runs a 5-second query and exits.

With the traditional thread handling (one thread per connection), even on a 1-core machine this is a tiny load which won't cause any problem. After an initial growth to 10-12 open connections or so, the number stabilizes and stays there as long as needed.
With the artificial delay on request processing, the connections start piling up, since requests arrive faster than they are processed. With default parameters on the same 1-core machine it is only a matter of a minute or so before the max connections limit is hit.

The second test case is an example of a script which can be used to observe the problem. It's not pretty, and it might stop working the same way and need adjustments after bug#909537 is fixed, but it reflects the idea, as it's difficult to implement nicely using standard tools.

# Test case 1
# At the moment, it could be implemented more easily with sleeps,
# but I want to keep it separate from bug#909537.

--enable_connect_log

SET @tp_size=@@thread_pool_size;
SET @tp_idle=@@thread_pool_idle_timeout;
SET GLOBAL thread_pool_size=1;
SET GLOBAL thread_pool_idle_timeout=1;

--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings
CREATE TABLE t1 ( i INT );
INSERT INTO t1 VALUES (1),(2),(3),(4);
INSERT INTO t1 SELECT a.i FROM
 t1 a, t1 b, t1 c, t1 d, t1 e,
 t1 f, t1 g, t1 h, t1 i, t1 j;

--let $count=2

--let $id=$count
while ($id)
{
  --connect(con$id,localhost,root,,)
  --dec $id
}

# Wait till old idle threads are gone
--sleep 2

--let $id=$count
while ($id)
{
   --connection con$id
   --send
     eval SELECT COUNT(*) FROM t1 WHERE i=$id;
   --dec $id
}

--connection default
--start_timer
SELECT NULL FROM DUAL;
--end_timer
SHOW GLOBAL STATUS LIKE 'threadpool%threads';
SHOW PROCESSLIST;

DROP TABLE t1;

SET GLOBAL thread_pool_size=@tp_size;
SET GLOBAL thread_pool_idle_timeout=@tp_idle;

--exit

# End of test case 1

# Test case 2
# It assumes you already have the server running
# Tune the sleep time depending on your threadpool parameters

debug/client/mysql -uroot -e "SET GLOBAL lock_wait_timeout=5; DROP TABLE IF EXISTS test.t; CREATE TABLE test.t ( i INT ); LOCK TABLE test.t WRITE; SELECT SLEEP(300)" &

while [ 1 ]
do
  debug/client/mysql -uroot -ss --force -e "SELECT COUNT(*) FROM information_schema.processlist; SELECT 1 FROM test.t" &
  sleep 0.5
done
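
# While the loop above is running, the backlog can be watched from a separate
# shell (a sketch; the Threadpool_* status variables assume the MariaDB
# thread pool is enabled):
#
#   debug/client/mysql -uroot -e "SHOW GLOBAL STATUS LIKE 'Threadpool%'; SHOW FULL PROCESSLIST"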

# End of test case 2

Elena Stepanova (elenst)
description: updated
Elena Stepanova (elenst)
summary: - Thread pool: new threads are created too slowly even with maximum tuning
+ Thread pool: new threads are created too slowly even with tuning
Revision history for this message
Vladislav Vaintroub (wlad-montyprogram) wrote :

Please take a look at the description of how thread scheduling is done in SQL Server. The best description is perhaps in Ken Henderson's "internals" book, but there is some information spread around on the internet too (google for SQL Server, UMS, scheduling).
For example, this one
http://www.vineetgupta.com/2005/11/sql-server-non-preemptive-scheduling-ums-clr-and-fiber-mode/
also describes how SQL Server schedules its tasks, though not in great detail.

It is important to understand that, unlike the OS scheduler, which is preemptive, every pooling scheduler is non-preemptive (more or less; we allow some level of preemption). The objective of a preemptive scheduler is to give every task a chance to execute. The objective of a non-preemptive scheduler is to maximize performance. Quoting from the article above, its "job in life is to ensure that there is one unblocked thread executing on that CPU at any given point of time, and all other threads are blocked".

If we followed this rule we would never create an additional thread if there is already an active thread in the group. We do create them, however, even if at a speed which you do not like. We can reduce the stall limit, and we can probably talk about an additional QoS feature, a "max_queuing_time" parameter (which is, I think, what you ultimately would like to have: limiting the queueing time of a request). But before that I would like to ensure the general objectives and techniques of thread pooling are understood and agreed upon.

Having said this, the example looks like a specifically crafted thread pooling anti-pattern. It assumes a never-ending avalanche of long, non-yielding, CPU-intensive queries, and it assumes a rather questionably designed scheduled job (there is a better job scheduler in MySQL for such things, btw:). For such things I guess thread-per-connection would work better (and even with thread pooling in MariaDB you can have a separate thread-per-connection scheduler too, so you may decide according to your needs).

Revision history for this message
Vladislav Vaintroub (wlad-montyprogram) wrote :

Ok, I reduced the thread_pool_stall_limit minimum to 10 milliseconds. I also introduced a thread_pool_oversubscribe parameter to fine-tune how many tasks a group can run at the same time (before the change, the maximum number of parallel tasks was hardcoded to 4).

So now one can reduce the stall limit and increase thread_pool_oversubscribe to get somewhat more "thread-per-connection"-ish behavior in the thread pool and more eager thread creation. But queuing of tasks will still happen, since that is what thread pools do.
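
For illustration, a minimal my.cnf sketch combining the two knobs (the variable names are the ones described above; the values are only examples to be tuned per workload, and thread_handling=pool-of-threads is assumed to be how the pool is enabled on this build):

[mysqld]
thread_handling=pool-of-threads
# a group is considered stalled, and gets an extra thread, after 10 ms (the new minimum)
thread_pool_stall_limit=10
# allow more tasks to run in a group at once (previously hardcoded to 4)
thread_pool_oversubscribe=10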

Changed in maria:
status: New → Fix Committed
Revision history for this message
Elena Stepanova (elenst) wrote :

>> the example looks like specifically crafted thread pooling anti-pattern

The initial description consisted of two parts. The first part was to show that even with a minimal deviation from the perfect flow, the impact of the new (presumably default) configuration can be noticed by a live user. The second case pushed it to an extreme to show that it can lead not only to performance problems, but to a loss of functionality. I agree that the second part might not be the best example; it was just a quick one. I think such patterns might exist in real life because they are easy to create, but they will probably be rare, so let's ignore it.

I find the initial scenario in itself worrisome.

The article above bases the "job in life is to ensure that there is one unblocked thread executing..." statement on a somewhat arguable assumption: "since all threads are SQL spawned, they are "well-behaved" and include code that prevents them from monopolizing the system". I don't know if it's true for the SQL server in question, but in our case long non-yielding queries do happen. It's quite normal to expect that these queries might suffer some performance loss; but the first example shows that in fact _other_ queries are affected. Even with only two long queries running at the same time, the delay for unrelated simple short queries is perceivable; and with 5-10 long queries, the delay for others might be seriously annoying.

If we let it be the default behavior, what we are likely to observe is that after an upgrade users will start complaining that "every now and then a simple query might hang for 5-10 seconds". Thinking about widespread real-life setups (web applications, virtual hosting, etc.), in many cases the schema owner might have no way whatsoever to avoid or even investigate that, since the "bad" long queries might be happening in a different schema on the same server. Since long queries don't necessarily cause general system overload, monitors will show nothing suspicious, so the hosting admins will have a hard time investigating it too, and the conclusion is likely to be "the server is just slow at times". That is a bad reputation which spreads fast and is hard to counterweigh with nice benchmark results.

I will try the fix to see how it works now, but in general my opinion is that it makes sense to disable the new behavior by default. People who really care about performance at the level of context switching don't run their servers with default parameters anyway -- they do fine-tuning. If they enable thread pooling in their configuration manually, they will at least know what they changed if something goes wrong, while the users who only care whether their queries run in 1 second or in 5 won't get a new problem.

Revision history for this message
Vladislav Vaintroub (wlad-montyprogram) wrote :

>> It's quite normal to expect that these queries might suffer some performance loss; but the first example shows that in fact
>> _other_ queries are affected. Even with only two long queries running at the same time, the delay for unrelated simple short
>> queries is perceivable; and with 5-10 long queries, the delay for others might be seriously annoying.

Well, I said "specially crafted anti-pattern". What I meant by it is that to create something like your example, one has to reduce the thread pool size to 1, one has to run queries that never yield and that run long, one has to start all long queries at the same time, and one has to measure the response time of only the very first "normally fast" query.

Reducing thread-pool-size to 1 (a computer with a single processor is now quite hard to find in the wild) increases the probability of several long non-yielding queries in the same group, and increases it by quite a large factor. Starting all such long queries at the very same time artificially increases the queue size. And putting a single dummy query at the very end of the queue is used as evidence that such queries would generally run longer in the presence of long non-yielding queries. What is actually measured by this workload is how long the ramp-up is when the environment changes from absolutely idle to a flood of "bad" long queries coming from multiple clients simultaneously.

So if I were to come up with an anti-threadpool-pattern workload, it would be this: an environment in which many different clients all suddenly start to issue long non-yielding queries simultaneously, and then disconnect/sleep for a long time (so that idle threads are removed again). To finish the picture, one needs to throw a couple of clients with short queries into the mix; they need to run at the same time, and the importance of the short queries completing in a short timeframe needs to be rather high.

This is pretty much your test case, though yours only had a single "burst". I would not classify that pattern as something common; I actually cannot come up with a non-artificial example of it.

If your concern is about whether or not the threadpool will be the default, it is not up to me to decide, even if I myself would prefer it as the default, at least in the alpha/beta product stage (if it is *not* the default, we will be unlikely to get a lot of feedback about it). Right now I'm using it in a feature tree, as a simple and effective way to find regressions, more effective than a single test in the test suite; I'd like to get a sufficient level of testing before we release it.

Revision history for this message
Elena Stepanova (elenst) wrote :

Fix released with 5.5.21.

Changed in maria:
status: Fix Committed → Fix Released