Comment 3 for bug 909774

Elena Stepanova (elenst) wrote:

>> the example looks like specifically crafted thread pooling anti-pattern

The initial description consisted of two parts. The first part was meant to show that even with a minimal deviation from the perfect flow, the impact of the new (presumably default) configuration can be noticed by a live user. The second case was derived from it and pushed to an extreme to show that it can lead not only to performance problems, but to a loss of functionality. I agree that the second part might not be the best example; it was just a quick one. I think such patterns might exist in real life because they are easy to create, but they will probably be rare, so let's set it aside.

I find the initial scenario in itself worrisome.

The article above bases the "job in life is to ensure that there is one unblocked thread executing..." statement on a somewhat debatable assumption: "since all threads are SQL spawned, they are "well-behaved" and include code that prevents them from monopolizing the system". I don't know whether that holds for the SQL server in question, but in our case long non-yielding queries do happen. It is reasonable to expect that these queries might suffer some performance loss themselves; but the first example shows that in fact _other_ queries are affected. Even with only two long queries running at the same time, the delay for unrelated simple short queries is perceptible; and with 5-10 long queries, the delay for the others can become seriously annoying.
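To make the first scenario concrete, a minimal reproduction sketch (my own illustration, not the exact queries from the original report; the connection count and the BENCHMARK() iteration count are arbitrary) could look like this on a server running with the thread pool:

  -- connections 1..N (N at least the pool size), each running a long,
  -- CPU-bound statement; BENCHMARK() simply burns CPU inside the server:
  SELECT BENCHMARK(500000000, MD5('threadpool test'));

  -- connection N+1, issued while the statements above are still running:
  SELECT 1;  -- unrelated and trivial, yet it may have to wait in the pool's
             -- queue until a worker becomes free or an extra one is spawned

The wait the short query experiences is exactly what an interactive user perceives as "the server hung for a few seconds".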

If we make this the default behavior, what we are likely to observe is that after an upgrade users will start complaining that "every now and then a simple query might hang for 5-10 seconds". Thinking about widespread real-life setups (web applications, virtual hosting, etc.), in many cases the schema owner will have no way whatsoever to avoid or even investigate that, since the "bad" long queries might be running in a different schema on the same server. And since long queries don't necessarily cause general system overload, monitoring will show nothing suspicious, so the hosting admins will have trouble investigating it too, and the conclusion is likely to be "the server is just slow at times". That is the kind of bad reputation that spreads fast and is hard to counterbalance with nice benchmark results.

I will try the fix to see how it works now, but in general my opinion is that it makes sense to disable the new behavior by default. People who really care about performance at the level of context switching don't run their servers with default parameters anyway -- they do fine-tuning. If they enable thread pooling in their configuration manually, they will at least know what they changed if something goes wrong; meanwhile, users who only care whether their queries run in 1 second or in 5 won't be handed a new problem.
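As a side note, a quick way for an administrator to see which scheduler is in effect (assuming the MariaDB thread pool options; keep in mind that thread_handling is a startup option, so changing it means editing the configuration and restarting rather than SET GLOBAL):

  SHOW GLOBAL VARIABLES LIKE 'thread_handling';
  SHOW GLOBAL VARIABLES LIKE 'thread_pool%';

and in my.cnf the choice is made explicitly with a line such as thread_handling = pool-of-threads (or one-thread-per-connection) under [mysqld].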