Bad behavior when mysqld returns TOO MANY CONNECTIONS error
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
BoneCP | Fix Released | Low | Wallace Wadge |
Bug Description
It looks like there is a bug when the MySQL server returns a "Too many connections" error (error code 1203 or 1040).
When it happens, the BoneCP background thread seems to keep trying to create new connections and failing, resulting in this log output:
13:47:41.424 bonecp.
13:47:41.425 bonecp.
13:47:41.426 bonecp.
BTW:
I tried to register on the BoneCP forum, but apparently I am a bot, because I was not able to pass the captcha test.
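Wallace Wadge (wwadge) wrote : | #1 |
This is a sign that you have asked BoneCP to create more connections than your MySQL server is configured to accept. Either increase the number of connections MySQL will allow (via MySQL Administrator) or tell BoneCP to create fewer connections.
By default MySQL allows you to create just a few connections, so I would start with that first.
P.S. Sorry about the captcha test: the number of spambot attempts I get hit with is unbelievable; I'm going to turn it down a notch.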
Changed in bonecp: | |
status: | New → Invalid |
Omry Yadan (omry) wrote : Re: [Bug 503370] Re: Bad behavior when mysqld returns TOO MANY CONNECTIONS error | #2 |
I am attempting to switch over from a different connection pool (Proxool).
With Proxool, when MySQL ran out of connections, my DB layer code received an exception when it requested a new connection from the pool. I then checked whether the exception was "too many connections", slept a little, and retried a few times before raising the exception to the application layer.
I don't want to increase the number of connections MySQL allows beyond its current setting (each connection carries server overhead, and too many open connections can kill a server).
Are you saying that BoneCP looping while trying to open a new connection (and eating 100% CPU in the process) is not a bug?
On 01/05/2010 04:23 PM, Wallace Wadge wrote:
> This is a sign that you have asked BoneCP to create more connections
> than your mysql server is configured to accept. Either increase the
> number of connections mysql will allow (via mysql administrator) or tell
> bonecp to create fewer connections.
>
> By default MySQL allows you to create just a few connections so I would
> start with that first.
>
> P.S. Sorry about the captcha test - the amount of spambot attempts I get
> hit with is unbelievable; I'm going to turn it down a notch.
>
> ** Changed in: bonecp
> Status: New => Invalid
Wallace Wadge (wwadge) wrote : | #3 |
What I'm saying is that the pool should be configured to never enter that state in the first place, by setting the pool to not create more connections than the DB will allow.
Proxool throws an exception when it cannot obtain a connection. This sucks because it means you've just wasted CPU time handling that exception for nothing. You say that in your case you "wait a little and then retry", so why not simply block and wait until a connection is available? This is the default behaviour of BoneCP (and C3P0 and DBCP). If there's a good use case, I may add an option to throw an exception to emulate Proxool's behaviour too. However, it's unclear what the pool should do at that stage (block? try again? sleep? throw an exception? ignore it?).
P.S. Can you paste me your configuration settings?
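The two policies contrasted in this comment (block until a connection is free versus fail fast with an exception) can be illustrated with a small sketch. `BoundedPermits` is a hypothetical class built on `java.util.concurrent.Semaphore`, not BoneCP's or Proxool's code; one permit stands in for one pooled connection.

```java
import java.sql.SQLException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: a pool capped at maxConnections permits.
// This is NOT BoneCP's implementation; it only contrasts the two policies.
class BoundedPermits {
    private final Semaphore permits;

    BoundedPermits(int maxConnections) {
        // fair = true so waiting callers are served in arrival order
        this.permits = new Semaphore(maxConnections, true);
    }

    // Block-and-wait policy: the caller sleeps until another caller
    // releases a connection back to the pool.
    void acquireBlocking() throws InterruptedException {
        permits.acquire();
    }

    // Fail-fast policy: wait at most timeoutMillis, then surface an error.
    void acquireOrThrow(long timeoutMillis) throws SQLException, InterruptedException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new SQLException("No connection available within " + timeoutMillis + " ms");
        }
    }

    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Blocking keeps callers simple but can hang them indefinitely; the fail-fast variant gives the caller a chance to decide, at the cost of handling the exception.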
Changed in bonecp: | |
status: | Invalid → New |
Omry Yadan (omry) wrote : | #4 |
It is not always possible to prevent this through configuration.
I have a scenario where batch jobs run and connect to many database servers. Let's say each job runs for two minutes and connects to 5 random servers out of 10. In rare cases, all jobs will connect to the same server, and MySQL will complain that it ran out of connections.
In such cases, the wise move is to just wait a little and try again (raising max_connections on each MySQL server to cover the worst-case scenario may mean that, in that worst case, the server runs out of memory trying to handle too many concurrent connections).
In short, in such a scenario it's not possible to guarantee that MySQL never runs out of connections.
Proxool allows me to handle this case myself, and to fail the job if it can't get a connection at all (which is good, because then I will notice something is wrong and do something about it).
Naturally, such logic does belong in the connection pool layer, but Proxool is pretty dead and does not even compile with a modern JDK (one of the reasons for switching), so I chose to patch it into my own DB layer above it.
Being able to set the policy for handling "too many connections" errors (via configuration or a pluggable handler) would be ideal.
It's easy to reproduce the problem: just make sure BoneCP creates more connections on startup than MySQL allows.
You can check MySQL's max_connections with
show variables like '%max_conn%';
and set it using
set global max_connections=15;
On 01/05/2010 05:09 PM, Wallace Wadge wrote:
> What I'm saying is that the pool should be configured to never enter
> that state in the first place by setting the pool to not create any more
> connections than the DB will allow.
>
> Proxool throws an exception when it cannot obtain a connection. This
> sucks because it means you've just wasted CPU time in handling that
> exception for nothing. You say that in your case you "wait a little and
> then retry", so why not simply block and wait until a connection is
> available? This is the default behaviour of BoneCP (and C3P0 and DBCP)
> and perhaps if there's a good use case I will add an option to throw an
> exception to emulate the behaviour of Proxool too, however it's unclear
> what the pool should do at that stage (block? try again? sleep? throw an
> exception? ignore it?)
>
> P.S. Can you paste me your configuration settings?
>
> ** Changed in: bonecp
> Status: Invalid => New
Wallace Wadge (wwadge) wrote : | #5 |
OK, I think a pluggable handler would be simple to implement here (I already have handler support, so I'll just add a new one).
Changed in bonecp: | |
status: | New → In Progress |
importance: | Undecided → Low |
assignee: | nobody → Wallace Wadge (wwadge) |
Omry Yadan (omry) wrote : | #6 |
Cool.
What do you have in mind?
For it to be useful, it should allow me to throw an exception to the code calling pool.getConnect
Wallace Wadge (wwadge) wrote : | #7 |
What I have in mind (actually it's done already, but I'm writing some more unit tests first) is a callback function like the other hook methods (onAcquire, onCheckin, etc.).
You simply register a method that will be called back:
boolean onAcquireFail(
// something failed, let me wait for a while....
Thread.
return true; // saying return true means go try again, return false will throw an exception
}
The exception itself is not thrown in pool.getConnect
- pool.getConnect
- Callback function allows you to retry a failure or log the exception or whatever from whichever place the request was made.
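A minimal sketch of the callback contract as described in this comment, simplified to a synchronous caller: `AcquireFailHandler` and `RetryingAcquirer` are hypothetical names, and the real BoneCP hook signature (truncated above) is not reproduced. Returning true retries the attempt; returning false rethrows the failure.

```java
import java.util.concurrent.Callable;

// Hypothetical callback shape modelled on the description above;
// not BoneCP's actual interface.
interface AcquireFailHandler {
    // return true to try again, false to give up and rethrow
    boolean onAcquireFail(Exception cause);
}

final class RetryingAcquirer {
    // Runs 'attempt' until it succeeds or the handler returns false.
    static <T> T acquire(Callable<T> attempt, AcquireFailHandler handler) throws Exception {
        while (true) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (!handler.onAcquireFail(e)) {
                    throw e; // handler declined to retry: surface the failure
                }
                // handler asked for another attempt: loop again
            }
        }
    }
}
```

Later comments in this thread refine what returning false should mean, so treat this only as an illustration of the true/false split.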
Omry Yadan (omry) wrote : | #8 |
What is the meaning of returning false, if it throws an exception in the background thread?
I will have to think about it to see if it's good enough for my needs, but thanks for the quick fix in any case :).
Wallace Wadge (wwadge) wrote : | #9 |
Returning false means: take the usual course of action.
For example, suppose the database goes down (because of a lost network link, say). You get a chance to do something about it: you might print out a message, return true the first three times to retry, then give up and return false to let the pool proceed as usual by terminating all connections.
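The scenario described here (retry the first three times, then give up) could look like the hypothetical handler below. It counts failures with an `AtomicInteger` on the assumption that several pool partitions may invoke the hook concurrently; the method name mirrors `onAcquireFail`, but the class is not part of BoneCP.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler: retry the first maxRetries failures, then give up.
// AtomicInteger keeps the count correct if several pool partitions call
// the handler concurrently, without taking a lock.
final class RetryThenGiveUp {
    private final int maxRetries;
    private final AtomicInteger failures = new AtomicInteger();

    RetryThenGiveUp(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // true = try again, false = let the pool take its usual course
    boolean onAcquireFail(Throwable t) {
        int n = failures.incrementAndGet();
        System.err.println("Acquire failed (attempt " + n + "): " + t.getMessage());
        return n <= maxRetries;
    }

    // Call after a successful acquire so a later outage gets a fresh budget.
    void reset() {
        failures.set(0);
    }
}
```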
Omry Yadan (omry) wrote : | #10 |
Sounds good.
Did you change the usual course of action for this error?
With the current release, it's to try again in a tight loop.
Wallace Wadge (wwadge) wrote : | #11 |
Committed in v0.6.2.
It's on the download server already; updating the main site shortly.
Changed in bonecp: | |
status: | In Progress → Fix Released |
Omry Yadan (omry) wrote : Re: [Bug 503370] Re: Bad behavior when mysqld returns TOO MANY CONNECTIONS error | #12 |
Tested the new logic with 0.6.2, and it still has the same problem: after returning false, BoneCP gets into a tight loop eating 100% CPU instead of giving up and aborting all connections.
{
{
MysqlErrorNumbe
MysqlErrorNumbe
error, will try again after sleeping for a while");
error, aborting connection pool");
}
{
}
});
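The handler above is truncated in the source. A sketch of the same idea, assuming it should sleep and retry only on the two MySQL error codes from the bug description: 1040 (`ER_CON_COUNT_ERROR`, "Too many connections") and 1203 (`ER_TOO_MANY_USER_CONNECTIONS`). The class name is hypothetical, and the codes are hard-coded so the sketch needs no JDBC driver on the classpath.

```java
import java.sql.SQLException;

// Hypothetical reconstruction of the idea, not the poster's exact code:
// sleep and retry only on "too many connections" errors.
final class TooManyConnectionsHandler {
    // MySQL server error codes
    static final int ER_CON_COUNT_ERROR = 1040;           // "Too many connections"
    static final int ER_TOO_MANY_USER_CONNECTIONS = 1203; // max_user_connections exceeded

    private final long sleepMillis;

    TooManyConnectionsHandler(long sleepMillis) {
        this.sleepMillis = sleepMillis;
    }

    static boolean isTooManyConnections(Throwable t) {
        if (!(t instanceof SQLException)) {
            return false;
        }
        int code = ((SQLException) t).getErrorCode(); // vendor-specific code
        return code == ER_CON_COUNT_ERROR || code == ER_TOO_MANY_USER_CONNECTIONS;
    }

    // true = sleep, then retry; false = not an error this handler deals with
    boolean onAcquireFail(Throwable t) {
        if (!isTooManyConnections(t)) {
            return false;
        }
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
            return false;
        }
        return true;
    }
}
```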
Wallace Wadge (wwadge) wrote : | #13 |
Looking at your code:
1. First of all, make sure that your code is thread-safe, since it may be called in a re-entrant fashion (two partitions might call your method at the same time so lock numTooManyConne
2. "Aborting" all connections is not the right flow here. That error is triggered when you fetch a connection: the fetch succeeds, but the pool realises it is running low and instructs *another* thread to create new connections. If that thread fails, nothing bad happens right away, since there are still more connections in the pool.
In your case it would seem you want to keep waiting and trying to obtain another connection forever, so why not simply keep returning true (after a fixed delay) in the onAcquireFail method?
Meanwhile, I will add some more logic to apply an automatic delay when this scenario happens, to avoid spinning in the default case.
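The automatic delay promised here can be sketched as a background top-up step that sleeps between failed attempts instead of spinning. `DelayedRetryFiller`, `retryDelayMillis` and `maxAttempts` are hypothetical names; the option BoneCP actually added is truncated later in this thread and is not reproduced.

```java
import java.util.function.BooleanSupplier;

// Hypothetical background "top-up" step: try to open one connection and, on
// failure, wait retryDelayMillis before the next attempt instead of spinning.
final class DelayedRetryFiller {
    private final long retryDelayMillis;
    private final int maxAttempts;

    DelayedRetryFiller(long retryDelayMillis, int maxAttempts) {
        this.retryDelayMillis = retryDelayMillis;
        this.maxAttempts = maxAttempts;
    }

    // tryOpen returns true when a new connection was created.
    // Returns the number of attempts used, or -1 if every attempt failed.
    int fillOne(BooleanSupplier tryOpen) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryOpen.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(retryDelayMillis); // back off rather than loop tightly
            }
        }
        return -1;
    }
}
```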
Omry Yadan (omry) wrote : Re: [Bug 503370] Re: Bad behavior when mysqld returns TOO MANY CONNECTIONS error | #14 |
I suggest that, if the code needs to be thread-safe, you do the locking before calling it.
For example:
} catch (Throwable t) {
// call the hook, if available.
if (this.connectio
* synchronized(
* tryAgain =
this.connection
* }
* }
(And of course the same in all other callbacks.)
Retrying indefinitely is acceptable for some applications, but for a batch system it's wise to give up at some point and raise an error to the application level (the code calling getConnection()), at least if there are no free connections and the pool has not been able to create new ones for a while.
Otherwise I may never know that there is a serious problem with my setup.
From what I can tell, right now there is no real difference between returning true or false in this scenario:
if I return true, you try again;
if I return false, you also try again (with a small delay, after you make your changes).
Wallace Wadge (wwadge) wrote : | #15 |
Synchronizing externally would mean a general slowdown in the critical path (getConnection, release connection). Someone who just wants a simple log message, for example, would have to pay the synchronization cost.
Re timing out the connection, have a look at getAsyncConnect
The difference between returning true and false is:
return true: your application will handle the error
return false: let the pool decide what's best (eg by flagging the connection as broken, terminating the pool, etc)
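Wallace points to getAsyncConnect (the name is truncated in the source) for timing out a connection request. The general pattern of bounding the wait with a `Future` can be sketched using only `java.util.concurrent`; `TimedAcquire` and `acquireWithin` are hypothetical names, not BoneCP's API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bound how long a caller waits for a connection by
// running the blocking acquire asynchronously and waiting with a timeout.
final class TimedAcquire {
    static <T> T acquireWithin(ExecutorService executor, Callable<T> blockingAcquire,
                               long timeout, TimeUnit unit) throws Exception {
        Future<T> future = executor.submit(blockingAcquire);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the still-waiting acquire
            throw new TimeoutException("No connection within " + timeout + " " + unit);
        } catch (ExecutionException e) {
            // unwrap the acquire's own failure (e.g. an SQLException)
            Throwable cause = e.getCause();
            throw (cause instanceof Exception) ? (Exception) cause : e;
        }
    }
}
```

This lets a batch job fail loudly after a deadline instead of waiting forever. A real pool would also need to hand back a connection that arrives just after the timeout, which this sketch does not handle.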
Wallace Wadge (wwadge) wrote : | #16 |
Omry,
Can you test out with v0.6.3-rc1?
I added a new option in the config: config.
eg config.
Omry Yadan (omry) wrote : | #17 |
Wallace Wadge wrote:
> Omry,
>
> Can you test out with v0.6.3-rc1?
>
> I added a new option in the config: config.
> which should help reduce the problem you were seeing.
>
> eg config.
Yes, hopefully on Sunday.
I also intend to reply to your last ticket comment.
Omry Yadan (omry) wrote : | #18 |
I don't see 0.6.3 anywhere on the site.
Maybe it's time for a public VCS?
Wallace Wadge (wwadge) wrote : | #19 |
Public VCS: http://
It's v0.6.3-rc1.
Jar available here: http://
Omry Yadan (omry) wrote : | #20 |
How do I build it?
I tried mvn install, but it failed all the tests:
Running com.jolbox.
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.037 sec <<< FAILURE!
Running com.jolbox.
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 sec <<< FAILURE!
...
Wallace Wadge (wwadge) wrote : | #21 |
Is your Maven set up correctly? You should be building from the topmost project, and Maven should be able to pull in the third-party jars.
I just tried again and it built with no problems:
> git clone git://github.
Initialized empty Git repository in D:/Temp/
remote: Counting objects: 869, done.
remote: Compressing objects: 100% (211/211), done.
remote: Total 869 (delta 400), reused 869 (delta 400)
Receiving objects: 100% (869/869), 206.22 KiB | 51 KiB/s, done.
Resolving deltas: 100% (400/400), done.
> D:\Temp\github>cd bonecp
> D:\Temp\
(blah blah blah)
[INFO] -------
[INFO] Reactor Summary:
[INFO] -------
[INFO] Bone Connection Pool - Parent .......
[INFO] BoneCP Core Library .......
[INFO] BoneCP Hibernate provider .......
[INFO] BoneCP Benchmark .......
[INFO] -------
[INFO] -------
[INFO] BUILD SUCCESSFUL
[INFO] -------
[INFO] Total time: 22 seconds
[INFO] Finished at: Mon Jan 11 14:31:57 CET 2010
[INFO] Final Memory: 21M/38M
[INFO] -------
Omry Yadan (omry) wrote : | #22 |
Overall it looks good (I updated the code just now, so I may have picked up additional changes).
One thing:
private void fillConnections(int connectionsToCr
try {
for (int i=0; i < connectionsToCr
}
} catch (SQLException e) {
}
}
prints the exception again and again.
I suggest you only print the first exception (unless e.getErrorCode() changes).
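The suggestion to log only the first failure until the error code changes could be implemented with a small throttle like the hypothetical `FailureLogThrottle` below; it is a sketch of the idea, not the fix that went into BoneCP.

```java
import java.sql.SQLException;

// Hypothetical log throttle for a retry loop: report a failure only when its
// vendor error code differs from the previous failure's code.
final class FailureLogThrottle {
    private Integer lastErrorCode = null; // null = no failure being suppressed

    // Returns true if this failure should be logged.
    synchronized boolean shouldLog(SQLException e) {
        int code = e.getErrorCode();
        if (lastErrorCode != null && lastErrorCode == code) {
            return false; // same error as last time: stay quiet
        }
        lastErrorCode = code;
        return true;
    }

    // Call after a successful attempt so the next failure is reported again.
    synchronized void onSuccess() {
        lastErrorCode = null;
    }
}
```

Inside the fill loop, the catch block would call `shouldLog(e)` before logging, and the loop would call `onSuccess()` once a connection is created.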
BTW:
How do I get BoneCP to build in Eclipse?
It depends on org.maven.
Are you using a different Eclipse plugin?
Wallace Wadge (wwadge) wrote : | #23 |
Good point; I'll come up with a mini-fix for that and issue a release.
I am using this plugin:
http://
Maven can be a downright pain sometimes; it's not big friends with Eclipse I'm afraid.
Thanks Omry.
Omry Yadan (omry) wrote : | #24 |
Cool, now it compiles for me inside Eclipse as well.
With your current default behavior, I no longer need to register the onAcquireFail hook.
Wallace Wadge (wwadge) wrote : | #25 |
Issued a release with this fix. Unfortunately I forgot to include the patch that quietens the log when it fails (still getting used to Git!); I will issue a patch soon and keep you posted.
Also on my TODO list: I figured out a way to make the pool go even faster - watch this space :-)
Omry Yadan (omry) wrote : | #26 |
Cool :)