Bad behavior when mysqld returns TOO MANY CONNECTIONS error

Bug #503370 reported by Omry Yadan
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
BoneCP
Fix Released
Low
Wallace Wadge

Bug Description

looks like there is a bug when mysql server returns "Too many connections" error (error code 1203 or 1040).

when it happens, it looks like bonecp background thread is constantly trying to create new connections, and failing - resulting in this log output:

13:47:41.424 bonecp.PoolWatchThread [TinyCP-pool-watch-thread] ERROR - com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many connections
13:47:41.425 bonecp.PoolWatchThread [TinyCP-pool-watch-thread] ERROR - com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many connections
13:47:41.426 bonecp.PoolWatchThread [TinyCP-pool-watch-thread] ERROR - com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many connections

BTW:
I tried to registered to the bonecp forum, but apparently I am a bot, because I was not able to pass the captcha test.

Revision history for this message
Wallace Wadge (wwadge) wrote :

This is a sign that you have asked BoneCP to create more connections then your mysql server is configured to accept. Either increase the number of connections mysql will allow (via mysql administrator) or tell bonecp to create fewer connections.

By default MySQL allows you to create just a few connections so I would start with that first.

P.S. Sorry about the captcha test - the amount of spambot attempts I get hit with is unbelievable; I'm going to turn it down a notch.

Changed in bonecp:
status: New → Invalid
Revision history for this message
Omry Yadan (omry) wrote : Re: [Bug 503370] Re: Bad behavior when mysqld returns TOO MANY CONNECTIONS error

I am attempting to switch over from a different connection pool (Proxool).
with Proxool, when mysql ran out of connections, my db later code
received an exception when I got a new connection from the pool.
I then checked if the exception is 'too many connections' and slept a
little and retried a few times before raising the exception to the
application layer.

I don't want to increase the number of connections in mysql beyond what
it's set now (there is a server overhead to each connection, and too
many open connections can kill a server).

Are you saying that bonecp looping in trying to open a new connection
(and eating 100% cpu in the process) is not a bug?

On 01/05/2010 04:23 PM, Wallace Wadge wrote:
> This is a sign that you have asked BoneCP to create more connections
> then your mysql server is configured to accept. Either increase the
> number of connections mysql will allow (via mysql administrator) or tell
> bonecp to create fewer connections.
>
> By default MySQL allows you to create just a few connections so I would
> start with that first.
>
> P.S. Sorry about the captcha test - the amount of spambot attempts I get
> hit with is unbelievable; I'm going to turn it down a notch.
>
> ** Changed in: bonecp
> Status: New => Invalid
>
>

Revision history for this message
Wallace Wadge (wwadge) wrote :

What I'm saying is that the pool should be configured to never enter that state in the first place by setting the pool to not create any more connections that the DB will allow.

Proxool throws an exception when it cannot obtain a connection. This sucks because it means you've just wasted CPU time in handling that exception for nothing. You say that in your case you "wait a little and then retry", so why not simply block and wait until a connection is available? This is the default behaviour of BoneCP (and C3P0 and DBCP) and perhaps if there's a good use case I will add an option to throw an exception to emulate the behaviour of Proxool too, however it's unclear what the pool should do at that stage (block? try again? sleep? throw an exception? ignore it?)

P.S. Can you paste me your configuration settings?

Changed in bonecp:
status: Invalid → New
Revision history for this message
Omry Yadan (omry) wrote :

This is not always possible to prevent this using configuration:
I have a scenario of batch jobs running and connection to many database
servers.
lets say each job is running for two minutes, and connect to 5 random
servers out of 10.
in rare cases, all jobs will connect to the same server, and mysql will
bitch that it ran out of connections.
in such cases, the wise move is to just wait a little and try again
(increasing the max connections in each mysql server to the worse case
scenario may mean that in the worse case the server will run out of
memory trying to handle too many concurrent connections).

in short - in such a scenario it's not possible to ensure that there is
never a case where mysql run out of connections.

proxool allow me to handle this case myself, and fail the job if it
can't get a connection at all (which is good, because then I will notice
something is wrong and do something about it).
naturally such logic does belong to the connection pool layer. but
proxool is pretty dead and does not even compile with a modern JDK (one
of the reasons for switching), so I chose to patch it in my own db layer
above it.
if it will be possible to set the policy for how to handle too many
connections errors (configuration or plugable handler) it will be ideal.

it's easy to reproduce the problem, just make sure bonecp creates more
connections on startup than mysql allows.
you can check mysql max_connections with
show variables like '%max_conn%';

and set it using
set global max_connections=15;

On 01/05/2010 05:09 PM, Wallace Wadge wrote:
> What I'm saying is that the pool should be configured to never enter
> that state in the first place by setting the pool to not create any more
> connections that the DB will allow.
>
> Proxool throws an exception when it cannot obtain a connection. This
> sucks because it means you've just wasted CPU time in handling that
> exception for nothing. You say that in your case you "wait a little and
> then retry", so why not simply block and wait until a connection is
> available? This is the default behaviour of BoneCP (and C3P0 and DBCP)
> and perhaps if there's a good use case I will add an option to throw an
> exception to emulate the behaviour of Proxool too, however it's unclear
> what the pool should do at that stage (block? try again? sleep? throw an
> exception? ignore it?)
>
>
> P.S. Can you paste me your configuration settings?
>
>
>
> ** Changed in: bonecp
> Status: Invalid => New
>
>

Revision history for this message
Wallace Wadge (wwadge) wrote :

ok I think a pluggable handler would be simple to implement here (I already have handler support so I'll just add a new one).

Changed in bonecp:
status: New → In Progress
importance: Undecided → Low
assignee: nobody → Wallace Wadge (wwadge)
Revision history for this message
Omry Yadan (omry) wrote :

cool..
what do you have in mind?
for it to be useful it should allow me to throw an exception to the code
calling pool.getConnection() in some cases.

Revision history for this message
Wallace Wadge (wwadge) wrote :

What I have in mind (actually it's done already, but writing some more unit tests first) is to have a callback function like the other hook methods (onAcquire, onCheckin, etc).

You simply register a method that will be called back:

boolean onAcquireFail(Throwable t){
    // something failed, let me wait for a while....
   Thread.sleep(1000);
   return true; // saying return true means go try again, return false will throw an exception
}

The exception itself is not thrown in pool.getConnection() since connection creation is actually done in a separate thread in an asynchronous fashion. Therefore to summarise:

- pool.getConnection() will never throw a "too many connections" exception. It either returns a connection or waits for one to become available
- Callback function allows you to retry a failure or log the exception or whatever from whichever place the request was made.

Revision history for this message
Omry Yadan (omry) wrote :

What is the meaning of returning false, if it throws an exception in the
background thread?
I will have to think about it to see if it's good enough for my needs.
but thanks for the quick fix in any case :).

Revision history for this message
Wallace Wadge (wwadge) wrote :

returning false means: take the usual course of action.

For example suppose the database goes down (a lost network link for example). You get a chance to do something about it, decide to print out a message, return true for the first three times to retry then give up and return false to let the pool to proceed as usual by terminating all connections.

Revision history for this message
Omry Yadan (omry) wrote :

sounds good.
did you change the usual course of action for this error?
because with the current release, it's to try again in a tight loop.

Revision history for this message
Wallace Wadge (wwadge) wrote :

Committed in v0.6.2.

It's on the download server already; updating the main site shortly.

Changed in bonecp:
status: In Progress → Fix Released
Revision history for this message
Omry Yadan (omry) wrote : Re: [Bug 503370] Re: Bad behavior when mysqld returns TOO MANY CONNECTIONS error

Tested the new logic with 0.62, and it's still have the same problem:
after returning false, bonecp gets into a tight loop eating 100% cpu
instead of giving up and aborting all connections.

             config.setConnectionHook(new AbstractConnectionHook()
             {
                 int numTooManyConnectionErrors = 0;

                 @Override
                 public boolean onAcquireFail(Throwable t)
                 {
                     if (t instanceof SQLException)
                     {
                         SQLException e = (SQLException) t;
                         if (e.getErrorCode() ==
MysqlErrorNumbers.ER_TOO_MANY_USER_CONNECTIONS || e.getErrorCode() ==
MysqlErrorNumbers.ER_CON_COUNT_ERROR) // too many connections
                         {
                             if (numTooManyConnectionErrors++ < 3)
                             {
                                 logger.warn("Too many connections
error, will try again after sleeping for a while");
                                 try
                                 {
                                     Thread.sleep(1000);
                                 }
                                 catch (InterruptedException e1)
                                 {
                                 }
                                 return true;
                             }
                             else
                             {
                                 logger.warn("Too many connections
error, aborting connection pool");
                             }
                         }
                     }
                     return false;
                 }

                 @Override
                 public void onAcquire(ConnectionHandle connection)
                 {
                     numTooManyConnectionErrors = 0;
                 }
             });

Revision history for this message
Wallace Wadge (wwadge) wrote :

Looking at your code:

1. First of all make sure that your code is thread-safe since it may be called in a re-entrant fashion (two partitions might call your method at the same time so lock numTooManyConnectionErrors).

2. "Aborting" all connections is not the right flow here. That error is being triggered when you go fetch a connection, it succeeds but realises it is running low, it then instructs *another* thread to create new connections. If that thread fails, nothing's bad right away since there are still some more connections in the pool.

In your case it would seem you want to keep waiting and trying to obtain another connection forever so why not simply keep returning true (after a fixed delay) in the onAcquireFail method?

Meanwhile I will add some more logic to add an automatic delay when this scenario happens to avoid spinning in the default case.

Revision history for this message
Omry Yadan (omry) wrote : Re: [Bug 503370] Re: Bad behavior when mysqld returns TOO MANY CONNECTIONS error

I suggest that if the code is to be thread safe - you do the locking
before calling it.
for example:
             } catch (Throwable t) {
                 // call the hook, if available.
                 if (this.connectionHook != null){
* synchronized(this.connectionHook){
* tryAgain =
this.connectionHook.onAcquireFail(t);
* }
* }
                 ...

(and of course in all other callbacks).

It is possible to try indefinitely for some applications, but for a
batch system - it's wise to give up at some point and raise an error to
the application level (calling getConnection() ).
at least if there are no free connections and the pool was not able to
create new ones for a time.
otherwise I may never know that there is a serious problem with my setup.

from what I can tell, right now there is no real difference between
returning true or false in this scenario.
if I return true, you try again.
if I return false, you also try again (with a small delay after you make
your changes).

Revision history for this message
Wallace Wadge (wwadge) wrote :

synchronizing externally would mean a general slowdown in the critical path (getconnection, release connection). Someone wanting just a simple log message for example will have to endure a synchronization cost.

Re timing out the connection, have a look at getAsyncConnection() which will return a Future<Connection> instead. That allows you to do something else in the meantime and call get with a timeout too if you want. Remember that getConnection does not have a problem here, it will return a perfectly valid connection. It's the other thread that's attempting to create new connections that is having a problem and causing an exception.

The difference between returning true and false is:

return true: your application will handle the error
return false: let the pool decide what's best (eg by flagging the connection as broken, terminating the pool, etc)

Revision history for this message
Wallace Wadge (wwadge) wrote :

Omry,

Can you test out with v0.6.3-rc1?

I added a new option in the config: config.setAcquireRetryDelay(..) which should help reduce the problem you were seeing.

eg config.setAcquireRetryDelay(1000) will wait for 1 second after each acquire failure before trying again.

Revision history for this message
Omry Yadan (omry) wrote :

Wallace Wadge wrote:

> Omry,
>
> Can you test out with v0.6.3-rc1?
>
> I added a new option in the config: config.setAcquireRetryDelay(..)
> which should help reduce the problem you were seeing.
>
>
> eg config.setAcquireRetryDelay(1000) will wait for 1 second after each acquire failure before trying again.
>
>
Yes.
hopefully on sunday.
I also intend to reply to your last ticket comment.

Revision history for this message
Omry Yadan (omry) wrote :

I don't see 0.6.3 anywhere on the site.

maybe it's time for a public VCS?

Revision history for this message
Wallace Wadge (wwadge) wrote :
Revision history for this message
Omry Yadan (omry) wrote :

How do I build it?
I tried mvn install, but it failed all the tests:

Running com.jolbox.bonecp.TestConnectionHandle
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.037
sec <<< FAILURE!
Running com.jolbox.bonecp.hooks.TestConnectionHook
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015
sec <<< FAILURE!
...

Revision history for this message
Wallace Wadge (wwadge) wrote :

Is your maven setup correctly? You should be building on the topmost project and be able to pull in the 3rd party jars.

I just tried again and it built with no problems:

> git clone git://github.com/wwadge/bonecp.git
Initialized empty Git repository in D:/Temp/github/bonecp/.git/
remote: Counting objects: 869, done.
remote: Compressing objects: 12% (26/21Receiving objects: 7% (61/869), 28.00
remote: Compressing objects: Receiving objects: 11% (96/869), 28.00 KiB | 31 K
remote: Compressing objects: 22% (47/211) Receiving objects: 12% (105/869), 2
remote: Compressing objects: 100% (211/211), done. 00 KiB | 31 KiB/s
remote: Total 869 (delta 400), reused 869 (delta 400)
Receiving objects: 100% (869/869), 206.22 KiB | 51 KiB/s, done.
Resolving deltas: 100% (400/400), done.

> D:\Temp\github>cd bonecp

> D:\Temp\github\bonecp>mvn test
(blah blah blah)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] ------------------------------------------------------------------------
[INFO] Bone Connection Pool - Parent ......................... SUCCESS [2.047s]
[INFO] BoneCP Core Library ................................... SUCCESS [13.485s]

[INFO] BoneCP Hibernate provider ............................. SUCCESS [3.297s]
[INFO] BoneCP Benchmark ...................................... SUCCESS [2.266s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 22 seconds
[INFO] Finished at: Mon Jan 11 14:31:57 CET 2010
[INFO] Final Memory: 21M/38M
[INFO] ------------------------------------------------------------------------

Revision history for this message
Omry Yadan (omry) wrote :

overall looks good (I updated the code now, so I may have got additional changes).
one thing:

    private void fillConnections(int connectionsToCreate) throws InterruptedException {
        try {
            for (int i=0; i < connectionsToCreate; i++){
                this.partition.addFreeConnection(new ConnectionHandle(this.partition.getUrl(), this.partition.getUsername(), this.partition.getPassword(), this.pool));
            }
        } catch (SQLException e) {
            logger.error("Error in trying to obtain a connection. Retrying in "+this.acquireRetryDelay+"ms", e);
            Thread.sleep(this.acquireRetryDelay);
        }
      }

prints the exception again and again.
I suggest you only print the first exception (unless the e.getErrorCode() changes

btw:
how do I get bonecp to build in Eclipse?
it depends on org.maven.ide.eclipse.MAVEN2_CLASSPATH_CONTAINER, I installed the maven2 plugin from http://maven.apache.org/eclipse-plugin.html but it didn't help.
are you using a different Eclipse plugin?

Revision history for this message
Wallace Wadge (wwadge) wrote :

Good point, I'll come up with a mini-fix for that and issue a release.

I am using this plugin:
http://m2eclipse.sonatype.org/

Maven can be a downright pain sometimes; it's not big friends with Eclipse I'm afraid.

Thanks Omry.

Revision history for this message
Omry Yadan (omry) wrote :

cool, now it compiles for me inside Eclipse as well.
with your current default behavior, I no longer need to register the
onAcquireFail hook.

Revision history for this message
Wallace Wadge (wwadge) wrote :

Issued a release with this fix. Unfortunately I forgot to put in the patch to quieten down the log when it fails (still getting used to Git!) - will issue a patch soon and keep you posted.

Also on my TODO list: I figured out a way to make the pool go even faster - watch this space :-)

Revision history for this message
Omry Yadan (omry) wrote :

cool :)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.