NullPointerException thrown from GearmanClientImpl.shutdownNow

Bug #689330 reported by Omry Yadan
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Gearman Java
Fix Released
Medium
Eric Lambert

Bug Description

While stress testing my gearman worker, I bumped into the following exceptions:

1. RejectedExecutionException with the message "No servers registered with client"
if I catch this one and try to close the client in order to start a new one and retry, I get:

2.java.lang.NullPointerException
    at org.gearman.client.GearmanClientImpl.shutdownNow(GearmanClientImpl.java:482)
    at org.gearman.client.GearmanClientImpl.shutdown(GearmanClientImpl.java:454)

Revision history for this message
Omry Yadan (omry) wrote :

for the record: this happens when I try to reuse the GearmanClientImpl between multiple threads (only one thread is accessing it at a time) using a basic connection pool.
also - I only see it under heavy load.

Eric Lambert (elambert)
Changed in gearman-java:
assignee: nobody → Eric Lambert (elambert)
Revision history for this message
Eric Lambert (elambert) wrote :

Hi Omry:

Thanks for the bug report, I'll take a look. In the mean time can you let me know what version of the Gearman Java library you were using (0.04 is the latest). Also, if you have a snippet of code that reproduces the issue which you can share, that would be appreciated (or if not, if you can give me a description of what the threads are doing).

Thanks!

Eric

Revision history for this message
Omry Yadan (omry) wrote :
Download full text (4.5 KiB)

Hi Eric,
Thanks for the quick reply.
Java lib version is 0.04, server version is 0.11 if that matters.

here is the client code, the handle function is called for each http request, and it submits a job to gearmand.
below is also the connection pool class (maintains live clients and hand them over to threads).

The problems only appears when I many active clients. I am testing with 'siege' which is a simple http stress tester.
I see problems at 100 or more concurrent clients (there are enough workers to handle this load, spread across 16 machines).

    static class CropHandler implements Handler
    {
        private final GearmanConnectionPool m_gcp;

        public CropHandler(GearmanConnectionPool gcp)
        {
            m_gcp = gcp;
        }

        @Override
        public void handle(String target, HttpServletRequest request, HttpServletResponse response) throws Exception
        {
            String url = request.getParameter("url");
            int x = Util.toInt(request.getParameter("x"), 0);
            int y = Util.toInt(request.getParameter("y"), 0);
            int w = Util.toInt(request.getParameter("width"), 0);
            int h = Util.toInt(request.getParameter("height"), 0);
            CropResponse res = null;
            for(int i=0;i<3;i++)
            {
                GearmanClient gc = null;
                try
                {
                    gc = m_gcp.get();
                    res = getResponse(url, x, y, w, h, gc);
                    break;
                }
                catch(RejectedExecutionException e1)
                {
                    logger.warn("IOException getting crop_image response, retry ("+i+")" + e1.getMessage());
                    gc.shutdown();
                    gc = null;
                }
                catch(IOException e)
                {
                    logger.warn("IOException getting crop_image response, retry ("+i+")" + e.getMessage());
                    gc.shutdownNow();
                    gc = null;
                }
                finally
                {
                    if (gc != null)
                        m_gcp.release(gc);
                }
            }

            if (res == null)
            {
                String msg = "Failed to obtain crop_image response";
                logger.warn(msg);
                response.sendError(500, msg);
            }
            else
            {
                if (res.hasError())
                {
                    String msg = res.getError().getType() + " : " + res.getError().getMessage();
                    logger.warn(msg);
                    response.sendError(500, msg);
                }
                else
                {
                    response.setContentType("image/jpeg");
                    response.getOutputStream().write(res.getImage().toByteArray());
                }
            }

            ((Request) request).setHandled(true);
        }

        private CropResponse getResponse(String url, int x, int y, int w, int h, GearmanClient gc) throws InvalidProtocolBufferException, IOException, InterruptedException, ExecutionException
        {
            CropRequest req = CropRe...

Read more...

Revision history for this message
Omry Yadan (omry) wrote :

Hi Eric,
After further analysis, I am pretty certain all of the problems were caused by "Too many open files" error.

Revision history for this message
Eric Lambert (elambert) wrote : Re: [Bug 689330] Re: NullPointerException thrown from GearmanClientImpl.shutdownNow

  Hi Omry:

Thanks for the code snippet. I took a very quick look, I do see some
potential issues with how the gearman client deals with a multithreaded
environment.

Cheers!

Eric

On 12/13/2010 06:39 AM, Omry Yadan wrote:
> Hi Eric,
> After further analysis, I am pretty certain all of the problems were caused by "Too many open files" error.
>

Revision history for this message
Omry Yadan (omry) wrote :

Hi Eric,
Can you elaborate?

Revision history for this message
Eric Lambert (elambert) wrote :

Well one thing I did notice was that was using HashMap to hold the sessions and HashMap is not thread safe, but not sure that is the cause.

Eric

On Dec 14, 2010, at 12:58 PM, Omry Yadan <email address hidden> wrote:

> Hi Eric,
> Can you elaborate?
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/689330
>
> Title:
> NullPointerException thrown from GearmanClientImpl.shutdownNow
>
> Status in Gearman Java:
> New
>
> Bug description:
> While stress testing my gearman worker, I bumped into the following exceptions:
>
> 1. RejectedExecutionException with the message "No servers registered with client"
> if I catch this one and try to close the client in order to start a new one and retry, I get:
>
> 2.java.lang.NullPointerException
> at org.gearman.client.GearmanClientImpl.shutdownNow(GearmanClientImpl.java:482)
> at org.gearman.client.GearmanClientImpl.shutdown(GearmanClientImpl.java:454)
>
>

Revision history for this message
Omry Yadan (omry) wrote :

Ah, that's not a real issue:
you will notice that the two functions that are accessing the map are synchronized (in fact, using a synchronized data structure here would have not been enough).

I think you can close this bug.
you can try to make the lib behave better when it fails to open new files/sockets (due to the max open files limit), but this is not hugely important. (IE : give more specific errors, not NPE or "No servers are registered").

Eric Lambert (elambert)
Changed in gearman-java:
importance: Undecided → Medium
milestone: none → 0.05
Revision history for this message
Eric Lambert (elambert) wrote :

A little unclear on how this NPE could have arisen. Possible root causes are that either a NULL object got inserted into the map used to track sessions or perhaps concurrent modification of the map (which seems unlikely if only one thread is access the client). Will change the collection used to track JobServerSessions from a HashMap to a Hashtable (which does not allow nulls and protects against concurrent modification).

Eric Lambert (elambert)
Changed in gearman-java:
status: New → In Progress
Revision history for this message
Eric Lambert (elambert) wrote :

Omry:

I checked a change into the trunk which may or may not address this (hows that for being definitive). Any chance you could try the new code out and let me know if you encounter this. (Also, FWIW, I noticed that your GearmanConnectionPool.get() method ignores the return code from addJobServer. If you are hitting the "too many files open" issue, the addJobServer() should return false)

Eric

Revision history for this message
Omry Yadan (omry) wrote :

Hi Eric,
I am a bit swamped with other stuff at the moment.
I`ll try to reproduce when I get some time.

Revision history for this message
Eric Lambert (elambert) wrote :

Omry:

Absolutely no hurry. When ever you get a chance is fine by me :-)

Eric

On Jan 13, 2011, at 6:12 AM, Omry Yadan <email address hidden> wrote:

> Hi Eric,
> I am a bit swamped with other stuff at the moment.
> I`ll try to reproduce when I get some time.
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/689330
>
> Title:
> NullPointerException thrown from GearmanClientImpl.shutdownNow
>
> Status in Gearman Java:
> In Progress
>
> Bug description:
> While stress testing my gearman worker, I bumped into the following
> exceptions:
>
> 1. RejectedExecutionException with the message "No servers registered with client"
> if I catch this one and try to close the client in order to start a new one and retry, I get:
>
> 2.java.lang.NullPointerException
> at org.gearman.client.GearmanClientImpl.shutdownNow(GearmanClientImpl.java:482)
> at org.gearman.client.GearmanClientImpl.shutdown(GearmanClientImpl.java:454)
>
>

Revision history for this message
Omry Yadan (omry) wrote :

you updated my reconnect bug branch as well (do you need to at all?)
if this is your diff:

- sessionsMap = new HashMap<SelectionKey, GearmanJobServerSession>();
+ sessionsMap = new Hashtable<SelectionKey, GearmanJobServerSession>();

that it appears to have worked.
no more npes from the lib when you get too many open files.

Revision history for this message
Eric Lambert (elambert) wrote :

Great, thanks for checking

On Jan 17, 2011, at 2:18 AM, Omry Yadan <email address hidden> wrote:

> you updated my reconnect bug branch as well (do you need to at all?)
> if this is your diff:
>
> - sessionsMap = new HashMap<SelectionKey, GearmanJobServerSession>();
> + sessionsMap = new Hashtable<SelectionKey, GearmanJobServerSession>();
>
> that it appears to have worked.
> no more npes from the lib when you get too many open files.
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/689330
>
> Title:
> NullPointerException thrown from GearmanClientImpl.shutdownNow
>
> Status in Gearman Java:
> In Progress
>
> Bug description:
> While stress testing my gearman worker, I bumped into the following
> exceptions:
>
> 1. RejectedExecutionException with the message "No servers registered with client"
> if I catch this one and try to close the client in order to start a new one and retry, I get:
>
> 2.java.lang.NullPointerException
> at org.gearman.client.GearmanClientImpl.shutdownNow(GearmanClientImpl.java:482)
> at org.gearman.client.GearmanClientImpl.shutdown(GearmanClientImpl.java:454)
>
>

Revision history for this message
Omry Yadan (omry) wrote :

I think you can safely close this one.

btw:
did you merge my reconnect-bug branch to trunk yet?

Revision history for this message
Eric Lambert (elambert) wrote :

  On 01/19/2011 01:24 AM, Omry Yadan wrote:
> I think you can safely close this one.
>
> btw:
> did you merge my reconnect-bug branch to trunk yet?
>
Sorry, have not merged yet, will do so soon.

Eric Lambert (elambert)
Changed in gearman-java:
status: In Progress → Fix Committed
Eric Lambert (elambert)
Changed in gearman-java:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.