Received IOException while driving IO on session GearmanJobServerSession

Bug #609300 reported by Wilton Risenhoover on 2010-07-23
This bug affects 1 person
Affects        Status   Importance  Assigned to   Milestone
Gearman Java   Invalid  Medium      Eric Lambert

Bug Description

I'm using Gearman 0.13, gearman-java, and gearman-php. In this particular instance, the PHP client submits a job to Gearman, which dispatches it to a Java worker. The worker tries to respond with a large string containing the data for the PHP client to parse. I get this exception:

Jul 23, 2010 8:30:52 PM org.gearman.worker.GearmanWorkerImpl work
WARNING: Received IOException while driving IO on session GearmanJobServerSession:1:GearmanNIOJobServerConnection:gearman/127.0.0.1:4730
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.write0(Native Method)
 at sun.nio.ch.SocketDispatcher.write(Unknown Source)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
 at sun.nio.ch.IOUtil.write(Unknown Source)
 at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
 at org.gearman.common.GearmanNIOJobServerConnection.write(GearmanNIOJobServerConnection.java:148)
 at org.gearman.common.GearmanJobServerSession.driveSessionIO(GearmanJobServerSession.java:181)
 at org.gearman.worker.GearmanWorkerImpl.work(GearmanWorkerImpl.java:193)
 at com.robotdough.gearman.RobotWorker.start(RobotWorker.java:82)
 at com.robotdough.gearman.RobotWorker.main(RobotWorker.java:122)
Exception in thread "main" java.lang.NullPointerException
 at org.gearman.common.GearmanJobServerSession.sessionHasDataToWrite(GearmanJobServerSession.java:251)
 at org.gearman.worker.GearmanWorkerImpl.work(GearmanWorkerImpl.java:158)
 at com.robotdough.gearman.RobotWorker.start(RobotWorker.java:82)
 at com.robotdough.gearman.RobotWorker.main(RobotWorker.java:122)

I suspect it's a size issue: it usually works, and I believe this particular job is returning a large string.
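For context on the size hypothesis: in the Gearman binary protocol, a worker returns its result in a single WORK_COMPLETE packet (type 13). The packet is a 12-byte header ("\0REQ" magic, 4-byte type, 4-byte big-endian data size) followed by the job handle, a NUL byte, and the result bytes. The sketch below builds such a packet with java.nio.ByteBuffer. It is not gearman-java's own packet code; the class name and the job handle in the usage note are hypothetical. It shows only that the 32-bit size field by itself does not rule out a 150159-byte result.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class WorkCompletePacket {
    // Packet type number for WORK_COMPLETE in the Gearman protocol.
    static final int WORK_COMPLETE = 13;

    /** Frames a WORK_COMPLETE request: "\0REQ", type, size, then job handle, NUL, result bytes. */
    static ByteBuffer frame(String jobHandle, byte[] result) {
        byte[] handle = jobHandle.getBytes(StandardCharsets.US_ASCII);
        int size = handle.length + 1 + result.length;    // data length after the 12-byte header
        ByteBuffer buf = ByteBuffer.allocate(12 + size); // ByteBuffer is big-endian by default
        buf.put(new byte[] {0, 'R', 'E', 'Q'});          // magic for worker-to-server packets
        buf.putInt(WORK_COMPLETE);
        buf.putInt(size);
        buf.put(handle).put((byte) 0).put(result);
        buf.flip();                                      // ready for a channel write
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer packet = frame("H:gearman:1", new byte[150159]);
        System.out.println(packet.remaining()); // prints 150183
    }
}
```

With the hypothetical handle "H:gearman:1" (11 bytes), the packet is 12 + 11 + 1 + 150159 = 150183 bytes, with a size field of 150171.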

Eric Lambert (elambert) wrote :

This bug should be filed against gearman-java, and now it is :-)

affects: geardb → gearman-java

Wilton Risenhoover wrote :

Sorry about that misfiling. That happens to me a lot.

I did a very brief test with two cases, a "short" case and a "long" case.

The short case returns a string of length 18246 without problem.
The long case attempts to return a string of length 150159 but fails.

Hope that helps.

The other problem, I'm finding out, is that as each worker dies, gearman (correctly) dishes the job out to another worker, which also subsequently dies. This produces a cascading effect leaving me with zero workers at the end.

Any thoughts?

Eric Lambert (elambert) wrote :

Absolutely no worries about misfiling; the Gearman Launchpad portal is probably a bit confusing to the uninitiated.

Thanks for following up with the details; I am sure they will be helpful in diagnosing the problem. One thing I did notice is that, based on your stack trace, it looks like you are using version 0.02. Is that correct? If so, is there any reason you are not using 0.03? (Not to imply that this issue is fixed in 0.03; from my cursory look, my guess is that this problem also exists in 0.03.)

Changed in gearman-java:
importance: Undecided → Medium
assignee: nobody → Eric Lambert (elambert)

Eric Lambert (elambert) wrote :

BTW, when I said it looked like you were using version 0.02, I mean version 0.02 of gearman-java.

Yeah, the Gearman job server will re-submit any job that was running on a worker that has lost connectivity with the server, so this is expected behavior (well, the re-submission is expected). But the problem here is that since the Java worker will always crash when running that job, as you point out, it results in all your workers going down.
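The cascade can be modeled in a few lines. This is a toy simulation, not Gearman code: the class and method names are hypothetical, and it assumes, as in this report, that the job crashes every worker that runs it and that the server re-queues it each time a worker disconnects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PoisonJobCascade {
    /** Returns how many workers are still alive once the job is done or no workers remain. */
    static int survivingWorkers(int workers, boolean jobAlwaysCrashes) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add("large-result-job");
        int alive = workers;
        while (!queue.isEmpty() && alive > 0) {
            String job = queue.poll();
            if (jobAlwaysCrashes) {
                alive--;        // the worker process dies while running the job...
                queue.add(job); // ...so the server hands the same job to the next worker
            }
            // otherwise the job completes and is not re-queued
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println(survivingWorkers(5, true));  // prints 0
        System.out.println(survivingWorkers(5, false)); // prints 5
    }
}
```

However many workers are started, a job that always crashes its worker consumes all of them, which is why re-submission alone cannot recover here.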

Until this gets fixed, one possible workaround would be to have the worker store the string somewhere it can be retrieved and then return a URL/path to where the string can be picked up. This is the generally accepted practice for workers that generate large result sets, although I would be hard pressed to call 150K a large result set.
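A minimal sketch of that workaround, assuming the worker and the client share a filesystem (for example, both run on the same host) and using a file: URI as the reference. The class and method names are hypothetical. The worker would send only the short URI back through Gearman, and the client would read the file itself; since the client in this thread is PHP, fetch() here only demonstrates the round trip in Java.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResultReference {
    /** Worker side: writes the full result to a new file and returns a short file: URI for it. */
    static String storeAndReference(String result, Path dir) throws IOException {
        Path file = Files.createTempFile(dir, "gearman-result-", ".txt");
        Files.writeString(file, result, StandardCharsets.UTF_8);
        return file.toUri().toString();
    }

    /** Client side: resolves the URI returned by the worker and reads the result back. */
    static String fetch(String reference) throws IOException {
        return Files.readString(Path.of(URI.create(reference)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gearman-results");
        String large = "x".repeat(150159);          // same length as the failing result
        String ref = storeAndReference(large, dir); // only this short string goes through Gearman
        System.out.println(ref);
        System.out.println(fetch(ref).length());    // prints 150159
    }
}
```

The sketch never deletes the files it writes; a real deployment would need a cleanup policy, or a shared store reachable by URL if the client runs on another host.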

Wilton Risenhoover wrote:
> The other problem, I'm finding out, is that as each worker dies, gearman
> (correctly) dishes the job out to another worker, which also
> subsequently dies. This produces a cascading effect leaving me with
> zero workers at the end.
>
> Any thoughts?

Eric Lambert (elambert) wrote :

Wilton:

Actually this might be fixed in release 0.03. I took a quick look and saw that Bug #418927 may be what is causing this. Can you try release 0.03 and let me know?

Eric

Eric Lambert (elambert) wrote :

Moving comment from #418927 (originally posted by Wilton Risenhoover):

I upgraded the library to gearman-java 0.03, and although I'm not getting the NIO error, I'm still getting an error of some type. I'll keep poking at this to see if I can figure it out.

22:49:21,798 DEBUG (com.xxx.gearman.ScreenerAdvancedFunction.executeFunction:111) - row = [ZUMZ, ZUMIEZ INC, $18.81]
22:49:21,798 DEBUG (com.xxx.gearman.ScreenerAdvancedFunction.executeFunction:111) - row = [ZZ, SEALY CORP, $2.60]
22:49:21,798 WARN (com.xxx.gearman.ScreenerAdvancedFunction.executeFunction:140) - Passing results of length 150159 to client
Jul 23, 2010 10:49:23 PM org.gearman.worker.GearmanWorkerImpl submitFunction
WARNING: Exception while getting function results
java.lang.NullPointerException
 at org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:125)
 at org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:20)
 at org.gearman.worker.GearmanWorkerImpl.submitFunction(GearmanWorkerImpl.java:483)
 at org.gearman.worker.GearmanWorkerImpl.work(GearmanWorkerImpl.java:171)
 at com.xxx.gearman.RobotWorker.start(RobotWorker.java:82)
 at com.xxx.gearman.RobotWorker.main(RobotWorker.java:122)
Exception in thread "main" java.lang.NullPointerException
 at org.gearman.worker.GearmanWorkerImpl.submitFunction(GearmanWorkerImpl.java:506)
 at org.gearman.worker.GearmanWorkerImpl.work(GearmanWorkerImpl.java:171)
 at com.xxx.gearman.RobotWorker.start(RobotWorker.java:82)
 at com.xxx.gearman.RobotWorker.main(RobotWorker.java:122)

Eric Lambert (elambert) wrote :

Wilton, the code in your stack trace would only be invoked if the executeFunction for your job returned a null result:

        try {
            result = executeFunction();
        } catch (Exception e) {
            thrown = e;
        }
        if (result == null) {
            String message = thrown == null ? "function returned null result" :
                thrown.getMessage();
            fireEvent(new GearmanPacketImpl(GearmanPacketMagic.REQ,
                    GearmanPacketType.WORK_EXCEPTION,
                    GearmanPacketImpl.generatePacketData(jobHandle,
                    message.getBytes())));
...
Is your executeFunction correctly returning a GearmanJobResult?
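A standalone model of the control flow in the excerpt makes the follow-up correction concrete: result is assigned only when executeFunction() returns normally, so an exception also leaves it null and leads to the same WORK_EXCEPTION branch. Here java.util.concurrent.Callable stands in for executeFunction(), and the method returns the message that would be sent; these are simplifications, not gearman-java's types. The model also shows one possible hazard in the excerpt: if the caught exception has no message (for example, new NullPointerException()), thrown.getMessage() is null and message.getBytes() would itself throw a NullPointerException. Whether that is what happened at AbstractGearmanFunction.java:125 in the trace above cannot be confirmed from the excerpt alone, and the fallback to toString() is a hypothetical fix.

```java
import java.util.concurrent.Callable;

public class NullResultPath {
    /** Returns the WORK_EXCEPTION message the excerpt would send, or null on success. */
    static String failureMessage(Callable<Object> executeFunction) {
        Object result = null;
        Exception thrown = null;
        try {
            result = executeFunction.call();
        } catch (Exception e) {
            thrown = e; // result stays null, so the failure branch below runs too
        }
        if (result != null) {
            return null; // success: no WORK_EXCEPTION packet
        }
        if (thrown == null) {
            return "function returned null result";
        }
        // The excerpt uses thrown.getMessage() directly; it can be null, so fall back to toString().
        String message = thrown.getMessage();
        return message != null ? message : thrown.toString();
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(() -> "ok"));  // prints null
        System.out.println(failureMessage(() -> null));  // prints function returned null result
        System.out.println(failureMessage(() -> { throw new IllegalStateException("bad row"); })); // prints bad row
        System.out.println(failureMessage(() -> { throw new NullPointerException(); }));        // prints java.lang.NullPointerException
    }
}
```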

Eric Lambert (elambert) wrote :

Actually, what I said in my previous comment is not entirely correct: that code would be executed if a null result were returned or if an exception was encountered.


Wilton Risenhoover wrote :

I will make sure I've wrapped all the functions with a try/catch.

OK, it's fixed. You were right: something else was going on within one of the functions that was throwing the exception, but it was nearly impossible to figure out which one. I finally downloaded the source and added an e.printStackTrace() to the try/catch you showed above, which led me to the culprit.
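An alternative to patching the library source is to wrap each function body so that any failure is logged with its full stack trace before it propagates. A minimal sketch using java.util.logging; the logged() helper and the function names are hypothetical, and Callable stands in for whatever the function's executeFunction() body does. Rethrowing keeps the existing behavior (the library still sees the failure), while the log records which function failed and where.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggedFunction {
    private static final Logger LOG = Logger.getLogger(LoggedFunction.class.getName());

    /** Runs body; on failure logs the full stack trace, not just getMessage(), then rethrows. */
    static <T> T logged(String functionName, Callable<T> body) throws Exception {
        try {
            return body.call();
        } catch (Exception e) {
            // Passing the exception as the Throwable argument makes the log handler print its stack trace.
            LOG.log(Level.SEVERE, "Function " + functionName + " failed", e);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(logged("short", () -> "fine")); // prints fine
        try {
            logged("screener", () -> { String row = null; return row.length(); });
        } catch (NullPointerException e) {
            System.out.println("rethrown: " + e.getClass().getSimpleName()); // prints rethrown: NullPointerException
        }
    }
}
```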

I can also confirm that the long string test is working fine now, so this is resolved as far as I'm concerned.

Thanks for the help!

Eric Lambert (elambert) wrote :

Marking as Invalid, since this is a duplicate of #418927.

Changed in gearman-java:
status: New → Invalid