calling worker.stop() causes last WORK_COMPLETE packet not sent

Bug #1007322 reported by Darren on 2012-06-01
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Gearman Java
Medium
Eric Lambert

Bug Description

In a worker runner if a shutdown request is fired like this (possibly in another thread) :

if(worker.isRunning) {
    worker.stop();
}

a job is assigned and executed successfully, but the WORK_COMPLETE packet is not
sent, causing the same job assigned to another worker again.

logs from gearmand:

 INFO Starting up
 INFO Listening on :::4730 (6)
 INFO Creating wakeup pipe
 INFO Creating IO thread wakeup pipe
 INFO Adding event for listening socket (6)
 INFO Adding event for wakeup pipe
 INFO Entering main event loop
 INFO Accepted connection from ::804:191d:0:0%488178680:33812
 INFO [ 0] ::804:191d:0:0%488178680:33812 Connected
 INFO [ 0] ::804:191d:0:0%488178680:33812 Disconnected
 INFO Accepted connection from ::804:191d:0:0%488178680:33814
 INFO [ 0] ::804:191d:0:0%488178680:33814 Connected
 INFO [ 0] ::804:191d:0:0%488178680:33814 Disconnected
 INFO Accepted connection from ::804:191d:0:0%488178680:33817
 INFO [ 0] ::804:191d:0:0%488178680:33817 Connected
 INFO [ 0] ::804:191d:0:0%488178680:33817 Disconnected
 INFO Received SHUTDOWN wakeup event
 INFO Clearing event for listening socket (6)
 INFO Clearing event for wakeup pipe
 INFO [ 0] Received SHUTDOWN wakeup event
 INFO [ 0] Clearing event for IO thread wakeup pipe
 INFO Exited main event loop
 INFO Closing listening socket (6)
 INFO Closing wakeup pipe
 INFO Closing IO thread wakeup pipe
 INFO Shutdown complete
[root@webprt-lvs01 autorun]# gearmand -u gearmand -vvvvv
 INFO Starting up
DEBUG Initializing libevent for main thread
DEBUG Method for libevent: epoll
DEBUG Trying to listen on :::4730
 INFO Listening on :::4730 (6)
DEBUG Trying to listen on 0.0.0.0:4730
 INFO Creating wakeup pipe
DEBUG Creating 0 threads
 INFO Creating IO thread wakeup pipe
 INFO Adding event for listening socket (6)
 INFO Adding event for wakeup pipe
 INFO Entering main event loop
 INFO Accepted connection from ::4854:af06:0:0%112153656:33824
CRAZY [ 0] ::4854:af06:0:0%112153656:33824 Watching POLLIN
 INFO [ 0] ::4854:af06:0:0%112153656:33824 Connected
CRAZY [ 0] ::4854:af06:0:0%112153656:33824 Ready POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33824 Received SUBMIT_JOB_BG
CRAZY [ 0] ::4854:af06:0:0%112153656:33824 Watching POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33824 Sent JOB_CREATED
CRAZY [ 0] ::4854:af06:0:0%112153656:33824 Ready POLLIN
 INFO [ 0] ::4854:af06:0:0%112153656:33824 Disconnected
 INFO Accepted connection from ::4854:af06:0:0%112153656:33827
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Watching POLLIN
 INFO [ 0] ::4854:af06:0:0%112153656:33827 Connected
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Ready POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Received SET_CLIENT_ID
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Watching POLLIN
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Ready POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Received GRAB_JOB
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Watching POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Sent NO_JOB
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Ready POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Received CAN_DO
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Watching POLLIN
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Ready POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Received PRE_SLEEP
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Watching POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Sent NOOP
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Ready POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Received GRAB_JOB
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Watching POLLIN
DEBUG [ 0] ::4854:af06:0:0%112153656:33827 Sent JOB_ASSIGN
CRAZY [ 0] ::4854:af06:0:0%112153656:33827 Ready POLLIN
 INFO [ 0] ::4854:af06:0:0%112153656:33827 Disconnected

possilbe fix:

=== modified file 'src/main/java/org/gearman/worker/GearmanWorkerImpl.java'
--- src/main/java/org/gearman/worker/GearmanWorkerImpl.java 2012-05-09 06:45:13 +0000
+++ src/main/java/org/gearman/worker/GearmanWorkerImpl.java 2012-06-01 08:31:54 +0000
@@ -224,6 +224,7 @@
                     if (!functionList.isEmpty()) {
                         GearmanFunction fun = functionList.remove();
                         submitFunction(fun);
+ sess.driveSessionIO();
                     }
                 } catch (IOException ioe) {
                     LOG.warn("Received IOException while driving" +

Eric Lambert (elambert) wrote :

Hi Darren:

Thanks for the bug report, but I dont I understand what the problem is you are seeing.

Is your situation like this:

1-Job is submitted to worker
2-Worker starts execution of job
3-Worker is stopped while Job is still running

If that is the case, then the correct behavior would be what you are seeing. We should only send a job_completed packet if the job has finished. Stopping the worker does not mean the jobs itself has finished.

Or is the situation you are seing like the following:

1-Job is submitted to worker
2-Worker starts execution of job
3-Job completes
4-Worker is stopped

Darren (darren-hoo) wrote :

Hi Eric,

The first case is what my situation is, but to put it more elaborately, it goes like this:

1. Job is assigned to the worker

2. Worker starts the execution of the job

3. Calling worker.stop() before executeFunction returns the GearmanJobResult

4. executeFunction returns successfully.

from the point of view of the API user, worker.stop() is different from worker.shutdown()
in that stop() only issues a shutdown request, not actually shutdown a worker abruptly , i.e.
the working loop should continue and make sure that a full worker cycle be executed,
it should wait for a job to complete fully after its execution even worker.stop() has been called.

Darren (darren-hoo) wrote :

in AbstractGearmanFunction.java GearmanJobResult call()

        if (result.jobSucceeded()) {
            event = new GearmanPacketImpl(GearmanPacketMagic.REQ,
                    GearmanPacketType.WORK_COMPLETE,
                    GearmanPacketImpl.generatePacketData(jobHandle,
                    result.getResults()));
        } else {
            event = new GearmanPacketImpl(GearmanPacketMagic.REQ,
                    GearmanPacketType.WORK_FAIL, jobHandle);

        }
        fireEvent(event);

the event is fired but the packet get no chance to be sent after worker.stop is called.

if this is the expected behavior, then how can I tell a worker to stop without affecting

the jobs currently been executed?

Eric Lambert (elambert) wrote :

Darren:

Thanks for the clarification, makes sense to me.

Eric

Changed in gearman-java:
status: New → Confirmed
assignee: nobody → Eric Lambert (elambert)
milestone: none → 0.05
Eric Lambert (elambert) on 2012-09-25
Changed in gearman-java:
importance: Undecided → Medium
milestone: 0.05 → none
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers