Jobs with WS timeout exception

Bug #637908 reported by Alexander Strebkov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
play framework
Fix Committed
Undecided
Erwan Loisant
Nominated for 1.1 by Alexander Strebkov

Bug Description

(play 1.1 beta2)

Easy to reproduce, just create an empty app and this job:

package jobs;

import play.jobs.*;
import play.libs.WS;

@Every("1mn")
public class Jobtest extends Job {
    public void doJob() {
        WS.HttpResponse response = WS.url("http://www.google.com").get();
    }
}

Running this results in the following exception every third run of the job (in both dev and prod):

13:08:41,365 ERROR ~ java.lang.RuntimeException: java.util.concurrent.TimeoutException: No response received
13:08:41,366 ERROR ~

@63nof0d7m
Error during job execution (jobs.Jobtest)

Execution exception (In /app/jobs/Jobtest.java around line 11)
RuntimeException occured : java.lang.RuntimeException: java.util.concurrent.TimeoutException: No response received

play.exceptions.JavaExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: No response received
 at play.jobs.Job.call(Job.java:119)
 at Invocation.Job(Play!)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: No response received
 at play.libs.WS$WSRequest.get(WS.java:266)
 at jobs.Jobtest.doJob(Jobtest.java:11)
 at play.jobs.Job.doJobWithResult(Job.java:37)
 at play.jobs.Job.call(Job.java:110)
 ... 1 more
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: No response received
 at com.ning.http.client.providers.NettyResponseFuture.get(NettyResponseFuture.java:112)
 at play.libs.WS$WSRequest.get(WS.java:263)
 ... 4 more
Caused by: java.util.concurrent.TimeoutException: No response received
 at com.ning.http.client.providers.NettyResponseFuture.get(NettyResponseFuture.java:124)
 at com.ning.http.client.providers.NettyResponseFuture.get(NettyResponseFuture.java:110)
 ... 5 more

After some time this exception is replaced with "Too many connections" on every execution and job stop working completely. So it will be a blocker issue for anyone using libs.WS in jobs.

Tags: job ws
Revision history for this message
Erwan Loisant (eloisant) wrote :

Should be working now.

Changed in play:
assignee: nobody → Erwan Loisant (eloisant)
status: New → Fix Committed
Revision history for this message
Alexander Strebkov (alexanderstrebkov) wrote :

Unfortunately with semi-production application where jobs take more time and do more things, I still encounter java.util.concurrent.TimeoutException as well as java.nio.channels.ClosedChannelException.
It is just a bit more random and less repeatable now. =(

Changed in play:
status: Fix Committed → In Progress
Revision history for this message
Erwan Loisant (eloisant) wrote :

I'm actually unable to reproduce it, it's working for me - can you find reliable reproduction steps?

One reason could be the fact that the http-async-client keeps a cache of channels per-uri, so maybe it 2 subsequent queries end up conflicting on one channel if the second iteration kicks before the previous is finished?

http://github.com/AsyncHttpClient/async-http-client/blob/master/src/main/java/com/ning/http/client/providers/NettyAsyncHttpProvider.java#L585

Revision history for this message
Alexander Strebkov (alexanderstrebkov) wrote :

On my macbook very reliable steps are to create an empty application. Add file app/jobs/Jobtest.java with exactly following content:

package jobs;

import play.jobs.*;
import play.libs.WS;
import play.Logger;

@Every("1mn")
public class Jobtest extends Job {
    public void doJob() {
        Logger.info("Request started");
        WS.HttpResponse response = WS.url("http://www.google.com").get();
        Logger.info("Request finished");
    }
}

Start it with "play run", browse to localhost:9000 and wait few minutes. Result is in attachment, because it is a bit long. First request is always successful. Second almost always throws this exception. And exactly the same happens on SunOS hosting with the same application in prod mode.

This is hardly related to second request being started before first one is done - there is a 1 minute delay between those...

Revision history for this message
Erwan Loisant (eloisant) wrote :

Thanks for the test case.

I tried your job but it works for me, on my macbook pro on OSX 10.6 (meaning that I'm using Java 1.6.0). What differences could there be between your system and mine? Anything special on your network, a transparent proxy or something?

[ erwan Erwan-Loisants-MacBook-Pro ~/toto ] play run
~ _ _
~ _ __ | | __ _ _ _| |
~ | '_ \| |/ _' | || |_|
~ | __/|_|\____|\__ (_)
~ |_| |__/
~
~ play! 1.1-unstable-r1056, http://www.playframework.org
~
~ Ctrl+C to stop
~
Listening for transport dt_socket at address: 8000
14:30:01,757 INFO ~ Starting /Users/erwan/toto
14:30:02,936 WARN ~ You're running Play! in DEV mode
14:30:03,030 INFO ~ Listening for HTTP on port 9000 (Waiting a first request to start) ...
14:30:33,586 INFO ~ Application 'toto' is now started !
14:31:33,596 INFO ~ Request started
14:31:33,596 INFO ~ Using Async for web service
14:31:33,846 INFO ~ Request finished
14:32:33,854 INFO ~ Request started
14:32:34,047 INFO ~ Request finished
14:33:34,053 INFO ~ Request started
14:33:34,326 INFO ~ Request finished
14:34:34,331 INFO ~ Request started
14:34:34,649 INFO ~ Request finished
14:35:34,660 INFO ~ Request started
14:35:34,967 INFO ~ Request finished
14:36:34,975 INFO ~ Request started
14:36:35,222 INFO ~ Request finished
14:37:35,226 INFO ~ Request started
14:37:35,503 INFO ~ Request finished
14:38:35,509 INFO ~ Request started
14:38:35,827 INFO ~ Request finished
14:39:35,836 INFO ~ Request started
14:39:36,151 INFO ~ Request finished
14:40:36,156 INFO ~ Request started
14:40:36,399 INFO ~ Request finished
14:41:36,406 INFO ~ Request started
14:41:36,659 INFO ~ Request finished
14:42:36,664 INFO ~ Request started
14:42:37,012 INFO ~ Request finished
14:43:37,053 INFO ~ Request started
14:43:37,273 INFO ~ Request finished
14:44:37,282 INFO ~ Request started
14:44:37,563 INFO ~ Request finished
14:45:37,572 INFO ~ Request started
14:45:37,855 INFO ~ Request finished

Revision history for this message
Alexander Strebkov (alexanderstrebkov) wrote :

That's really strange... I also have OSX 10.6 with following java:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)

I've no proxies... And also in addition to my notebook I have the same problem on two SunOS servers which are located in completely different places. But I've just tried it on Linux hosting and indeed it worked several more times before it failed:

15:19:47,288 INFO ~ Listening for HTTP on port 9001 ...
15:20:47,232 INFO ~ Request started
15:20:47,233 INFO ~ Using Async for web service
15:20:48,102 INFO ~ Request finished
15:21:48,112 INFO ~ Request started
15:21:48,287 INFO ~ Request finished
15:22:48,287 INFO ~ Request started
15:22:48,660 INFO ~ Request finished
15:23:48,662 INFO ~ Request started
15:23:48,886 INFO ~ Request finished
15:24:48,887 INFO ~ Request started
15:24:49,172 INFO ~ Request finished
15:25:49,173 INFO ~ Request started
15:25:49,318 ERROR ~ java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException
15:25:49,326 ERROR ~
...

Revision history for this message
Alexander Strebkov (alexanderstrebkov) wrote :

On yet another Linux it was running whole night with only couple of usual timeouts. No ClosedChannelExceptions at all. I will try to figure out what difference between servers causes problem, but it is really weird...

Revision history for this message
Alexander Strebkov (alexanderstrebkov) wrote :

That's amazing, I can't repeat it this morning running the same versions of OS, java and test app. Not on my macbook, not on any of hosts I have access to...

Revision history for this message
Alexander Strebkov (alexanderstrebkov) wrote :

Looks like this problem could be solved now. I tried jobtest today with old setup and had ClosedChannelException as usual. After that I updated async-http-client-1.2.0-SNAPSHOT.jar to one built from latest sources (http://github.com/AsyncHttpClient/async-http-client.git) and it worked flawlessly. I tried to swap this library several times and run jobtest - every run with old library I had ClosedChannelException and every run with newer was fine.

Revision history for this message
Erwan Loisant (eloisant) wrote :

Cool, I'll update the jar.

Changed in play:
status: In Progress → Fix Committed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.