Worker doesn't flush entire payload back to gearmand in non-block mode

Bug #975591 reported by Keyur
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Gearman
Fix Released
Medium
Brian Aker

Bug Description

In non-block mode:
If the worker is attempting to return a very large data packet back to gearmand (which will forward it back to the client), then that operation sometimes fails.

gearman_job_send_complete calls job.cc's _job_send() which calls connection.cc's send() which then calls flush(). The packet is sent in at least 2 chunks if it is larger than 8192 bytes. The first chunk is always 8192, and then the remaining packet is attempted to be flushed. Say we wanted to send 100,000 bytes.

Here's how it looks like in strace:

sendto(6, "\x00\x52\x45\x51\x00\x00\x00\x0d\x00\x01\x86\xc2\x48\x3a\x73\x68\x6f\x70\x73\x74\x61\x74\x73\x2e\x76\x6d\x2e\x6e\x79\x34\x64\x65\x76\x2e\x65\x74\x73\x79\x2e"..., 8192, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 8192 <0.000171>
sendto(6, "aaaaaaaaaaaaaaaaaaa"..., 91854, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000007>

The EAGAIN triggers a GEARMAN_IO_WAIT return, that then bubbles up the stack, ending up with gearman_job_send_fail_fin() being called.

Now gearman_job_send_fail_fin will again attempt to send the failed packet of size 91854, and this will almost always fails with GEARMAN_IO_WAIT as well.

gearmand not getting the entire packet promised in the packet header decides the job has failed and communicates the same to the client.

The attached patch fixes this by retrying _job_send when IO_WAIT.

Revision history for this message
Keyur (keyurdg) wrote :

Patch attached. Should import with patch -p2

Revision history for this message
Brian Aker (brianaker) wrote :

Test case:

+static test_return_t regression_975591_TEST(void *object)
+{
+ gearman_client_st *client= (gearman_client_st *)object;
+ test_true(client);
+
+ gearman_function_t dreaming_fn= gearman_function_create(echo_or_react_worker_v2);
+ worker_handle_st* worker_handle= test_worker_start(libtest::default_port(), NULL,
+ __func__,
+ dreaming_fn, NULL,
+ gearman_worker_options_t(),
+ 0);
+ int payload_size[] = { 100, 1000, 10000, 1000000, 1000000, 0 };
+ int *ptr= payload_size;
+ libtest::vchar_t payload;
+ while(*(ptr++))
+ {
+ payload.reserve(*ptr);
+ for (size_t x= payload.size(); x < *ptr; x++)
+ {
+ payload.push_back(rand());
+ }
+
+ size_t result_length;
+ gearman_return_t rc;
+ char *job_result= (char*)gearman_client_do(client, __func__,
+ NULL,
+ &payload[0], payload.size(),
+ &result_length, &rc);
+ test_compare(GEARMAN_SUCCESS, rc);
+ test_compare(payload.size(), result_length);
+ test_memcmp(&payload[0], job_result, result_length);
+ }
+
+ delete worker_handle;
+
+ return TEST_SUCCESS;
+}
+

Changed in gearmand:
milestone: none → 0.31
assignee: nobody → Brian Aker (brianaker)
importance: Undecided → Medium
status: New → In Progress
Brian Aker (brianaker)
Changed in gearmand:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.