Worker crash with long function name
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Gearman | Fix Released | Medium | Brian Aker |
Bug Description
Gearman 0.23 appears to have introduced a bug, still present in 0.24, in which a worker crashes on receiving a job if the function name is too long (somewhere in the mid-50s, in characters). It may take several jobs before the crash occurs. Everything below was run on Ubuntu Natty with Boost 1.47.0.
The client side simply runs this in a loop:
gearman -f 55_char_
Worker side runs:
ubuntu@
ubuntu@
0.24
ubuntu@
payloadSegmentation fault
ubuntu@
payloadpayloadp
ubuntu@
payloadpayloadS
ubuntu@
payloadpayloadS
ubuntu@
payloadpayloadS
etc.
Here's the end of the worker strace on 0.24:
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
recvfrom(3, "\0RES\
sendto(3, "\0REQ\
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(3, "\0REQ\
recvfrom(3, "\0RES\
write(1, "\0\0\0\
sendto(3, "\0REQ\
sendto(3, "\0REQ\
recvfrom(3, "\0RES\
sendto(3, "\0REQ\
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(3, "\0REQ\
recvfrom(3, "\0RES\
write(1, "\0\0\0\
sendto(3, "\0REQ\
sendto(3, "\0REQ\
recvfrom(3, "\0RES\
sendto(3, "\0REQ\
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(3, "\0REQ\
recvfrom(3, "\0RES\
write(1, "\0\0\0\
sendto(3, "\0REQ\
sendto(3, "\0REQ\
recvfrom(3, "\0RES\
sendto(3, "\0REQ\
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(3, "\0REQ\
recvfrom(3, "\0RES\
write(1, "\0\0\0\
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault
Slightly shorter function names dramatically lower the failure rate: with a 52-character name I ran 30K jobs without any issues.
The same problem exists in 0.23, whereas on 0.22 I can run tens of thousands of jobs with much longer function names.
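To compare the 52- and 55-character cases described above, a reproduction harness could look roughly like the following. This is a minimal sketch, not part of the original report: `make_function_name` and `run_client_loop` are hypothetical helpers, the padding character and the `payload` string are arbitrary, and passing the payload as a trailing argument to `gearman -f` is an assumption about the client's usage. A worker registered under the same function name has to be started separately, as in the report.

```shell
#!/bin/sh
# Hypothetical reproduction harness (not from the original report).
# Submits jobs in a loop to a function whose name has a chosen length.

# Print a function name of exactly $1 characters, built from repeated "f".
make_function_name() {
    name=""
    while [ "${#name}" -lt "$1" ]; do
        name="${name}f"
    done
    printf '%s\n' "$name"
}

# Submit $2 jobs (default 1000) to a function whose name is $1 characters long.
# Assumption: the gearman client takes the payload as a trailing argument;
# "payload" is an arbitrary placeholder string.
run_client_loop() {
    fn=$(make_function_name "$1")
    count=${2:-1000}
    i=0
    while [ "$i" -lt "$count" ]; do
        gearman -f "$fn" "payload" || echo "job $i failed" >&2
        i=$((i + 1))
    done
}

# Only start submitting jobs when a name length is given on the command line.
if [ "$#" -gt 0 ]; then
    run_client_loop "$@"
fi
```

Saved under an arbitrary name such as `repro.sh`, it could be run as `sh repro.sh 55 1000` and then `sh repro.sh 52 1000`, with the worker restarted under the matching name (`make_function_name` prints it) each time.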
Changed in gearmand:
assignee: nobody → Brian Aker (brianaker)
importance: Undecided → Medium
Changed in gearmand:
status: New → In Progress
Changed in gearmand:
status: Fix Committed → Fix Released
I have not been able to reproduce this. I am pushing up a test case; if you can modify it to demonstrate the crash, that would be useful.