Segmentation fault in libgearman runtask
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Gearman | New | Medium | Unassigned | | |
Bug Description
Apache child processes are segfaulting, seemingly at random. I have not been able to reproduce the crash on demand.
The segmentation faults show up in my Apache error logs:
[notice] child pid 26412 exit signal Segmentation fault (11)
It's been running in production for many months, and the segfaults only started appearing recently, at random. Gearmand is running with a MySQL queue. Restarting gearmand seems to make the problem go away for a week or so, but then it comes back. I also have a cron job that acts as a keep-alive (submitting and running a job) so that gearmand's MySQL connection doesn't time out.
I have been able to load the debug symbols for gearman and get a backtrace:
(gdb) bt
#0 0x00002aaab1fab1f8 in _client_run_tasks (client=
exit_task=0x0) at libgearman/
#1 0x00002aaab1fad6cc in gearman_
at libgearman/
#2 0x00002aaab1d89e01 in zif_gearman_
out>, return_
this_ptr=<value optimized out>,
return_
/var/tmp/
#3 0x00002aaab09cfe4a in zend_parse_
(num_args=
/usr/src/
#4 0x00002b7ebc855350 in ?? ()
#5 0x010000000000000f in ?? ()
#6 0x0000000000000000 in ?? ()
(gdb) frame 0
#0 0x00002aaab1fab1f8 in _client_run_tasks (client=
exit_task=0x0) at libgearman/
1412 for (client-
client-
(gdb)
It looks like it is crashing in libgearman/
Code:
//$jobdata is validated prior
$gmclient= new GearmanClient();
//cast port as int
$gmclient-
$gmclient-
//assign a unique id for the job (limit the length to prevent db errors)
$uniqueid = substr(
//add the job to the background
$job_handle = $gmclient-
//had some instances where gearmand was slow to respond, loops below try a few times before giving up.
//ping gearman to make sure it is working before sending job
//keep trying for 2 seconds (4 times per second) until success or failure
$pingCount = 1;
while (@$gmclient-
log("couldn't ping gearman server");
usleep(250000); //sleep for 0.25 seconds before trying again
$pingCount++;
}
//queue the job
//keep trying for 2 seconds (4 times per second) until success or failure
$runTaskCount = 1;
//suppress gearman fatal errors so we can catch them
while (@!$gmclient-
log('Error:: Could not run task on runTaskCount=
usleep(
$runTaskCo
}
if (@$gmclient-
{
log("job didn't queue");
}
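For reference, the retry pattern described in the comments above (keep trying for 2 seconds, 4 times per second) can be sketched as a standalone helper. This is only an illustration of the loop shape, not the production code: `retryWithDelay` and its parameters are hypothetical names that are not part of the gearman extension, and the callback here is a stand-in rather than a real Gearman call.

```php
<?php
// Hypothetical helper (not part of the gearman extension): call $attempt up to
// $maxAttempts times, sleeping $delayMicros microseconds between failed tries.
// The defaults give 8 tries at 0.25 s spacing, i.e. roughly 2 seconds in total.
function retryWithDelay(callable $attempt, int $maxAttempts = 8, int $delayMicros = 250000): bool
{
    for ($try = 1; $try <= $maxAttempts; $try++) {
        if ($attempt($try)) {
            return true; // success, stop retrying
        }
        if ($try < $maxAttempts) {
            usleep($delayMicros); // wait before the next attempt
        }
    }
    return false; // every attempt failed
}

// Stand-in callback that fails twice and succeeds on the third attempt.
// A short delay is used here so the example finishes quickly.
$calls = 0;
$ok = retryWithDelay(function (int $try) use (&$calls): bool {
    $calls++;
    return $try >= 3;
}, 8, 1000);
```

Bounding the loop by an attempt count, as here, keeps a request from hanging indefinitely if gearmand never responds.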
It looks like something is corrupting the pointers, but it's hard to say without a reproducible test case.