Registering the same two functions on consecutive connections can crash gearmand

Bug #372074 reported by Robert Stewart
This bug affects 1 person
Affects    Status         Importance   Assigned to   Milestone
Gearman    Fix Released   High         Eric Day

Bug Description

I registered two functions (reverse and digest) for the same worker (client ID SimpleWorker) on one job server using CAN_DO. I then closed the connection and opened a new one to the same server. When I registered the second function (digest) again on the new connection, gearmand crashed.

When I do the above with a single function, gearmand does not crash. Unregistering the functions with RESET_ABILITIES before closing the first connection does not alter the results (i.e., gearmand still crashes).
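For anyone who wants to script these steps instead of driving them by hand, here is a minimal C sketch. It assumes the standard Gearman binary request framing (4-byte magic "\0REQ", 4-byte big-endian packet type, 4-byte big-endian payload size, then the payload) and the protocol's type numbers CAN_DO=1, CANT_DO=2, RESET_ABILITIES=3, SET_CLIENT_ID=22. The helper names (gm_encode_req, gm_repro, etc.) are hypothetical, it assumes an IPv4 server address, and it is not taken from the gearmand source.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request packet types from the Gearman binary protocol. */
enum gm_type {
  GM_CAN_DO = 1,
  GM_CANT_DO = 2,
  GM_RESET_ABILITIES = 3,
  GM_SET_CLIENT_ID = 22
};

#define GM_HEADER_SIZE 12

/* Write a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *p, uint32_t v) {
  p[0] = (unsigned char)(v >> 24);
  p[1] = (unsigned char)(v >> 16);
  p[2] = (unsigned char)(v >> 8);
  p[3] = (unsigned char)v;
}

/* Hypothetical helper: encode one request packet into buf.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t gm_encode_req(enum gm_type type, const char *arg,
                     unsigned char *buf, size_t cap) {
  size_t len = arg ? strlen(arg) : 0;
  assert(buf != NULL);
  if (cap < GM_HEADER_SIZE + len)
    return 0;
  memcpy(buf, "\0REQ", 4);            /* magic: NUL byte, then "REQ" */
  put_be32(buf + 4, (uint32_t)type);
  put_be32(buf + 8, (uint32_t)len);
  if (len > 0)
    memcpy(buf + GM_HEADER_SIZE, arg, len);
  return GM_HEADER_SIZE + len;
}

/* Open a TCP connection to an IPv4 address. Returns fd, or -1. */
static int gm_connect(const char *ip, unsigned short port) {
  struct sockaddr_in sa;
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;
  memset(&sa, 0, sizeof sa);
  sa.sin_family = AF_INET;
  sa.sin_port = htons(port);
  if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1 ||
      connect(fd, (struct sockaddr *)&sa, sizeof sa) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Encode and send one packet. Returns 0 on success. */
static int gm_send(int fd, enum gm_type type, const char *arg) {
  unsigned char buf[256];
  size_t n = gm_encode_req(type, arg, buf, sizeof buf);
  return (n > 0 && write(fd, buf, n) == (ssize_t)n) ? 0 : -1;
}

/* The sequence described above: two consecutive connections, each
   setting client ID SimpleWorker and registering reverse, then digest. */
int gm_repro(const char *ip, unsigned short port) {
  for (int conn = 0; conn < 2; conn++) {
    int fd = gm_connect(ip, port);
    if (fd < 0)
      return -1;
    /* || evaluates left to right, so packets go out in this order. */
    int rc = (gm_send(fd, GM_SET_CLIENT_ID, "SimpleWorker") != 0 ||
              gm_send(fd, GM_CAN_DO, "reverse") != 0 ||
              gm_send(fd, GM_CAN_DO, "digest") != 0) ? -1 : 0;
    close(fd);
    if (rc != 0)
      return -1;
  }
  return 0;
}
```

Against a local server on the default port this would be called as gm_repro("127.0.0.1", 4730). Because each connection is closed right after its writes, the timing differs from the interactive session below, so treat it as a starting point rather than a guaranteed reproducer.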

Here's the gdb output.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000008
0x0003141d in gearman_server_worker_create (server_con=0xa000, server_function=0x100580, server_worker=0x0) at server_worker.c:57
57 GEARMAN_LIST_DEL(server->free_worker, server_worker, con_)
(gdb) bt
#0 0x0003141d in gearman_server_worker_create (server_con=0xa000, server_function=0x100580, server_worker=0x0) at server_worker.c:57
#1 0x0003139e in gearman_server_worker_add (server_con=0xa000, function_name=0x10040c "digest", function_name_size=6, timeout=0) at server_worker.c:35
#2 0x0002cef5 in gearman_server_run_command (server_con=0xa000, packet=0x1003a0) at server.c:292
#3 0x00030ed7 in _thread_packet_read (con=0xa000) at server_thread.c:306
#4 0x00030c80 in gearman_server_thread_run (thread=0x801e10, ret_ptr=0xbfffe374) at server_thread.c:227
#5 0x00028e44 in gearmand_thread_run (thread=0x801e00) at gearmand_thread.c:206
#6 0x0002ab13 in _con_ready (fd=9, events=2, arg=0x802400) at gearmand_con.c:233
#7 0x00041f31 in event_base_loop (base=0x800e00, flags=0) at event.c:387
#8 0x00026dcc in gearmand_run (gearmand=0x800000) at gearmand.c:196
#9 0x0000214e in main (argc=1, argv=0xbfffed1c) at gearmand.c:205
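The faulting line unlinks an entry from the server's free_worker cache, and a fault address as small as 0x00000008 is the usual signature of reading a struct member at offset 8 through a NULL pointer. The report does not say what the actual fix was. One failure mode consistent with the trace is a free list whose element count and head pointer have drifted apart: the count says a cached struct is available, the head is NULL, and the unlink dereferences it. The sketch below illustrates that reuse pattern with hypothetical names (worker_t, free_list_t, worker_get) and an intrusive doubly linked list; it is not gearmand's code, and its pop decides on the head pointer rather than on the count alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cached object with intrusive prev/next links. */
typedef struct worker {
  struct worker *next;
  struct worker *prev;
  int id;
} worker_t;

/* Hypothetical free list: head pointer plus a count that must stay in sync. */
typedef struct {
  worker_t *head;
  size_t count;
} free_list_t;

/* Return a released worker to the front of the free list. */
void free_list_push(free_list_t *fl, worker_t *w) {
  w->prev = NULL;
  w->next = fl->head;
  if (fl->head != NULL)
    fl->head->prev = w;
  fl->head = w;
  fl->count++;
}

/* Unlink w, keeping head, neighbour links, and count consistent. */
static void free_list_del(free_list_t *fl, worker_t *w) {
  if (w->prev != NULL)
    w->prev->next = w->next;
  else
    fl->head = w->next;
  if (w->next != NULL)
    w->next->prev = w->prev;
  w->next = w->prev = NULL;
  fl->count--;
}

/* Reuse a cached worker if one exists, otherwise allocate a new one.
   Checking fl->head (not only fl->count) means NULL is never unlinked;
   in a debug build a head/count mismatch trips an assert instead. */
worker_t *worker_get(free_list_t *fl) {
  worker_t *w = fl->head;
  if (w != NULL) {
    assert(fl->count > 0);
    free_list_del(fl, w);
    return w;
  }
  assert(fl->count == 0);
  return calloc(1, sizeof(worker_t));
}
```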

And here's the nc output (my annotations are in braces):

[529] $ nc localhost 4730
{SET_CLIENT_ID SimpleWorker}
workers
10 ::8000:c71c:400 - :
9 ::8000:c71c:400 SimpleWorker :
.
{CAN_DO reverse}
workers
10 ::8000:c71c:400 - :
9 ::8000:c71c:400 SimpleWorker : reverse
.
{CAN_DO digest}
workers
10 ::8000:c71c:400 - :
9 ::8000:c71c:400 SimpleWorker : digest reverse
.
{Closed connection. Opened a new connection. SET_CLIENT_ID SimpleWorker}
workers
9 ::8000:c71c:400 SimpleWorker :
10 ::8000:c71c:400 - :
.
{CAN_DO reverse}
workers
9 ::8000:c71c:400 SimpleWorker : reverse
10 ::8000:c71c:400 - :
.
{CAN_DO digest}
workers

gearmand had crashed at this point.
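The "workers" admin command used above prints one line per connection in the form "FD IP-ADDRESS CLIENT-ID : FUNCTION ...", with "-" when no client ID is set, and ends the list with a line holding a single ".". If the steps are scripted, a small parser lets the script check registrations between steps instead of reading nc output by eye. This is a sketch; the type and function names are hypothetical, and the 64-byte field sizes and 16-function cap are arbitrary assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FUNCS 16

/* One parsed line of "workers" output (hypothetical layout). */
typedef struct {
  int fd;
  char addr[64];
  char client_id[64];
  int nfuncs;
  char funcs[MAX_FUNCS][64];
} workers_line_t;

/* Parse "FD IP-ADDRESS CLIENT-ID : FUNCTION ...".
   Returns 0 on success, -1 if the line does not match (e.g. "."). */
int parse_workers_line(const char *line, workers_line_t *out) {
  int consumed = 0;
  assert(line != NULL && out != NULL);
  memset(out, 0, sizeof *out);

  /* %n only runs if the literal ':' matched, so consumed stays 0 otherwise. */
  if (sscanf(line, "%d %63s %63s :%n", &out->fd, out->addr,
             out->client_id, &consumed) != 3 || consumed == 0)
    return -1;

  /* Read whitespace-separated function names after the colon. */
  const char *p = line + consumed;
  char name[64];
  int n = 0;
  while (out->nfuncs < MAX_FUNCS && sscanf(p, "%63s%n", name, &n) == 1) {
    strcpy(out->funcs[out->nfuncs++], name);
    p += n;
  }
  return 0;
}
```

For example, the line "9 ::8000:c71c:400 SimpleWorker : digest reverse" parses to fd 9, client ID SimpleWorker, and the two functions digest and reverse.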


Eric Day (eday)
Changed in gearmand:
assignee: nobody → Eric Day (eday)
importance: Undecided → High
milestone: none → 0.6
Eric Day (eday)
Changed in gearmand:
milestone: 0.6 → 0.7
Eric Day (eday) wrote :

I'm not able to reproduce this. Can you confirm it still exists with the latest release?

Changed in gearmand:
status: New → Incomplete
Robert Stewart (robertstewart) wrote :

I can still reproduce it with gearmand 0.9.

I'm using two unit tests from the gearmanij project, in the class org.gearman.worker.StandardWorkerTest. If I run them in the order testUnregisterFunction(), testUnregisterAll(), testUnregisterFunction(), testUnregisterAll(), gearmand crashes the second time testUnregisterAll() runs.

...
 INFO Entering main event loop
 INFO Accepted connection from 127.0.0.1:61701
 INFO [ 0] 127.0.0.1:61701 Connected
DEBUG [ 0] 127.0.0.1:61701 Received SET_CLIENT_ID
DEBUG [ 0] 127.0.0.1:61701 Received CAN_DO
DEBUG [ 0] 127.0.0.1:61701 Received TEXT
DEBUG [ 0] 127.0.0.1:61701 Sent TEXT
DEBUG [ 0] 127.0.0.1:61701 Received CANT_DO
DEBUG [ 0] 127.0.0.1:61701 Received TEXT
DEBUG [ 0] 127.0.0.1:61701 Sent TEXT
 INFO [ 0] 127.0.0.1:61701 Disconnected
 INFO Accepted connection from 127.0.0.1:61706
 INFO [ 0] 127.0.0.1:61706 Connected
DEBUG [ 0] 127.0.0.1:61706 Received SET_CLIENT_ID
DEBUG [ 0] 127.0.0.1:61706 Received CAN_DO
DEBUG [ 0] 127.0.0.1:61706 Received CAN_DO
DEBUG [ 0] 127.0.0.1:61706 Received TEXT
DEBUG [ 0] 127.0.0.1:61706 Sent TEXT
DEBUG [ 0] 127.0.0.1:61706 Received TEXT
DEBUG [ 0] 127.0.0.1:61706 Sent TEXT
DEBUG [ 0] 127.0.0.1:61706 Received RESET_ABILITIES
DEBUG [ 0] 127.0.0.1:61706 Received TEXT
DEBUG [ 0] 127.0.0.1:61706 Sent TEXT
DEBUG [ 0] 127.0.0.1:61706 Received TEXT
DEBUG [ 0] 127.0.0.1:61706 Sent TEXT
 INFO [ 0] 127.0.0.1:61706 Disconnected
 INFO Accepted connection from 127.0.0.1:61712
 INFO [ 0] 127.0.0.1:61712 Connected
DEBUG [ 0] 127.0.0.1:61712 Received SET_CLIENT_ID
DEBUG [ 0] 127.0.0.1:61712 Received CAN_DO
DEBUG [ 0] 127.0.0.1:61712 Received TEXT
DEBUG [ 0] 127.0.0.1:61712 Sent TEXT
DEBUG [ 0] 127.0.0.1:61712 Received CANT_DO
DEBUG [ 0] 127.0.0.1:61712 Received TEXT
DEBUG [ 0] 127.0.0.1:61712 Sent TEXT
 INFO [ 0] 127.0.0.1:61712 Disconnected
 INFO Accepted connection from 127.0.0.1:61717
 INFO [ 0] 127.0.0.1:61717 Connected
DEBUG [ 0] 127.0.0.1:61717 Received SET_CLIENT_ID
DEBUG [ 0] 127.0.0.1:61717 Received CAN_DO
DEBUG [ 0] 127.0.0.1:61717 Received CAN_DO
Bus error

The currently checked-in file has an @Ignore annotation on that test, which I removed to run the sequence above.

I noticed that if I run testUnregisterFunction() just once, I can run testUnregisterAll() as many times as I want without a crash.

Also, the order matters: I have to run testUnregisterFunction, then testUnregisterAll, then testUnregisterFunction again before the next testUnregisterAll crashes the server. If I run testUnregisterFunction twice before running testUnregisterAll repeatedly, I don't see the crash.

Eric Day (eday) wrote :

Thanks for the extra info, Robert! I was able to reproduce this in C and tracked the bug down.

Changed in gearmand:
status: Incomplete → Fix Committed
Eric Day (eday)
Changed in gearmand:
status: Fix Committed → Fix Released