Comment 6 for bug 1176324

Revision history for this message
Brian Aker (brianaker) wrote : Re: [Bug 1176324] Re: gearmand still accepts new jobs from time to time, but all worker are stuck too and won't receiver any jobs

Can you test trunk?

Is it possible to get the code you are running (or a stripped down version)?

Sent from my Ti85

On May 7, 2013, at 8:26, Sven Nierlein <email address hidden> wrote:

> I don't use the python library. I only use the C-API. Let me know if i
> can test something.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1176324
>
> Title:
> gearmand still accepts new jobs from time to time, but all worker are
> stuck too and won't receiver any jobs
>
> Status in Gearman Server and Client Libraries:
> New
>
> Bug description:
> .../gearmand --port=4730 --pid-file=/tmp/run/gearmand.pid --daemon
> --job-retries=0 --threads=10 -q libsqlite3 --store-queue-on-shutdown
> --libsqlite3-db=/tmp/gearmand.sqlite --log-file=/tmp/gearmand.log
> --verbose=DEBUG --listen=localhost
>
>
> i am trying the 1.1.6 release version and noticed the gearmand almost stops answering requests after a while.
>
> This is how a normal "status" request via the admin protocol looks in
> the debug logfile:
>
> DEBUG 2013-04-22 14:27:54.377299 [ main ] accept() 340 -> libgearman-server/gearmand.cc:854
> INFO 2013-04-22 14:27:54.377365 [ main ] Accepted connection from 127.0.0.1:49282
> DEBUG 2013-04-22 14:27:54.377437 [ 1 ] Received CON wakeup event -> libgearman-server/gearmand_thread.cc:602
> DEBUG 2013-04-22 14:27:54.377487 [ 1 ] setsockopt() 340 -> libgearman-server/io.cc:809
> DEBUG 2013-04-22 14:27:54.377504 [ 1 ] 127.0.0.1:49282 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
> INFO 2013-04-22 14:27:54.377512 [ 1 ] Gear connection made
> DEBUG 2013-04-22 14:27:54.377521 [ 1 ] 127.0.0.1:49282 Ready POLLIN -> libgearman-server/gearmand_con.cc:118
> DEBUG 2013-04-22 14:27:54.377539 [ 1 ] read 7 bytes -> libgearman-server/io.cc:686
> INFO 2013-04-22 14:27:54.377546 [ 1 ] Gear unpack
> DEBUG 2013-04-22 14:27:54.377552 [ 1 ] Received TEXT 127.0.0.1:40460705 -> libgearman-server/thread.cc:310
> DEBUG 2013-04-22 14:27:54.377574 [ 1 ] 127.0.0.1:49282 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
> DEBUG 2013-04-22 14:27:54.377593 [ proc ] 127.0.0.1:49282 packet command TEXT -> libgearman-server/server.cc:98
> DEBUG 2013-04-22 14:27:54.377626 [ proc ] text command status 1 arguments -> libgearman-server/text.cc:63
> DEBUG 2013-04-22 14:27:54.377693 [ 1 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:607
> DEBUG 2013-04-22 14:27:54.377705 [ 1 ] GEAR length: 139 gearmand_command_t: GEARMAN_COMMAND_TEXT -> libgearman-server/plugins/protocol/gear/protocol.cc:274
> DEBUG 2013-04-22 14:27:54.377774 [ 1 ] send() 139 bytes to peer 127.0.0.1:49282 -> libgearman-server/io.cc:275
> DEBUG 2013-04-22 14:27:54.377789 [ 1 ] Sent TEXT to 127.0.0.1:40460705 -> libgearman-server/thread.cc:357
> DEBUG 2013-04-22 14:27:54.377796 [ 1 ] free() packet's data -> libgearman-server/packet.cc:393
>
> DEBUG 2013-04-22 14:28:22.362232 [ 1 ] 127.0.0.1:49282 Ready POLLIN -> libgearman-server/gearmand_con.cc:118
> INFO 2013-04-22 14:28:22.362306 [ 1 ] Peer connection has called close() 127.0.0.1:49282
> INFO 2013-04-22 14:28:22.362317 [ 1 ] Disconnected 127.0.0.1:49282
> DEBUG 2013-04-22 14:28:22.362479 [ 1 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:607
> INFO 2013-04-22 14:28:22.362490 [ 1 ] Gear connection disconnected
>
>
> However, after a while the only thing happens is:
> DEBUG 2013-04-22 14:33:29.292611 [ main ] accept() 345 -> libgearman-server/gearmand.cc:854
> INFO 2013-04-22 14:33:29.292659 [ main ] Accepted connection from 127.0.0.1:50728
>
> Currently i can only reproduce the problem when setting the loglevel to at least notice. In debug loglevel the gearmand
> runs for hours without problems. In Notice loglevel the problem starts after a few minutes when close_wait connections
> pile up. The logfile only contains the accepted jobs and some errors on shutdown:
>
> ERROR 2013-05-03 08:57:50.000000 [ proc ] GEARMAND_WAKEUP_RUN(Bad file descriptor) -> libgearman-server/gearmand_thread.cc:382
> FATAL 2013-05-03 08:57:50.000000 [ main ] pthread_mutex_lock(Invalid argument) -> libgearman-server/gearmand_con.cc:685
> FATAL 2013-05-03 08:57:50.000000 [ main ] pthread_mutex_lock(Invalid argument) -> libgearman-server/gearmand_con.cc:685
> FATAL 2013-05-03 08:57:50.000000 [ main ] pthread_mutex_lock(Invalid argument) -> libgearman-server/gearmand_con.cc:685
> FATAL 2013-05-03 08:57:50.000000 [ main ] pthread_mutex_lock(Invalid argument) -> libgearman-server/gearmand_con.cc:685
>
> Any hints on how to debug that any further?
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/gearmand/+bug/1176324/+subscriptions