gearmand 0.20 does not daemonize with --daemon

Bug #771486 reported by piavlo
This bug affects 4 people
Affects  Status        Importance  Assigned to  Milestone
Gearman  Fix Released  Medium      Brian Aker

Bug Description

[root@dev1 ~]# /usr/local/sbin/gearmand -u gearmand --threads=3 --daemon -vvv --job-retries=1 --pid-file=/var/run/gearman/gearmand.pid --port=4730 --queue-type=libdrizzle --libdrizzle-host=master100.internal --libdrizzle-port=3306 --libdrizzle-user=gearman --libdrizzle-password=secret --libdrizzle-db=services --libdrizzle-table=gearman_queue_stage --libdrizzle-mysql
ERROR [ main ] Failed to listen on :::4730

Now it is stuck in the foreground.
Looking at the processes, there is a parent and a child process; the child has all the sockets open.
When I hit Ctrl+C, the parent process exits and no longer holds the foreground of my tty.
The child keeps functioning normally (both before and after Ctrl+C).

Stopping the child only works after several tries with:
-------------
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
[root@dev1 ~]# /usr/local/bin/gearadmin --shutdown
ERROR [ 3 ] GEARMAND_WAKEUP_SHUTDOWN(Bad file descriptor) -> libgearman-server/gearmand.cc:358
[root@dev1 ~]#
-------------

The following always works on the first try
-------------
[root@dev1 ~]# socat tcp-connect:localhost:4730 -
shutdown
OK
[root@dev1 ~]#
-------------
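The socat session above talks to gearmand's plain-text administrative protocol directly: it sends the line `shutdown` and reads back `OK`. A minimal Python sketch of the same exchange, assuming the server listens on localhost:4730 and answers with a single line; the helper name `send_admin_command` is hypothetical.

```python
import socket


def send_admin_command(command: str, host: str = "localhost",
                       port: int = 4730, timeout: float = 5.0) -> str:
    """Send one text-protocol admin command and return the first reply line.

    Hypothetical helper mirroring `socat tcp-connect:localhost:4730 -`.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Admin commands are newline-terminated plain text.
        sock.sendall(command.encode("ascii") + b"\n")
        buf = b""
        # Read until the first newline or until the server closes the connection.
        while b"\n" not in buf:
            chunk = sock.recv(1024)
            if not chunk:
                break
            buf += chunk
    return buf.split(b"\n", 1)[0].decode("ascii").rstrip("\r")
```

Against a running server, `send_admin_command("shutdown")` should return `"OK"`, matching the transcript.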

The GEARMAND_WAKEUP_SHUTDOWN message is printed each time shutdown succeeds.

Looking at stdin/stderr/stdout, it does indeed seem that gearmand is not properly daemonized.

With 0.14
---------
[root@jobs1a ~]# lsof -p $(pgrep gearmand) | awk '$4 ~ /^(0|1|2)u$/'
gearmand 6877 gearmand 0u CHR 1,3 935 /dev/null
gearmand 6877 gearmand 1u CHR 1,3 935 /dev/null
gearmand 6877 gearmand 2u CHR 1,3 935 /dev/null
[root@jobs1a ~]#
---------

While with 0.20
---------
[root@dev1 ~]# lsof -p $(pgrep gearmand) | awk '$4 ~ /^(0|1|2)u$/'
gearmand 31329 gearmand 0u IPv4 551721 TCP dev1.internal:58643->master100.internal:mysql (ESTABLISHED)
gearmand 31329 gearmand 1u CHR 136,2 4 /dev/pts/2
gearmand 31329 gearmand 2u CHR 136,2 4 /dev/pts/2
[root@dev1 ~]#
---------
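The two listings differ in the way one would expect if the redirect step were missing: 0.14 has fds 0, 1 and 2 on /dev/null, while 0.20 leaves stdout and stderr on the terminal, and fd 0 has been taken by the MySQL connection. Because a new descriptor always gets the lowest free number, that is consistent with stdin having been closed rather than reopened. A minimal Python sketch of the usual redirect step (the name `redirect_to_devnull` is hypothetical, not gearmand's code):

```python
import os


def redirect_to_devnull(fds=(0, 1, 2)) -> None:
    """Reopen each descriptor in fds on /dev/null (stdin, stdout, stderr by default)."""
    devnull = os.open(os.devnull, os.O_RDWR)
    try:
        for fd in fds:
            # dup2 closes fd if it is open and makes it refer to /dev/null,
            # so the slot can never be picked up by a later socket() or open().
            os.dup2(devnull, fd)
    finally:
        # Drop the extra descriptor unless it already landed on one of the
        # target slots (e.g. fd 0 was closed before /dev/null was opened).
        if devnull not in fds:
            os.close(devnull)
```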

Thanks
Alex

timuckun (timuckun) wrote :

I too have observed this behavior. I got around it by using start-stop-daemon on Ubuntu and running it as root.

piavlo (piavka) wrote :

Well, I was also planning to use http://software.clapper.org/daemonize/ (in case the problem is not going to be fixed anytime soon).

BTW do you experience similar behaviour with "/usr/local/bin/gearadmin --shutdown" ?

Michael Alan Dorman (mdorman) wrote :

The problem would appear to be the new startup code introduced in version 0.17.

If running as root, with the -u flag to change uid, the parent process ends up waiting on a signal from the child process that the child process is no longer allowed to deliver, because it's now under a different uid.
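The rule at work here is kill(2)'s permission check: an unprivileged sender may signal a process only if the sender's real or effective uid matches the target's real or saved set-user-ID. Once the child has switched to the gearmand user, its parent is still running as root, so the signal fails with EPERM. A small Python sketch for checking this from inside a process, using signal 0, which performs the checks without delivering anything; `can_signal` is a hypothetical helper name.

```python
import os


def can_signal(pid: int) -> bool:
    """Return True if this process is allowed to send signals to pid.

    Signal 0 runs kill(2)'s existence and permission checks without
    delivering a signal.
    """
    try:
        os.kill(pid, 0)
    except PermissionError:
        # EPERM: the process exists, but our uid may not signal it
        # (e.g. a child that dropped to an unprivileged user signalling a root parent).
        return False
    except ProcessLookupError:
        # ESRCH: no such process.
        return False
    return True
```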

Brian Aker (brianaker) wrote : Re: [Bug 771486] Re: gearmand 0.20 does daemonize with --daemon

Thanks, I'll take a look at this today.

On May 25, 2011, at 11:12 AM, Michael Alan Dorman wrote:

> The problem would appear to be the new startup code introduced in
> version 0.17.
>
> If running as root, with the -u flag to change uid, the parent process
> ends up waiting on a signal from the child process that the child
> process is no longer allowed to deliver, because it's now under a
> different uid.

Brian Aker (brianaker)
Changed in gearmand:
assignee: nobody → Brian Aker (brianaker)
importance: Undecided → Medium
status: New → In Progress
Michael Alan Dorman (mdorman) wrote :

OK, here's a patch that works, and may even be correct. It still includes some debugging cruft (in fact, the fix is a single line; everything else is debugging) that you might want to excise, or that you might want to enable to see the issue.

What gearmand was doing was forking, with the parent waiting on the child, but the child never exited, so the parent waited forever. My guess is that the intent is that the child will signal the parent when it's all set up, so the parent can then exit, but since the child has changed uids, it's no longer allowed to send USR1 to the parent, so that doesn't work.

I looked at the original code in drizzle and couldn't figure out how it worked there, either. ;)

Anyway, to solve the problem at hand, I simply had the child fork again just before it would have tried to send the signal to its parent, but I have it call the ::daemonize routine in the don't-wait mode so that the first child then terminates, which the first parent notices, so it can finish up. The second child continues on its merry way. This resolves the immediate problem of --daemon seeming not to daemonize.

Thinking about it, the better fix might be to have my added call run in "wait mode", because the USR1 *could* be delivered thanks to the common uid, and perhaps even to move the daemon_is_ready call further down, so that we might actually be able to detect more failure modes (can't create PID file, etc.). I'll try that in the morning.
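The interim fix described above is essentially the classic double fork: the first child forks again and exits at once without waiting, so the original parent's wait returns while the grandchild keeps running. A minimal Python sketch of that pattern, not the patch itself (`detach` and `serve` are hypothetical names). As the comment notes, the parent then learns only that the first child exited, not whether the daemon finished starting up.

```python
import os


def detach(serve) -> int:
    """Double fork; returns the first child's exit status in the calling process.

    serve() runs in the grandchild, which is reparented once the first child exits.
    """
    pid = os.fork()
    if pid > 0:
        # Original parent: this returns as soon as the first child exits.
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1
    # First child: start a new session to drop the controlling terminal.
    os.setsid()
    if os.fork() > 0:
        # Exit immediately without waiting on the grandchild ("don't-wait mode").
        os._exit(0)
    # Grandchild: run the service and never return into the caller's code.
    try:
        serve()
        os._exit(0)
    except BaseException:
        os._exit(1)
```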

Michael Alan Dorman (mdorman) wrote :

One last thought (again, I'll poke at this in the morning): the current drizzle code might be better structured for what I proposed, since it closes low-numbered FDs during daemonize() instead of deferring that to daemon_is_ready().

If you used the current drizzle code, I think you could move the daemon_is_ready call to just before gearmand_run, meaning you could catch just about every error detectable at startup: if the grandchild fails, the child's WIFEXITED tests would catch it, and the failure would cascade upward to the parent process, which would also report that status.

Michael Alan Dorman (mdorman) wrote :

OK, as promised, I tried going back to the original drizzle daemonize code, and I think this represents the best solution.

We now fork once and wait on the child. The child changes uids and does some other setup, then forks itself; the grandchild finishes initialization and, if successful, signals the child to exit successfully, which in turn allows the parent to exit successfully. If the child or grandchild is unable to complete its work, it exits with a status code that is picked up by the parent.

I think this does everything that people want.
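A minimal Python sketch of that fork, wait, fork sequence, with one deliberate substitution: the grandchild reports readiness by writing a byte to a pipe instead of sending SIGUSR1, which sidesteps signal permissions and delivery races. `daemonize`, `init` and `serve` are hypothetical names, and the uid change and PID-file handling are only marked with a comment. A failure in the grandchild before it reports ready surfaces as its exit status, which the child passes on to the parent, the same cascade described above.

```python
import os


def daemonize(init, serve) -> int:
    """Fork, wait, fork startup handshake (sketch).

    In the calling process, returns the intermediate child's exit status:
    0 once the grandchild has reported ready, non-zero if startup failed.
    init() runs in the grandchild and should raise on failure; serve() runs after.
    """
    pid = os.fork()
    if pid > 0:
        # Original parent: block until the intermediate child exits and pass
        # its status back so the caller can exit with it.
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1

    # Intermediate child: gearmand would change uid around here (omitted,
    # since setgid/setuid need root). Detach from the controlling terminal.
    os.setsid()
    read_fd, write_fd = os.pipe()
    grandchild = os.fork()
    if grandchild > 0:
        os.close(write_fd)
        # Block until the grandchild writes its readiness byte; b"" (EOF)
        # means it closed the pipe, usually by exiting, without reporting ready.
        ready = os.read(read_fd, 1) == b"R"
        os.close(read_fd)
        if ready:
            os._exit(0)
        _, status = os.waitpid(grandchild, 0)
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1
        os._exit(code or 1)  # never report success without the readiness byte

    # Grandchild: finish initialization, then report ready.
    os.close(read_fd)
    try:
        init()
    except Exception:
        os._exit(2)  # startup failure, cascades up to the original parent
    os.write(write_fd, b"R")
    os.close(write_fd)
    try:
        serve()
        os._exit(0)
    except BaseException:
        os._exit(1)
```

In a real daemon the caller would pass the returned status straight to `sys.exit()`, so a shell or init script sees a non-zero exit whenever startup failed.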

Brian Aker (brianaker)
Changed in gearmand:
status: In Progress → Fix Committed
piavlo (piavka) wrote :

Any chance you could put up a pre-release tarball with the current patches and a ready-to-use configure script? I'm failing to generate configure with ./config/autorun.sh, as described in bug 759152.

Thanks

Brian Aker (brianaker) wrote :

Hi!

We do custom binaries for customers; that being said, we are going to go through the bug list today and see where we are with it.

If you pull lp:gearmand you can get what is likely to be the next release. Feedback is great.

Cheers,
 -Brian

On Jun 8, 2011, at 12:49 AM, piavlo wrote:

> Any chance you can put a pre-relase tar with the current patches and
> ready to use configure - since I'm failing to generate a configure with
> ./config/autorun.sh as can be seen in the 759152 bug.
>
> Thanks

piavlo (piavka) wrote :

Pulling the code will not help me, since ./config/autorun.sh does not work for me on CentOS 5.
Anything I can do to help fix the ./config/autorun.sh problem? I can't fix it on my own.
Should I open a separate bug for this?

Thanks

piavlo (piavka) wrote :

Actually, I figured out the problem with ./config/autorun.sh: I compiled the latest version of autoconf.
AFAIU at least autoconf 2.60 is needed; maybe you should add this to the package requirements.

Thanks

Brian Aker (brianaker)
Changed in gearmand:
status: Fix Committed → Fix Released