upstart service unreliable after introducion of pipe to logger

Bug #1385346 reported by Michael Kerrin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Gregory Haynes

Bug Description

upstart scripts for the openstack services now pipe the output to logger for example neutron-server the start script is:

The | introduces a fork which can in some circumtances leave a process behind after shutting down. For example after two runs of o-r-c I see

Oct 23 12:24:59 overcloud-controller1-vgsj3lyoqjix os-collect-config: + os-svc-enable -n neutron-server
Oct 23 12:24:59 overcloud-controller1-vgsj3lyoqjix os-collect-config: + os-svc-restart -n neutron-server
Oct 23 12:24:59 overcloud-controller1-vgsj3lyoqjix os-collect-config: stop: Unknown instance:
Oct 23 12:25:00 overcloud-controller1-vgsj3lyoqjix os-collect-config: neutron-server start/running, process 8126

And then

Oct 23 12:27:07 overcloud-controller1-vgsj3lyoqjix os-collect-config: + os-svc-enable -n neutron-server
Oct 23 12:27:07 overcloud-controller1-vgsj3lyoqjix os-collect-config: + os-svc-restart -n neutron-server
Oct 23 12:27:08 overcloud-controller1-vgsj3lyoqjix os-collect-config: neutron-server stop/waiting
Oct 23 12:27:09 overcloud-controller1-vgsj3lyoqjix os-collect-config: neutron-server start/running, process 12301

But if I look at the system I see

root@overcloud-controller1-vgsj3lyoqjix:/var/log/neutron# ps auxww | grep neutron-server
neutron 8127 62.0 0.9 232516 74088 ? R 12:24 38:00 /opt/stack/venvs/openstack/bin/python /opt/stack/venvs/openstack/bin/neutron-server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/rootwrap.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-file /etc/neutron/server.ini
root 11370 0.0 0.0 4312 616 ? Ss 13:25 0:00 /bin/sh -e -c exec start-stop-daemon --start -c neutron --exec /opt/stack/venvs/openstack/bin/neutron-server – --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/rootwrap.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-file /etc/neutron/server.ini 2>&1 | logger -t neutron-server /bin/sh
neutron 11371 14.5 1.5 286736 128432 ? S 13:25 0:04 /opt/stack/venvs/openstack/bin/python /opt/stack/venvs/openstack/bin/neutron-server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/rootwrap.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-file /etc/neutron/server.ini
root 11372 0.0 0.0 4212 340 ? S 13:25 0:00 logger -t neutron-server
root 11759 0.0 0.0 10464 864 pts/0 S+ 13:26 0:00 grep neutron-server

Now the process 8127 holds the neutron api server port 9696 so the second instance of neutron-server (11371) can bind to it and so goes into a infinite loop trying to start up.

The pipe was introduced in the review: https://review.openstack.org/#/c/104619/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/130830

Changed in tripleo:
assignee: nobody → Michael Kerrin (michael-kerrin-w)
status: New → In Progress
Revision history for this message
Ben Nemec (bnemec) wrote :

This basically means it's impossible to restart a service on an Upstart-based distro right? Seems like a serious problem to me.

Changed in tripleo:
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/130830
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=615c6bb95b818551934e652e70cd7833f965ccd6
Submitter: Jenkins
Branch: master

commit 615c6bb95b818551934e652e70cd7833f965ccd6
Author: Michael Kerrin <email address hidden>
Date: Fri Oct 24 16:42:12 2014 +0100

    We need to tell upstart to expect the logger fork

    Otherwise upstart doesn't get the correct process id for the service
    and in some causes can fail to kill the process during a servcie stop

    Change-Id: I138dd2206ed1e511fbabaf43bd3cb1d163c76db4
    Closes-bug: #1385346

Changed in tripleo:
status: In Progress → Fix Committed
Revision history for this message
Gregory Haynes (greghaynes) wrote :

So there is an issue here that upstart is not tracking the correct pid, but I dont believe it is a result of the piping we are doing. Upstart should be tracking the PID of the first process which will not be the logger. I think what is more likely is we are running into https://bugs.launchpad.net/upstart/+bug/406397 which is not a fun one to fix...

Revision history for this message
Gregory Haynes (greghaynes) wrote :

ok, I was a bit off, what is actually going on is upstart is killing the /bin/sh process and neutron-server isnt exiting as a result. Expect fork does fix this in a much cleaner fasion.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

This wasn't a complete solution.. Greg has a cleaner way.

Changed in tripleo:
status: Fix Committed → In Progress
assignee: Michael Kerrin (michael-kerrin-w) → Gregory Haynes (greghaynes)
Revision history for this message
Marios Andreou (marios-b) wrote :

Don't know why this hasn't shown up here:

Log directly to syslog from os services https://review.openstack.org/#/c/138844/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/138844
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=5fc2040aa5e8bb146b9b220ded376ca49a1995f9
Submitter: Jenkins
Branch: master

commit 5fc2040aa5e8bb146b9b220ded376ca49a1995f9
Author: Gregory Haynes <email address hidden>
Date: Wed Dec 3 12:13:39 2014 -0800

    Log directly to syslog from os services

    We currently have a horrible hack for upstart where we run our services
    with stdout and stderr piped into logger. There are lots of issues with
    this, such as upstart has a hard time knowing which PID to track.
    Conveniently, python logging knows how to syslog, so why not just use
    that!

    This changes the default behaviour of os-svc-daemon to create a python
    logging.conf so services log directly to syslog.

    This also changes the default upstart respawn limiting to be 2 fails in
    10 secs. This is becauce making upstart monitor the actual service
    daemon was causing some services to respawn loop due to time it took
    them to fail.

    Closes-Bug: #1385346

    Change-Id: Ibd491881d79160ad1457526d51512f5fc1cf9ea5

Changed in tripleo:
status: In Progress → Fix Committed
Ben Nemec (bnemec)
Changed in tripleo:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.