rabbit-fence fails to start on centos7 by systemd

Bug #1633715 reported by Wei Hui
This bug affects 1 person
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  High        Wei Hui
  Nominated for Ocata by Oleksiy Molchanov
Mitaka              Won't Fix      High        Oleksiy Molchanov
Newton              Won't Fix      High        Oleksiy Molchanov

Bug Description

Detailed bug description:
systemctl start rabbit-fence fails to bring up rabbit-fence.

 [root@node-39 ~]# systemctl status rabbit-fence
● rabbit-fence.service - SYSV: Starts/Stops RabbitMQ fence daemon
   Loaded: loaded (/etc/rc.d/init.d/rabbit-fence)
   Active: failed (Result: exit-code) since Sat 2016-10-15 15:19:51 CST; 3h 25min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 23654 ExecStop=/etc/rc.d/init.d/rabbit-fence stop (code=exited, status=0/SUCCESS)
  Process: 23663 ExecStart=/etc/rc.d/init.d/rabbit-fence start (code=exited, status=1/FAILURE)

Oct 15 15:14:51 node-39.fuelnode systemd[1]: Starting SYSV: Starts/Stops RabbitMQ fence daemon...
Oct 15 15:14:51 node-39.fuelnode python[21213]: rabbit-fence 2016-10-15 15:14:51,724 INFO Caught SIGTERM, terminating...
Oct 15 15:14:51 node-39.fuelnode python[21213]: rabbit-fence 2016-10-15 15:14:51,928 ERROR A generic exception caught!
                                                Traceback (most recent call last):
                                                  File "/usr/bin/rabbit-fence.py", line 164, in <module>...
Oct 15 15:14:52 node-39.fuelnode python[23663]: rabbit-fence 2016-10-15 15:14:52,027 INFO Starting rabbit fence script main loop
Oct 15 15:19:51 node-39.fuelnode systemd[1]: rabbit-fence.service start operation timed out. Terminating.
Oct 15 15:19:51 node-39.fuelnode python[23663]: rabbit-fence 2016-10-15 15:19:51,702 INFO Caught SIGTERM, terminating...
Oct 15 15:19:51 node-39.fuelnode systemd[1]: rabbit-fence.service: control process exited, code=exited status=1
Oct 15 15:19:51 node-39.fuelnode systemd[1]: Failed to start SYSV: Starts/Stops RabbitMQ fence daemon.
Oct 15 15:19:51 node-39.fuelnode systemd[1]: Unit rabbit-fence.service entered failed state.
Oct 15 15:19:51 node-39.fuelnode systemd[1]: rabbit-fence.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

The fuel-rabbit-fence-9.0.0-16.noarch.rpm package includes two files, /usr/bin/rabbit-fence.py and /etc/init.d/rabbit-fence. systemd-sysv-generator generates the rabbit-fence service unit file from /etc/init.d/rabbit-fence. The generated service unit file is:
#########################################################################
[root@node-39 ~]# cat /run/systemd/generator.late/rabbit-fence.service
# Automatically generated by systemd-sysv-generator

[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/rc.d/init.d/rabbit-fence
Description=SYSV: Starts/Stops RabbitMQ fence daemon
Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target
After=network-online.target network.service sysfsutils.service

[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
ExecStart=/etc/rc.d/init.d/rabbit-fence start
ExecStop=/etc/rc.d/init.d/rabbit-fence stop
#########################################################################
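Notice that the service has Type=forking, which tells systemd to expect rabbit-fence to fork a child process and the parent process to exit.
But when rabbit-fence.py creates the DaemonContext, it does not pass detach_process=True. Because systemd's PID is 1, the Python daemon library
decides that detaching is unnecessary and does not fork, so systemd waits until the start operation times out.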
https://github.com/openstack/fuel-library/blob/master/files/rabbit-fence/rabbit-fence.py#L188
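
The decision described above can be modelled in a few lines. This is a simplified sketch, not the source of the Python daemon library: the helper name should_detach and its parent_pid parameter are hypothetical, and the real library also handles cases not shown here. It only illustrates why leaving detach_process unset results in no fork when the parent process is PID 1.

```python
INIT_PID = 1  # PID of the init process; systemd runs as PID 1


def should_detach(detach_process, parent_pid):
    """Simplified model of the detach decision (hypothetical helper).

    An explicit True/False setting always wins. If the setting is left
    unset (None), detach only when the parent process is not init.
    """
    if detach_process is not None:
        return bool(detach_process)
    return parent_pid != INIT_PID


# Unset and started by PID 1: no fork, so a Type=forking unit never completes startup.
print(should_detach(None, 1))      # False
# Explicit detach_process=True: fork regardless of the parent.
print(should_detach(True, 1))      # True
# Unset but started from an ordinary shell: fork as usual.
print(should_detach(None, 4242))   # True
```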

Steps to reproduce:
systemctl restart rabbit-fence

Expected results:
rabbit-fence begins to run

Actual result:
systemd fails to start rabbit-fence

Reproducibility:
100%

Workaround:
When building the fuel-rabbit-fence rpm package, use debian/fuel-rabbit-fence.service
or
When creating the DaemonContext, pass the argument detach_process=True
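
A minimal sketch of the second workaround, assuming the third-party python-daemon package is installed. This is not the merged patch: the real rabbit-fence.py passes other arguments to DaemonContext that are omitted here, and the names build_context_kwargs, run_daemonized and main_loop are hypothetical.

```python
def build_context_kwargs():
    """Keyword arguments for daemon.DaemonContext (hypothetical helper).

    detach_process=True forces the fork even when the parent is PID 1.
    """
    return {"detach_process": True}


def run_daemonized(main_loop):
    """Run main_loop inside a DaemonContext that always detaches."""
    # Imported lazily because python-daemon is a third-party package.
    import daemon

    with daemon.DaemonContext(**build_context_kwargs()):
        main_loop()
```

With this setting the original parent process exits after the fork, which is what a Type=forking unit waits for.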

This problem is not unique to CentOS 7; when Ubuntu switches to systemd, it may face the same problem.

In a systemd service unit file, Type can be simple, forking, notify, and so on.
simple: systemd just starts the process.
forking: systemd starts the process and waits for it to fork a child and exit.
notify: systemd starts the process and waits for the process to notify systemd that it is ready to work.
forking is more robust than simple: although we cannot guarantee that the child process will run normally after the fork, it at least has a better chance of success. notify is the best, but it requires changing the application code to
use the systemd library, which is less portable.
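
The fork step that Type=forking relies on can be sketched with the standard library alone. This is an illustration under stated assumptions, not rabbit-fence code: start_forking and child_work are hypothetical names, and for demonstration the parent returns the child's PID instead of exiting. In a real daemon the parent would exit right after the fork, and that exit is what systemd waits for.

```python
import os


def start_forking(child_work):
    """Fork once; the child runs child_work and exits, the parent gets the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child: this is the process that would keep running as the service.
        try:
            child_work()
        finally:
            os._exit(0)
    # Parent: a real daemon would exit here to signal that startup is done.
    return pid


# Demonstration: the child reports its own PID to the parent through a pipe.
read_fd, write_fd = os.pipe()


def child_work():
    os.write(write_fd, str(os.getpid()).encode())


child_pid = start_forking(child_work)
os.close(write_fd)
reported = int(os.read(read_fd, 32))
os.close(read_fd)
os.waitpid(child_pid, 0)  # reap the child so no zombie is left behind
print(reported == child_pid)
```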

So the second solution is better.

Impact:
 Fuel deployment fails

Description of the environment:
 Operating system: CentOS 7.1.1503
 Versions of components: fuel 9.0
 Reference architecture: x86_64

Tags: area-library
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/387930

Changed in fuel:
assignee: nobody → Wei Hui (huiweics)
status: New → In Progress
Changed in fuel:
importance: Undecided → High
milestone: none → 10.0
Changed in fuel:
assignee: Wei Hui (huiweics) → Maksim Malchuk (mmalchuk)
tags: added: area-library
Changed in fuel:
assignee: Maksim Malchuk (mmalchuk) → Wei Hui (huiweics)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/387930
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=45ab46f288f5bdbe0ea7febc6e5157c2e81e0868
Submitter: Jenkins
Branch: master

commit 45ab46f288f5bdbe0ea7febc6e5157c2e81e0868
Author: Wei Hui <email address hidden>
Date: Tue Oct 18 18:45:07 2016 +0800

    Make rabbit-fence more robust and portable

    If Type=forking in systemd's service unit file, systemd requires started
    application to fork a child process and exit. This change makes rabbit-fence
    work under this situation. It does not has any negative effect on ubuntu's
    /sbin/init, execept fork one more time. In a word, this change makes rabbit-fence
    more robust and portable in a very limited cost.

    Change-Id: Ia0dfc204ba6879bd4252585a719c8ad9afac7daa
    Closes-bug: #1633715

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/426193

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/426194

Changed in fuel:
milestone: 10.0 → 11.0
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 11.0.0.0rc1

This issue was fixed in the openstack/fuel-library 11.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/mitaka)

Change abandoned by Oleksiy Molchanov (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/426194

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/newton)

Change abandoned by Andreas Jaeger (<email address hidden>) on branch: stable/newton
Review: https://review.opendev.org/426193
Reason: This repo is retired now, no further work will get merged.
