OCF scripts should not rely on a 'kill' command exit code, but check proc fs instead

Bug #1425579 reported by Bogdan Dobrelya
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Committed
High
Alex Schultz
6.1.x
Won't Fix
High
MOS Maintenance
7.0.x
Won't Fix
High
MOS Maintenance

Bug Description

The common case for action stop in OCF scripts for Pacemaker RA is:
1) check, by exit code, if kill (SIGTERM) succeeds
2) issue kill -9, if it wasn't.

But 'kill' always returns 0, if the given PID matched the running process and never checks the process actual state. This can be easily checked, for example:
p=`pidof cron`; sudo kill -STOP $p; sudo kill $p; echo $?; ls /proc/$p; ps -p $p
As a result, a cron will be kept running because it cannot process signals, but kill will report an exit code 0.

That is an issue as it drastically increases the number of undesired SIGKILL cases for stop actions and RA should instead tend to perform them gracefully, if possible. Besides that, for heavy loaded system, it is possible that some process cannot process SIGTERM an instant, ending up being shot in the head with SIGKILL.

The solution is to check the results of kill command against a proc fs instead of its exit code and introduce retries for kill as well. SIGKILL should be issued only in cases then there are no more retries left for graceful termination.

Changed in fuel:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: New → Triaged
importance: Undecided → Medium
milestone: none → 6.1
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/159188

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/160399

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/159188
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=7db84c490181b48a37f1bb53773facb77458daeb
Submitter: Jenkins
Branch: master

commit 7db84c490181b48a37f1bb53773facb77458daeb
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Feb 25 17:53:22 2015 +0100

    Adjust stop/monitor and fix kill for ns_haproxy

    W/o this patch, haproxy daemon under load never can
    be stopped gracefully and it also may exceed given stop/monitor
    timeouts ending up being restarted by pacemaker.

    The solution is:
    * increase haproxy resource monitor/stop timeout and interval
    * fix stop action for OCF to not rely on kill command exit codes
    * wrap kill in retries and issue a SIGKILL only if there are no more
      retries left for gracefull termination
    * add debug parameter for haproxy RA OCF script and provide more
      logging as well
    * set debug for haproxy RA, if global debug is enabled

    Closes-bug: #1424959
    Related-bug: #1425579

    Change-Id: I7ab6dc2341821c3b82ef3d3ac63b64a5a9958fa9
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (stable/6.0)

Reviewed: https://review.openstack.org/160399
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=2eea7abdf6e5bbff2c6369a10ab2dba74eda31c7
Submitter: Jenkins
Branch: stable/6.0

commit 2eea7abdf6e5bbff2c6369a10ab2dba74eda31c7
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Feb 25 17:53:22 2015 +0100

    Adjust stop/monitor and fix kill for ns_haproxy

    W/o this patch, haproxy daemon under load never can
    be stopped gracefully and it also may exceed given stop/monitor
    timeouts ending up being restarted by pacemaker.

    The solution is:
    * increase haproxy resource monitor/stop timeout and interval
    * fix stop action for OCF to not rely on kill command exit codes
    * wrap kill in retries and issue a SIGKILL only if there are no more
      retries left for gracefull termination
    * add debug parameter for haproxy RA OCF script and provide more
      logging as well
    * set debug for haproxy RA, if global debug is enabled

    Closes-bug: #1424959
    Related-bug: #1425579

    Change-Id: I7ab6dc2341821c3b82ef3d3ac63b64a5a9958fa9
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Confirmed
assignee: Bogdan Dobrelya (bogdando) → Fuel Library Team (fuel-library)
Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Ilyin (idv1985)
Changed in fuel:
status: Confirmed → Won't Fix
status: Won't Fix → Confirmed
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

There are two different statuses for 6.1, please set correct one.

Changed in fuel:
status: Confirmed → Won't Fix
no longer affects: fuel/6.1.x
tags: added: qa-agree-7.0
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

Moved to 8.0 with approval from devs and QA.

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

This should be converted into blueprint as it is a major refactoring - adding feature tag

tags: added: feature
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/216046

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/216046
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=3093aa28509bac8005ab50e3f3126a4f35629126
Submitter: Jenkins
Branch: master

commit 3093aa28509bac8005ab50e3f3126a4f35629126
Author: Vladimir Kuklin <email address hidden>
Date: Sun Aug 23 14:12:13 2015 -0500

    Revert haproxy restart primitive back to reload

    This commit switches back haproxy configuration reload
    to use reload command of OCF script. Otherwise, for Ubuntu
    since kernel 3.9 haproxy due to some race condition in
    OCF script may start two times with different configs
    as it uses REUSEPORT socket option. By using reload
    we ensure that there is always one process on the listening
    socket which has the latest config.

    This fix is may be partial as we may also require additional work
    on OCF script to introduce additional checks on pids and pidfiles.
    It can also be tracked as a partial duplicate of the bug related
    to better handling of processes manipulations in OCF scripts.

    Change-Id: Iadad82d1733a3c22c470e1ddb761900304ee9753
    Partial-bug: #1487900
    Related-bug: #1425579
    Related-bug: #1401850

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/228014

Changed in fuel:
status: Triaged → In Progress
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/8.0.x
Changed in fuel:
milestone: 6.1 → 8.0
Changed in fuel:
assignee: Alex Schultz (alex-schultz) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz)
Changed in fuel:
assignee: Alex Schultz (alex-schultz) → Sergii Golovatiuk (sgolovatiuk)
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/228014
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=7c210ee68db112bbd16066a54bfdf462a8bbedbd
Submitter: Jenkins
Branch: master

commit 7c210ee68db112bbd16066a54bfdf462a8bbedbd
Author: Alex Schultz <email address hidden>
Date: Fri Sep 25 14:31:11 2015 -0500

    Update OCF stop actions to use procfs

    This change adds a set of fuel functions that can be used to stop
    processes using a pid file or a pid. These functions allow for the same
    method of stopping processes to be used by all ocf scripts. The process
    stopping logic leverages procfs to check to make sure the process has
    been stopped.

    This change adds a default retry of 5 times with a 2 second sleep
    between sending SIGTERM signals to the processes. This retry count is
    dynamic based on the timeout for the stopping command. If the SIGTERM
    does not work, a SIGKILL will also be tried one time.

    This change does not include the ocf script for rabbitmq as that is the
    upstream version and won't be leveraging the ocf-fuel-funcs library.

    This change does not include the ocf script for mysql as that needs
    further work.

    Change-Id: I8dcc1c37d17068a9c29480b886d4f1a051f28894
    Partial-Bug: 1425579

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

This change became a dependency for backporting high/crit fixes to OCF RA, therefore must be backported as well

Changed in fuel:
importance: Medium → High
tags: added: customer-found
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/316085

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/316803

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/317979

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/317979

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/316803

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.1)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/316085

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.1-updates and 7.0-updates as this is pretty big change

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.