Dynamic corosync node removal must be safe

Bug #1454617 reported by Bogdan Dobrelya
Affects             Status         Importance  Assigned to   Milestone
Fuel for OpenStack  Fix Committed  Medium      Dmitry Ilyin
6.0.x               Invalid        Undecided   Unassigned

Bug Description

The current implementation of dynamic node removal with corosync-cmapctl is unsafe. Nodes can be marked for removal in the UI while a deploy is still run via the CLI for all nodes, for whatever reason. In that case the pacemaker provider issues node removal commands for CMAP on live nodes that are still running pacemaker and corosync. This is wrong and should never be allowed, because it ends up with a broken corosync cluster.

The solution is to check whether the provider is about to delete its own node from CMAP and either skip this action or, at least, stop pacemaker and corosync locally before issuing any remove actions to the cmap tool.
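
A minimal sketch of the "skip own node" check described above, written in Python for illustration only. The function name `plan_node_removal` and the node names are hypothetical and are not taken from the actual pacemaker provider; the sketch only decides which removals are safe and does not call the cmap tool.

```python
def plan_node_removal(local_node, nodes_to_remove):
    """Split removal candidates into (removable, skipped).

    Hypothetical helper: a node must never issue a CMAP removal
    for itself while pacemaker and corosync are running locally,
    so the local node is moved to the skipped list.
    """
    removable = []
    skipped = []
    for node in nodes_to_remove:
        if node == local_node:
            # Removing our own entry would break the live cluster.
            skipped.append(node)
        else:
            removable.append(node)
    return removable, skipped


if __name__ == "__main__":
    removable, skipped = plan_node_removal("node-1", ["node-1", "node-3"])
    print("remove:", removable)
    print("skip (own node):", skipped)
```

The alternative named above, stopping pacemaker and corosync locally before the removal, would replace the skip branch with a local service stop; that variant is not shown here.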

A complete solution could be to prohibit (or warn about and skip) the CLI deploy action for nodes marked for removal.

Changed in fuel:
milestone: none → 6.1
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Dmitry Ilyin (idv1985)
description: updated
tags: added: module-client pacemaker
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/182623

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/182623
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=9ab71ac888c63e4e1a40397100d3c261656c1d8a
Submitter: Jenkins
Branch: master

commit 9ab71ac888c63e4e1a40397100d3c261656c1d8a
Author: Dmitry Shulyak <email address hidden>
Date: Wed May 13 14:21:15 2015 +0300

    Deprecate deployment on nodes that are marked for deletion

    Running deployment on nodes that are going to be removed from
    cluster can lead to undesired destructive effects.
    For particular case see related bugs.

    In order to prevent such possibility - deployment action will be
    deprecated in case some nodes are marked for deletion.

    User then will be able to remove them from deployment group,
    or remove pending_deletion flag and deploy them - if pending_deletion
    was set by mistake

    Change-Id: I3622492ba5eeec3de8300effd0fa98602916601f
    Related-Bug: 1454617
    Closes-Bug: 1454622
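
The orchestration-side check described in this commit message could look roughly like the following Python sketch. It is an illustration under assumptions, not the code from the fuel-web change: `DeploymentRejected` and `check_deployable` are hypothetical names, and nodes are modelled as plain dicts that carry the `pending_deletion` flag mentioned above.

```python
class DeploymentRejected(Exception):
    """Hypothetical error raised when deployment must not proceed."""


def check_deployable(nodes):
    """Refuse deployment if any selected node is marked for deletion.

    Each node is assumed to be a dict with a "name" key and an
    optional boolean "pending_deletion" key.
    """
    pending = [n["name"] for n in nodes if n.get("pending_deletion")]
    if pending:
        raise DeploymentRejected(
            "Nodes marked for deletion cannot be deployed: "
            + ", ".join(pending)
        )
    return nodes


if __name__ == "__main__":
    selected = [
        {"name": "node-1", "pending_deletion": False},
        {"name": "node-2", "pending_deletion": True},
    ]
    try:
        check_deployable(selected)
    except DeploymentRejected as exc:
        print(exc)
```

As the commit message notes, the user can then either drop the listed nodes from the deployment group or clear the flag if it was set by mistake.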

Bogdan Dobrelya (bogdando) wrote :

Lowering priority, as the related fix resolves this case from the orchestration side, making it very unlikely that deployed environments will hit this bug.

Changed in fuel:
importance: Critical → Medium
milestone: 6.1 → 7.0
Changed in fuel:
milestone: 7.0 → 6.1-updates
status: Confirmed → Fix Committed