[fuel-library] Some pacemaker location constraints are missing after deployment

Bug #1396481 reported by Aleksandr Didenko
This bug affects 1 person
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Released   Critical    Vladimir Kuklin
5.1.x               Fix Committed  Critical    Vladimir Kuklin
6.0.x               Fix Released   Critical    Vladimir Kuklin

Bug Description

{
    "api": "1.0",
    "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280",
    "auth_required": true,
    "build_id": "2014-11-24_22-41-00",
    "build_number": "4",
    "feature_groups": [
        "mirantis"
    ],
    "fuellib_sha": "893883f7fa8ffc5dde975b6806e538a11969a15b",
    "fuelmain_sha": "45b21f7bdb061b59b80f8d126d9a6f6e50505a0d",
    "nailgun_sha": "603a8d438dc7a3cf6286eb9f16deb8137f47d703",
    "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622",
    "production": "docker",
    "release": "6.0",
    "release_versions": {
        "2014.2-6.0": {
            "VERSION": {
                "api": "1.0",
                "astute_sha": "c15623d05ccdf7ac10873e7a90df954de8726280",
                "build_id": "2014-11-24_22-41-00",
                "build_number": "4",
                "feature_groups": [
                    "mirantis"
                ],
                "fuellib_sha": "893883f7fa8ffc5dde975b6806e538a11969a15b",
                "fuelmain_sha": "45b21f7bdb061b59b80f8d126d9a6f6e50505a0d",
                "nailgun_sha": "603a8d438dc7a3cf6286eb9f16deb8137f47d703",
                "ostf_sha": "a35f516f1606b0d03d51ff63bfe3fbe23de4b622",
                "production": "docker",
                "release": "6.0"
            }
        }
    }
}

Systest group "deploy_ha_neutron" on Ubuntu:
3 controllers, 2 computes, 1 Cinder LVM, Neutron with GRE segmentation

After deployment "pcs constraint list" shows:

  Resource: ping_vip__public
    Enabled on: node-1 (score:100)
    Enabled on: node-5 (score:100)
  Resource: vip__management
    Enabled on: node-1 (score:100)
    Enabled on: node-5 (score:100)
  Resource: vip__public
    Enabled on: node-1 (score:100)
    Enabled on: node-2 (score:100)

According to the Puppet logs, the needed pcs commands were executed, but the location is still missing from the CIB, for example vip__management on node-2:

Tue Nov 25 16:42:57 +0000 2014 Puppet (debug): Executing '/usr/sbin/pcs constraint location add vip__management_on_node-2 vip__management node-2 100'
Tue Nov 25 16:43:01 +0000 2014 Puppet (debug): Executing '/usr/sbin/pcs resource clear vip__management node-2'
Tue Nov 25 16:43:10 +0000 2014 Puppet (debug): Executing '/usr/sbin/pcs resource meta vip__management target-role=Started'

root@node-2:~/logs# cibadmin -Q | grep vip__management_on_node-
      <rsc_location id="vip__management_on_node-1" node="node-1" rsc="vip__management" score="100"/>
      <rsc_location id="vip__management_on_node-5" node="node-5" rsc="vip__management" score="100"/>
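
The manual grep above can be turned into a repeatable check. The following is a minimal sketch, not part of fuel-library: it parses CIB XML (as printed by `cibadmin -Q`) and lists the nodes that have no rsc_location entry for a given resource. The function name, the sample CIB wrapper elements and the expected node list (node-1, node-2, node-5, inferred from the output above) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def missing_location_nodes(cib_xml, resource, expected_nodes):
    """Return the expected nodes that have no rsc_location for `resource`.

    Hypothetical helper: it only inspects the `rsc` and `node` attributes
    of <rsc_location> elements, as seen in the report above.
    """
    root = ET.fromstring(cib_xml.strip())
    present = {
        loc.get("node")
        for loc in root.iter("rsc_location")
        if loc.get("rsc") == resource
    }
    # Preserve the caller's ordering so the result is deterministic.
    return [node for node in expected_nodes if node not in present]


# Sample input: the two constraints reported in this bug, wrapped in a
# minimal (assumed) CIB skeleton so that the snippet is self-contained.
SAMPLE_CIB = """
<cib>
  <configuration>
    <constraints>
      <rsc_location id="vip__management_on_node-1" node="node-1" rsc="vip__management" score="100"/>
      <rsc_location id="vip__management_on_node-5" node="node-5" rsc="vip__management" score="100"/>
    </constraints>
  </configuration>
</cib>
"""

if __name__ == "__main__":
    controllers = ["node-1", "node-2", "node-5"]  # assumed controller list
    print(missing_location_nodes(SAMPLE_CIB, "vip__management", controllers))
```

On a real controller the XML would come from `cibadmin -Q` instead of the embedded sample.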

Aleksandr Didenko (adidenko) wrote :

Attaching some additional info/logs

description: updated
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/137655

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Dmitry Ilyin (<email address hidden>) on branch: master
Review: https://review.openstack.org/137655

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/134964
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=c045ce3078c3d5e0d5df786041875332b8f3fac2
Submitter: Jenkins
Branch: master

commit c045ce3078c3d5e0d5df786041875332b8f3fac2
Author: Dmitry Ilyin <email address hidden>
Date: Thu Nov 27 18:11:16 2014 +0300

    Fix idempotency of cs_resource

    * insync? to drop status metadata from checks
    * code cleanup
    * fix rspec for cs_resource type
    * switch location add implementation from pcs
      to cibadmin --patch to solve problems with
      cib changes not being synced to other nodes

    related-blueprint: pacemaker-improvements
    Related-Bug: 1391599
    Related-Bug: 1390480
    Related-Bug: 1396481

    Change-Id: I5410b91ea01fc8c6805de6becdf0800d0d486188
    Signed-off-by: Sergii Golovatiuk <email address hidden>

Bogdan Dobrelya (bogdando) wrote :

I tested looped deployment of a 5-node cluster. There were 5 failures with some constraint missing, out of 128 runs in total.

Bogdan Dobrelya (bogdando) wrote :

Investigating the same for 5.1.2. Will update the 5.1.2 status once I have deployment stats.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/138067

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/138067
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=fe097a2f36524ddd07039697a03149c05c9913fb
Submitter: Jenkins
Branch: master

commit fe097a2f36524ddd07039697a03149c05c9913fb
Author: Dmitry Ilyin <email address hidden>
Date: Mon Dec 1 16:55:41 2014 +0300

    Change is_online? to use dc-version

    Use dc-version to determine if cib is ready to
    work with.

    Change-Id: I4bf0e4f63b45c75f37709b2c5e54d830281742b2
    Related-Bug: 1396481
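
The commit message above does not show how is_online? reads dc-version, so the following is only a rough sketch of the idea, not the fuel-library implementation. It assumes that Pacemaker records dc-version as an nvpair in the crm_config section of the CIB once a DC has been elected, and treats the CIB as ready when that nvpair has a non-empty value. The function name and both sample documents (including the placeholder version string) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def cib_has_dc_version(cib_xml):
    """Return True if the CIB XML contains a non-empty dc-version nvpair.

    Hypothetical readiness check: the assumption is that a populated
    dc-version property means a DC was elected and the CIB is usable.
    """
    root = ET.fromstring(cib_xml.strip())
    for nvpair in root.iter("nvpair"):
        if nvpair.get("name") == "dc-version" and nvpair.get("value"):
            return True
    return False


# Illustrative inputs only; the version value is a placeholder.
READY_CIB = """
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="example-version"/>
      </cluster_property_set>
    </crm_config>
  </configuration>
</cib>
"""

NOT_READY_CIB = """
<cib>
  <configuration>
    <crm_config/>
  </configuration>
</cib>
"""

if __name__ == "__main__":
    print(cib_has_dc_version(READY_CIB))
    print(cib_has_dc_version(NOT_READY_CIB))
```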

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/138167

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/138385

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/5.1)

Related fix proposed to branch: stable/5.1
Review: https://review.openstack.org/138398

Vladimir Kuklin (vkuklin) wrote :

The problem is caused by a race condition that occurs while controllers are being deployed in parallel.

The apply_changes method in the corosync.rb module for pacemaker providers uses pacemaker shadows, so it is sensitive to simultaneous manipulation of the same part of the CIB. In 5.1.1 and 6.0 we added a location used by the ping service to identify the best location for the public virtual IP. As a result, a deployment may modify the location section of the CIB simultaneously from both the pacemaker cs_location and service providers. To fix this, we need to fix the method we use to apply patches to the CIB, along with cs_location resource idempotency. The changes above introduce a workaround that makes cs_location a 'one-shot' resource, i.e. it will not be recreated if a location resource with the same name already exists. This satisfies our requirements for now, but the bug should then be fixed as part of the overall refactoring of HA deployment.

Vladimir Kuklin (vkuklin) wrote :
tags: added: release-notes
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/138415

Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Vladimir Kuklin (vkuklin)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/138416

no longer affects: fuel/6.1.x
Vladimir Kuklin (vkuklin) wrote :

We developed a newer fix that addresses the race condition in the apply_changes method: it generates the XML patch based on the old version of the CIB, which could not have been modified by another node.

Nevertheless, we found that this bug happens rarely (in ~3% of deployments), when the user deploys 5 controllers and thus 4 of them are deployed in parallel. In this case, the workaround is to check the location constraints for the vip__public and vip__management resources and add them for the missing nodes. Another workaround is to deploy controllers in batches of no more than 2 nodes in parallel.
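
The first workaround can be scripted. This is a minimal sketch under stated assumptions: it only builds the commands and does not run them, and it reuses the command form and the `<resource>_on_<node>` constraint naming seen in the Puppet debug log in the bug description. The function name and the example node lists are hypothetical.

```python
def repair_commands(resource, expected_nodes, present_nodes, score=100):
    """Build pcs commands that add location constraints for missing nodes.

    Hypothetical helper: the argument order follows the command logged by
    Puppet in this report:
      pcs constraint location add <id> <resource> <node> <score>
    """
    commands = []
    for node in expected_nodes:
        if node in present_nodes:
            continue  # constraint already exists, nothing to add
        constraint_id = f"{resource}_on_{node}"
        commands.append([
            "/usr/sbin/pcs", "constraint", "location", "add",
            constraint_id, resource, node, str(score),
        ])
    return commands


if __name__ == "__main__":
    # Example state taken from the description: node-2 is missing.
    for cmd in repair_commands(
        "vip__management",
        expected_nodes=["node-1", "node-2", "node-5"],
        present_nodes={"node-1", "node-5"},
    ):
        print(" ".join(cmd))
```

Review the printed commands before running any of them on a controller.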

Mike Scherbakov (mihgen) wrote :

> I tested looped deployment of a 5-node cluster. There were 5 failures with some constraint missing, out of 128 runs in total.
To me this issue seems to be High rather than Critical. There is also a workaround, as I understand it: just click the Deploy button again. Or am I missing something?

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/138415
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=2eaba85606656149db68c86cbe9b5dc9bf471d6f
Submitter: Jenkins
Branch: master

commit 2eaba85606656149db68c86cbe9b5dc9bf471d6f
Author: Vladimir Kuklin <email address hidden>
Date: Tue Dec 2 18:51:28 2014 +0300

    Fix apply_changes function race condition

    Due to an error in logic of apply_changes method
    we had problems with parallel deployment of controllers
    as we were generating XML patch between live CIB and
    shadow generated on the base of (sometimes!) older CIB.

    This could lead to the problem that crm_diff removes
    some of the data from the live CIB as it was not present
    in the old CIB.

    An example of such behaviour is the bug with location
    constraints missing for virtual ip addresses as our
    service provider was creating them in this window
    between shadow creation and retrieving of the live CIB
    thus crm_diff was generating patch removing this location
    and leading to this bug.

    Change-Id: Icb28fe6d90f44084d424b793db792869e0c6c66c
    Closes-bug: #1396481
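
The race described in this commit message can be reproduced with a toy model. In the sketch below the CIB is reduced to a set of constraint ids and crm_diff to a set difference; that is a deliberate simplification, not how the Ruby provider or crm_diff work internally. All function and variable names are illustrative.

```python
def diff(old, new):
    """Return the (added, removed) ids needed to turn `old` into `new`."""
    return new - old, old - new


def apply_patch(cib, patch):
    """Apply an (added, removed) patch to a set-based toy CIB."""
    added, removed = patch
    return (cib - removed) | added


# CIB state at the moment this node creates its shadow.
base = {"vip__management_on_node-1"}

# This node edits its shadow copy and adds its own constraint.
shadow = base | {"vip__management_on_node-5"}

# Meanwhile another provider adds a constraint to the live CIB.
live = base | {"vip__management_on_node-2"}

# Buggy flow: the patch is computed between the live CIB and the shadow,
# so the concurrent addition looks like something to be removed.
buggy_result = apply_patch(live, diff(live, shadow))

# Fixed flow: the patch is computed against the CIB the shadow was based
# on, so it contains only this node's own change.
fixed_result = apply_patch(live, diff(base, shadow))

if __name__ == "__main__":
    print("buggy:", sorted(buggy_result))
    print("fixed:", sorted(fixed_result))
```

In this model the buggy flow ends with the node-1 and node-5 constraints only, which matches the state in the bug description, while the fixed flow keeps all three.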

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/5.1)

Change abandoned by Dmitry Ilyin (<email address hidden>) on branch: stable/5.1
Review: https://review.openstack.org/138398

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Dmitry Ilyin (<email address hidden>) on branch: master
Review: https://review.openstack.org/138385

tags: added: on-verification
Anastasia Palkina (apalkina) wrote :

Verified on ISO #49

"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "auth_required": true, "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}}}, "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"

Deployed Ubuntu in HA mode with 5 controllers, 1 compute and 1 Cinder node. Deployment was successful.

root@node-10:~# pcs constraint list
Location Constraints:
  Resource: p_haproxy
    Enabled on: node-4 (score:100)
    Enabled on: node-14 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-13 (score:100)
  Resource: p_heat-engine
    Enabled on: node-4 (score:100)
    Enabled on: node-14 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-13 (score:100)
    Enabled on: node-10 (score:100)
  Resource: p_mysql
    Enabled on: node-4 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-13 (score:100)
    Enabled on: node-14 (score:100)
  Resource: p_neutron-dhcp-agent
    Enabled on: node-4 (score:100)
    Enabled on: node-14 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-13 (score:100)
  Resource: p_neutron-l3-agent
    Enabled on: node-4 (score:100)
    Enabled on: node-14 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-13 (score:100)
  Resource: p_neutron-metadata-agent
    Enabled on: node-4 (score:100)
    Enabled on: node-14 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-13 (score:100)
  Resource: p_neutron-plugin-openvswitch-agent
    Enabled on: node-4 (score:100)
    Enabled on: node-14 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-13 (score:100)
  Resource: p_rabbitmq-server
    Enabled on: node-4 (score:100)
    Enabled on: node-14 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-13 (score:100)
  Resource: ping_vip__public
    Enabled on: node-4 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-13 (score:100)
    Enabled on: node-14 (score:100)
  Resource: vip__management
    Enabled on: node-4 (score:100)
    Enabled on: node-10 (score:100)
    Enabled on: node-11 (score:100)
    Enabled on: node-13 (score:100)
    Enabled on: node-14 (sc...


tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/138167

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.1)

Reviewed: https://review.openstack.org/138416
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=ace3185696589cf9dbb4003c56589055d3748c62
Submitter: Jenkins
Branch: stable/5.1

commit ace3185696589cf9dbb4003c56589055d3748c62
Author: Vladimir Kuklin <email address hidden>
Date: Tue Dec 2 18:51:28 2014 +0300

    Fix apply_changes function race condition

    Due to an error in logic of apply_changes method
    we had problems with parallel deployment of controllers
    as we were generating XML patch between live CIB and
    shadow generated on the base of (sometimes!) older CIB.

    This could lead to the problem that crm_diff removes
    some of the data from the live CIB as it was not present
    in the old CIB.

    An example of such behaviour is the bug with location
    constraints missing for virtual ip addresses as our
    service provider was creating them in this window
    between shadow creation and retrieving of the live CIB
    thus crm_diff was generating patch removing this location
    and leading to this bug.

    Change-Id: Icb28fe6d90f44084d424b793db792869e0c6c66c
    Closes-bug: #1396481
