if $::hostname == boostrap_node checks are fragile

Bug #1615983 reported by Michele Baldessari
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Incomplete
High
Michele Baldessari

Bug Description

We currently litter our code with the following if checks:
if $::hostname == downcase($bootstrap_node) {
  $sync_db = true
} else {
  $sync_db = false
}

We do so for a bunch of reasons, but mainly for:
1) Creating pacemaker resources only from a single node
2) Call sync_db commands to create sql tables and fixup sql data in general

When we merged commit b345dbea16ad3edd600c62848d8ee116f4df16ee
"Combine BootstrapNodeDeployment with AllNodesDeployment" we added a bootstrap node id not not only for controller nodes but for each type of nodes.

This is problematic. Imagine the following scenario as an example:
- Both Controllers and Compute nodes are configured to include aodh-api
- Both controllers and computes will include the following snippet:
"""
  if $::hostname == downcase($bootstrap_node) {
    $sync_db = true
  } else {
    $sync_db = false
  }

  if $step >= 3 and $sync_db {
    include ::aodh::db::mysql
  }
"""
- The if statement will hold true for both controller 0 *and* compute 0,
  which means that they can be called at the same time, thereby opening up
  a race condition

summary: - boostrap_node if checks are fragile
+ if $:::hostname == boostrap_node checks are fragile
summary: - if $:::hostname == boostrap_node checks are fragile
+ if $::hostname == boostrap_node checks are fragile
description: updated
description: updated
Changed in tripleo:
status: New → Confirmed
importance: Undecided → High
milestone: none → newton-3
Steven Hardy (shardy)
Changed in tripleo:
milestone: newton-3 → newton-rc1
Changed in tripleo:
milestone: newton-rc1 → newton-rc2
Changed in tripleo:
status: Confirmed → Triaged
Revision history for this message
Steven Hardy (shardy) wrote :

I'm not sure we can solve this in time for the newton release, it's something we'll have to document (e.g in the matrix of services, certain services can only be deployed on exactly one role), then we can look at some more flexible approach during Ocata (which will enable e.g scaling out to add some more aodh-api nodes while leaving the current services running on the ControllerRole.

Changed in tripleo:
milestone: newton-rc2 → ocata-1
Steven Hardy (shardy)
Changed in tripleo:
milestone: ocata-1 → ocata-2
Revision history for this message
Michele Baldessari (michele) wrote :

So I hit this regularly while doing my composable HA architecture work, as soon as I add the same service to two different roles. It breaks right away because puppet refuses to create duplicate
pacemaker resources (because the if ::hostname = bootstrap_id matches on both roles).

I was thinking that we could do this if master check on a per service basis. So for example, for
pacemaker the if would become something like this (pseudo-code):
if ::hostname == hiera(pacemaker_short_node_names)[0] {
  $pacemaker_master = true
} else {
  $pacemaker_master = false
}

this way we would alway have pacemaker_master true only on the very first node of all the nodes providing the service and so we could guarantee that the if is true only on a single node for each service, which should match well the composable roles design.

Would this work or am I missing something?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/408726

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
status: Triaged → In Progress
Changed in tripleo:
milestone: ocata-2 → ocata-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/412455

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/412455
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=8d796ea0e4afd69f9776d07f491b1a0d83e34128
Submitter: Jenkins
Branch: master

commit 8d796ea0e4afd69f9776d07f491b1a0d83e34128
Author: Michele Baldessari <email address hidden>
Date: Mon Dec 19 14:27:51 2016 +0100

    Add a per service bootstrap node variable

    In order to call commands that need to be run on a single node, we
    create a new per-service variable that will contain the first node of
    each role containing the service.

    Change-Id: I03e8685f939e8ae1fcd8b16883b559615042505d
    Partial-Bug: #1615983

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/421405

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/421405
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=bf68fa9683ee30cc2ebdf624f47e82a4fcceb746
Submitter: Jenkins
Branch: master

commit bf68fa9683ee30cc2ebdf624f47e82a4fcceb746
Author: Michele Baldessari <email address hidden>
Date: Tue Jan 17 18:17:38 2017 +0100

    Do not depend on bootstrap_nodeid for any pacemaker profile

    When we create a pacemaker resource it must happen from a single node.
    If it happens from multiple nodes an immediate error will be returned by
    pcs.

    For the pacemaker roles we enforce this by leveraging the recently
    introduced <SERVICE_NAME_bootstrap_short_node_name> which gives us
    the first hostname per-service, regardless of the role.
    (introduced via I03e8685f939e8ae1fcd8b16883b559615042505d)

    With this approach if a pacemaker service belongs to two different
    roles (say role Controller on node A and role galera on node B), it
    will only create the resource from one of the two and not both (which
    would return an error).

    Only setting Partial-Bug for this one, because it addresses the issue
    from the pacemaker resource creation POV (which is always affected). But
    the issue itself is a race that we're theoretically affected by since
    the composable roles work landed. While I have tried to fix the more
    general case in previous attempts, I think it is best if we start a
    discussion on how to fix it, because each approach has a bunch of
    potential drawbacks and is quite invasive on how we do things. A
    discussion slot for this has been proposed for the Atlanta PTG.

    Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a
    Partial-Bug: #1615983

Changed in tripleo:
milestone: ocata-3 → ocata-rc1
Changed in tripleo:
milestone: ocata-rc1 → ocata-rc2
Changed in tripleo:
milestone: ocata-rc2 → pike-1
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Changed in tripleo:
milestone: pike-3 → pike-rc1
Revision history for this message
Ben Nemec (bnemec) wrote :

Is there any chance of this landing for Pike, or should we defer to Queens? It sounds like it will require non-trivial changes to fix and there are no open reviews yet. At this point in the cycle I'm inclined to think it's not going to make Pike.

Changed in tripleo:
milestone: pike-rc1 → pike-rc2
Changed in tripleo:
milestone: pike-rc2 → queens-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/408726
Reason: Per policy documented here: http://specs.openstack.org/openstack/tripleo-specs/specs/policy/patch-abandomment.html - we decided to abandon this patch. If you want to work on it again, feel free to restore it. Any question about that can be asked on #tripleo.

Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Revision history for this message
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

is this still being worked on?

Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Revision history for this message
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: wallaby-3 → none
status: In Progress → Incomplete
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.