[heat] Creation of stack with WaitCondition/WaitHandle resources periodically fails because of concurrent transactions

Bug #1497273 reported by Anastasia Kuznetsova
This bug affects 1 person
Affects            | Status       | Importance | Assigned to      | Milestone
Mirantis OpenStack | Fix Released | High       | Sergey Kraynev   |
  7.0.x            | Fix Released | High       | Alexey Stupnikov |

Bug Description

Steps to reproduce:
1. Create a stack using the following template:
heat_template_version: 2013-05-23
parameters:
  image:
    type: string
    description: Name of image to use for server
  flavor:
    type: string
    description: Flavor to use for server
  timeout:
    type: number
    description: Timeout for WaitCondition, depends on your image and environment
  network:
    type: string
resources:
  wait_condition:
    type: OS::Heat::WaitCondition
    depends_on: instance1
    properties:
      handle: {get_resource: wait_handle}
      count: 5
      timeout: {get_param: timeout}
  wait_handle:
    type: OS::Heat::WaitConditionHandle
  instance1:
    type: OS::Nova::Server
    properties:
      image: {get_param: image}
      flavor: {get_param: flavor}
      networks: [{network: {get_param: network}}]
      user_data_format: RAW
      user_data:
        str_replace:
          template: |
            #!/bin/sh
            wc_notify --data-binary '{"status": "SUCCESS"}'
            wc_notify --data-binary '{"status": "SUCCESS", "reason": "signal2"}'
            wc_notify --data-binary '{"status": "SUCCESS", "reason": "signal3", "data": "data3"}'
            wc_notify --data-binary '{"status": "SUCCESS", "reason": "signal4", "data": "data4"}'
            wc_notify --data-binary '{"status": "SUCCESS", "id": "5"}'
            wc_notify --data-binary '{"status": "SUCCESS", "id": "5"}'
          params:
            wc_notify: { get_attr: ['wait_handle', 'curl_cli'] }
outputs:
  curl_cli:
    value: { get_attr: ['wait_handle', 'curl_cli'] }
  wc_data:
    value: { get_attr: ['wait_condition', 'data'] }
2. Wait for the stack creation to complete.

Observed result:
Stack creation fails because not all signals were received.
For example:
test_stack 16 minutes Create Failed Resource CREATE failed: WaitConditionTimeout: resources.wait_condition: 4 of 5 received - Signal 1 received;signal3;signal2;Signal 5 received
wait_condition - 16 minutes Create Failed WaitConditionTimeout: resources.wait_condition: 4 of 5 received - Signal 1 received;signal3;signal2;Signal 5 received
wait_handle 21 minutes Signal Complete Signal: status:SUCCESS reason:Signal 5 received
wait_handle 21 minutes Signal Complete Signal: status:SUCCESS reason:signal4
wait_handle 21 minutes Signal Complete Signal: status:SUCCESS reason:Signal 4 received
wait_handle 21 minutes Signal Complete Signal: status:SUCCESS reason:signal3
wait_handle 21 minutes Signal Complete Signal: status:SUCCESS reason:signal2
wait_handle 21 minutes Signal Complete Signal: status:SUCCESS reason:Signal 1 received
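The numbering collision behind this failure can be reproduced with a small, self-contained simulation. This is a hypothetical sketch, not Heat code: a plain dict keyed by signal id stands in for the wait handle's stored metadata, the function names are invented for illustration, and each signal is split into a read step and a write step so that two signals can be interleaved deterministically. As the fix's commit message describes, a signal without an "id" is numbered from the current length of the metadata, so two signals that read the same snapshot get the same number and the second write replaces the first.

```python
# Hypothetical simulation (not Heat code) of two wait-handle signals racing.
# A dict keyed by signal id stands in for the handle's stored metadata.


def read_step(stored, details):
    """Build the entry a signal would store, based on a metadata snapshot."""
    signal_num = len(stored) + 1          # signals without an id are numbered from the length
    entry = dict(details)                 # copy so the caller's dict is not mutated
    entry.setdefault("reason", "Signal %d received" % signal_num)
    entry.setdefault("id", str(signal_num))
    entry.setdefault("status", "SUCCESS")
    return entry


def write_step(stored, entry):
    """Store the entry under its id, replacing any entry with the same id."""
    stored[entry["id"]] = entry


def deliver(stored, details):
    """One signal processed alone: read and write with nothing in between."""
    write_step(stored, read_step(stored, details))


def deliver_racing_pair(stored, first, second):
    """Two signals that both read the metadata before either writes it."""
    entry_a = read_step(stored, first)
    entry_b = read_step(stored, second)
    write_step(stored, entry_a)
    write_step(stored, entry_b)           # same id as entry_a: overwrites it


# The six notifications sent by the template above.
SIGNALS = [
    {"status": "SUCCESS"},
    {"status": "SUCCESS", "reason": "signal2"},
    {"status": "SUCCESS", "reason": "signal3", "data": "data3"},
    {"status": "SUCCESS", "reason": "signal4", "data": "data4"},
    {"status": "SUCCESS", "id": "5"},
    {"status": "SUCCESS", "id": "5"},
]

sequential = {}
for details in SIGNALS:
    deliver(sequential, details)

raced = {}
deliver_racing_pair(raced, SIGNALS[0], SIGNALS[1])
for details in SIGNALS[2:]:
    deliver(raced, details)

print(len(sequential), "of 5 received")   # 5 of 5 received
print(len(raced), "of 5 received")        # 4 of 5 received
```

In the raced run of this simulation, "Signal 1 received" is lost because signal2 was stored under the same id, 1. The two notifications carrying "id": "5" collapse into a single entry in both runs, which is why six notifications can satisfy count: 5.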

Upstream bug: https://bugs.launchpad.net/heat/+bug/1497274

ENV: FUEL/MOS 7.0, ISO 288

description: updated
Changed in mos:
assignee: nobody → MOS Heat (mos-heat)
milestone: none → 8.0
Sergey Kraynev (skraynev) wrote :

Waiting for a decision about this bug in the community.

Changed in mos:
importance: Undecided → High
status: New → Triaged
assignee: MOS Heat (mos-heat) → Sergey Kraynev (skraynev)
Sergey Kraynev (skraynev) wrote :

Doc Team: please add a release note about this bug. Current workaround:

add a "sleep 1" command between each pair of consecutive "wc_notify --data-binary ..." commands in the template above.
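If templates are generated programmatically, the workaround can be applied mechanically. The sketch below is a hypothetical helper (build_user_data is an invented name, not part of Heat) that emits a user_data script with "sleep 1" between consecutive wc_notify calls. The "#!/bin/sh" first line is an assumption, added so that cloud-init runs RAW user_data as a shell script. The delay only makes simultaneous signals less likely; it does not remove the race itself.

```python
import json


def build_user_data(payloads, delay_seconds=1):
    """Return a shell script that sends each payload with wc_notify,
    sleeping between consecutive calls.

    "wc_notify" is left as a literal placeholder for the template's
    str_replace to substitute with the handle's curl_cli attribute.
    Payloads are assumed not to contain single quotes, because each one
    is wrapped in single quotes for the shell.
    """
    lines = ["#!/bin/sh"]
    for index, payload in enumerate(payloads):
        if index > 0:
            lines.append("sleep %d" % delay_seconds)
        lines.append("wc_notify --data-binary '%s'" % json.dumps(payload))
    return "\n".join(lines) + "\n"


script = build_user_data([
    {"status": "SUCCESS"},
    {"status": "SUCCESS", "reason": "signal2"},
    {"status": "SUCCESS", "id": "5"},
])
print(script)
```

The returned string is what would go under "str_replace: template:" in the wait handle's user_data.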

tags: added: release-notes
tags: added: release-notes-done rn7.0
removed: release-notes
Sergey Kraynev (skraynev) wrote :

A related patch is under review in the community: https://review.openstack.org/#/c/232124/

Changed in mos:
importance: High → Medium
Sergey Kraynev (skraynev) wrote :

Moving back to High, because this is important for cases where a user wants to send several signals.

Changed in mos:
importance: Medium → High
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/heat (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Sergey Kraynev <email address hidden>
Review: https://review.fuel-infra.org/15941

Changed in mos:
status: Triaged → In Progress
Sergey Kraynev (skraynev) wrote :

The fix is under review in the fuel-infra Gerrit: https://review.fuel-infra.org/#/c/15941/

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/heat (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/15941
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: a064b0ab2d938971180d17a59f38d8f1fa222342
Author: Sergey Kraynev <email address hidden>
Date: Thu Dec 31 11:54:58 2015

Fix race condition for WaitCondition with several signals

This fix is based on changes introduced in patch
Ibf9dd58a66a77d9ae9d4b519b0f11567977f416c.

The following changes were made in this patch:
 - a separate, named exception for concurrent transactions was added.
 - the metadata_set method is now wrapped with oslo_db_api.wrap_db_retry.
   This decreases the frequency of failures caused by concurrent transactions.
 - a new parameter, updater, was added to the metadata_set method.
   When a RetryRequest exception is raised, oslo_db_api.wrap_db_retry
   calls metadata_set again, and in this case the old metadata must be
   refreshed. This is mostly needed for signals without data and id.
   For example:
     signals A and B arrive at the same moment and both get number 1,
   because the metadata was empty. Then, while writing to the DB, a
   RetryRequest exception is raised for signal B. The metadata of this
   signal still stores the old number, 1, so the value must be
   recalculated from the new length of the metadata, giving number 2.
 - Following from the previous item, the code for calculating the metadata
   of wait_condition_handle was moved to a separate method,
   refresh_metadata.
 - The same method was implemented for the base waitcondition class; there
   it returns the original metadata unchanged.
 - the handle_signal method of the base waitcondition class was updated:
   * all code related to updating and verifying metadata was moved
     to the internal _refresh_metadata method, which is called from the
     metadata_set method.
   * the if/else block was changed to raise an error when the metadata
     has the wrong format, so the else branch was deleted.
 - metadata_set returns the safe_metadata value, which is needed for
   writing correct events.
 - corresponding tests for the waitcondition resource that expect several
   signals were added.

Closes-Bug: #1497274
(cherry picked from commit 66940e7ba1b7878774204afd7f1e55ac8e2eb2e5)
Closes-Bug: #1497273

Conflicts:
 heat/engine/resources/openstack/heat/wait_condition_handle.py
Change-Id: Ia25146a742ce79dbb0480d9053131216037e5305
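The retry-with-refresh idea in this commit (metadata_set wrapped with oslo_db_api.wrap_db_retry, plus an updater that recomputes the metadata on each attempt) can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical: MetadataStore, compare_and_swap and the version counter stand in for the database row and its conflict detection, and the retry loop is written out by hand instead of using oslo.db. It replays the commit message's example: signals A and B both compute number 1 from empty metadata, A commits first, B's write is rejected, and B's retry recomputes its number from the refreshed metadata, getting 2.

```python
# Hypothetical sketch of optimistic concurrency with retry; not Heat or oslo.db code.


class ConcurrentTransaction(Exception):
    """Raised when the stored metadata changed after it was read."""


class MetadataStore:
    """Stand-in for a DB row: the metadata plus a version counter."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def read(self):
        return dict(self.data), self.version

    def compare_and_swap(self, expected_version, new_data):
        # Reject the write if another transaction committed since our read.
        if self.version != expected_version:
            raise ConcurrentTransaction()
        self.data = new_data
        self.version += 1


def metadata_set(store, updater, max_retries=5, before_write=None):
    """Re-read the metadata and re-run the updater on every attempt."""
    for _ in range(max_retries + 1):
        snapshot, version = store.read()
        new_data = updater(snapshot)          # recomputed from fresh metadata
        if before_write is not None:
            before_write()                    # test hook to force an interleaving
        try:
            store.compare_and_swap(version, new_data)
            return new_data
        except ConcurrentTransaction:
            continue                          # stale read: refresh and try again
    raise ConcurrentTransaction()


def signal_updater(details):
    """Return an updater that adds one signal, numbered from the snapshot."""
    def updater(snapshot):
        signal_num = len(snapshot) + 1
        entry = dict(details)
        entry.setdefault("reason", "Signal %d received" % signal_num)
        entry.setdefault("id", str(signal_num))
        new_data = dict(snapshot)
        new_data[entry["id"]] = entry
        return new_data
    return updater


store = MetadataStore()
pending = [lambda: metadata_set(store, signal_updater({"status": "SUCCESS"}))]


def let_a_commit_first():
    # Runs signal A to completion between B's read and B's write (once only).
    while pending:
        pending.pop()()


metadata_set(store, signal_updater({"status": "SUCCESS"}),
             before_write=let_a_commit_first)
print(sorted(store.data))                     # ['1', '2']
print(store.data["2"]["reason"])              # Signal 2 received
```

Without the re-read inside the loop, B would retry with its stale number 1 and overwrite A, which is exactly the lost signal seen in this bug. Per the commit message, the patch detects the conflict while writing to the DB and surfaces it as RetryRequest; the version check here only stands in for that mechanism.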

Changed in mos:
status: In Progress → Fix Committed
tags: added: heat
tags: added: area-heat
removed: heat
tags: added: on-verification
Changed in mos:
status: Fix Committed → Fix Released
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/heat (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Sergey Kraynev <email address hidden>
Review: https://review.fuel-infra.org/18536

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/heat (9.0/mitaka)

Change abandoned by Sergey Kraynev <email address hidden> on branch: 9.0/mitaka
Review: https://review.fuel-infra.org/18536
Reason: related fixes were merged here https://review.fuel-infra.org/#/c/18631/

Vadim Rovachev (vrovachev) wrote :
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/heat (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Sergey Kraynev <email address hidden>
Review: https://review.fuel-infra.org/23135

TatyanaGladysheva (tgladysheva) wrote :

Verified on MOS 7.0 + MU6 updates on a cluster without TLS.

Actual results:
Stack is created successfully.

Verification on a cluster with TLS enabled is blocked by bug https://bugs.launchpad.net/mos/+bug/1636428. This case will be checked separately once bug 1636428 is fixed.

tags: removed: on-verification