Convergence: worker fails to re-trigger new traversal on update-replace

Bug #1625073 reported by Anant Patil
This bug affects 1 person
Affects: OpenStack Heat
Status: Fix Released
Importance: High
Assigned to: Anant Patil

Bug Description

The functional test in https://review.openstack.org/#/c/306490/ fails because a new update (started by the rollback that follows a cancel-update) does not re-trigger a new traversal while a resource is being replaced.

Consider the order of events below:
    1. A server is being updated. The worker locks the server resource.
    2. A rollback is triggered because someone cancelled the stack update.
    3. As part of the rollback, a new update using the old template is started.
    4. The new update tries to take the lock, but it was already
    acquired in (1). The new update now expects that when the old
    resource is done, it will re-trigger the new traversal.
    5. The old update decides to create a new resource as a replacement.
    Creation of the replacement resource is initiated, and a
    check_resource RPC call is made for the new resource.
    6. A worker, possibly in another engine, receives the call and
    bails out when it finds that a new traversal has been initiated (from
    2). No further progress is made from here, because the new update
    expects (from 4) a re-trigger when the old resource is done.
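
The ordering above can be replayed with a small, self-contained simulation. This is only a sketch for illustration, not Heat's code: the Resource and Stack classes, try_lock, the simplified check_resource function and the traversal names "update" and "rollback" are all assumptions made for this example. It shows the pre-fix behaviour, where the stale request is dropped and nothing wakes the waiting rollback.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    name: str
    lock_holder: Optional[str] = None  # traversal ID holding the lock


@dataclass
class Stack:
    current_traversal: str
    events: List[str] = field(default_factory=list)


def try_lock(resource: Resource, traversal: str) -> bool:
    """Take the resource lock unless another traversal already holds it."""
    if resource.lock_holder is None:
        resource.lock_holder = traversal
        return True
    return False


def check_resource(stack: Stack, resource: Resource, traversal: str) -> bool:
    """Hypothetical pre-fix worker: drop requests from stale traversals."""
    if traversal != stack.current_traversal:
        stack.events.append(f"{resource.name}: bail out, {traversal} is stale")
        return False
    stack.events.append(f"{resource.name}: processed in {traversal}")
    return True


def simulate_stall() -> Stack:
    stack = Stack(current_traversal="update")
    server = Resource("server")

    try_lock(server, "update")                    # (1) old update locks server
    stack.current_traversal = "rollback"          # (2), (3) rollback starts
    if not try_lock(server, "rollback"):          # (4) lock is already taken
        stack.events.append("rollback: lock busy, waiting for re-trigger")
    replacement = Resource("server-replacement")  # (5) replacement is created
    check_resource(stack, replacement, "update")  # (6) worker sees stale request
    return stack


stack = simulate_stall()
for event in stack.events:
    print(event)
```

After the last event, no party is left that would start the rollback traversal on the server: the rollback is waiting for a re-trigger, and the only worker that noticed the new traversal has returned without acting.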

Anant Patil (ananta)
Changed in heat:
assignee: nobody → Anant Patil (ananta)
Changed in heat:
status: New → In Progress
Zane Bitter (zaneb)
tags: added: newton-rc-potential
Changed in heat:
importance: Undecided → High
milestone: none → newton-rc2
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/371572
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=99b055b42357e2fae6006fe150c3c47c30dab1c0
Submitter: Jenkins
Branch: master

commit 99b055b42357e2fae6006fe150c3c47c30dab1c0
Author: Anant Patil <email address hidden>
Date: Fri Sep 16 14:13:57 2016 +0000

    Re-trigger on update-replace

    The interleaving of lock acquisition when an update-replace of a
    resource is needed is the reason the new traversal is not triggered.

    Consider the order of events below:
    1. A server is being updated. The worker locks the server resource.
    2. A rollback is triggered because someone cancelled the stack.
    3. As part of rollback, new update using old template is started.
    4. The new update tries to take the lock, but it was already
    acquired in (1). The new update now expects that when the old
    resource is done, it will re-trigger the new traversal.
    5. The old update decides to create a new resource for replacement. The
    replacement resource is initiated for creation, a check_resource RPC
    call is made for new resource.
    6. A worker, possibly in another engine, receives the call and then it
    bails out when it finds that there is a new traversal initiated (from
    2). Now, there is no progress from here because it is expected (from 4)
    that there will be a re-trigger when the old resource is done.

    This change takes care of re-triggering the new traversal from worker
    when it finds that there is a new traversal and an update-replace. Note
    that this issue will not be seen when there is no update-replace
    because the old resource will finish (either fail or complete) and in
    the same thread it will find the new traversal and trigger it.

    Closes-Bug: #1625073
    Change-Id: Icea5ba498ef8ca45cd85a9721937da2f4ac304e0
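
The behaviour described in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the change: the check_resource signature, the replaces argument (the name of the old resource that the new one replaces, or None) and the retrigger callback are hypothetical names chosen for this example.

```python
from typing import Callable, List, Optional, Tuple


def check_resource(
    current_traversal: str,
    request_traversal: str,
    replaces: Optional[str],
    retrigger: Callable[[str, str], None],
) -> str:
    """Hypothetical worker entry point with the re-trigger added."""
    if request_traversal == current_traversal:
        return "processed"
    if replaces is not None:
        # Stale request for a replacement resource: the old resource's
        # thread will not trigger the new traversal, so do it from here.
        retrigger(replaces, current_traversal)
        return "retriggered"
    # Stale request without update-replace: the old resource finishes in
    # its own thread, finds the new traversal, and triggers it there.
    return "bailed"


calls: List[Tuple[str, str]] = []


def record(resource: str, traversal: str) -> None:
    calls.append((resource, traversal))


result = check_resource("rollback", "update", "server", record)
print(result, calls)
```

The second branch is the only new path. A stale request that is not part of an update-replace still returns without acting, matching the note in the commit message that the issue does not occur in that case.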

Changed in heat:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/373614

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/newton)

Reviewed: https://review.openstack.org/373614
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=c6bc3fef71a9c46a4109d6abe4d4d4923c7bdae9
Submitter: Jenkins
Branch: stable/newton

commit c6bc3fef71a9c46a4109d6abe4d4d4923c7bdae9
Author: Anant Patil <email address hidden>
Date: Fri Sep 16 14:13:57 2016 +0000

    Re-trigger on update-replace

    Closes-Bug: #1625073
    Change-Id: Icea5ba498ef8ca45cd85a9721937da2f4ac304e0
    (cherry picked from commit 99b055b42357e2fae6006fe150c3c47c30dab1c0)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 7.0.0.0rc2

This issue was fixed in the openstack/heat 7.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 7.0.0

This issue was fixed in the openstack/heat 7.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 8.0.0.0b1

This issue was fixed in the openstack/heat 8.0.0.0b1 development milestone.
