Convergence: worker fails to re-trigger new traversal on update-replace
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Heat | Fix Released | High | Anant Patil |
Bug Description
Functional test in https:/
Consider the order of events below:
1. A server is being updated. The worker locks the server resource.
2. A rollback is triggered because someone cancelled the stack.
3. As part of the rollback, a new update using the old template is started.
4. The new update tries to take the lock, but it has already been
acquired in (1). The new update now expects that when the old
resource is done, it will re-trigger the new traversal.
5. The old update decides to create a new resource for replacement.
Creation of the replacement resource is initiated, and a check_resource
RPC call is made for the new resource.
6. A worker, possibly in another engine, receives the call and then
bails out when it finds that a new traversal has been initiated (from
2). There is no progress from here, because it is expected (from 4)
that there will be a re-trigger when the old resource is done.
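The event order above can be replayed with a small toy model. This is only a sketch for illustration: the `Stack` and `Resource` classes, the `try_lock` method, the traversal ids `t1` and `t2`, and this simplified `check_resource` function are all hypothetical and are not Heat's actual code or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stack:
    # Hypothetical stand-in for the stack's current traversal id.
    current_traversal: str
    log: List[str] = field(default_factory=list)


@dataclass
class Resource:
    name: str
    locked_by: Optional[str] = None  # traversal id that holds the lock

    def try_lock(self, traversal: str) -> bool:
        """Take the lock if it is free; report whether we got it."""
        if self.locked_by is None:
            self.locked_by = traversal
            return True
        return False


def check_resource(stack: Stack, resource: Resource, traversal: str) -> bool:
    """Simplified worker: bail out when the request's traversal is stale."""
    if traversal != stack.current_traversal:
        stack.log.append(
            f"{resource.name}: stale traversal {traversal}, bail out")
        return False
    stack.log.append(f"{resource.name}: processed in {traversal}")
    return True


stack = Stack(current_traversal="t1")
server = Resource("server")

# (1) The old update (traversal t1) locks the server.
old_has_lock = server.try_lock("t1")

# (2), (3) The rollback starts a new traversal, t2.
stack.current_traversal = "t2"

# (4) The new update cannot take the lock and waits for a re-trigger.
got_lock = server.try_lock("t2")

# (5) The old update requests creation of a replacement under t1.
replacement = Resource("server-replacement")

# (6) The worker sees the stale traversal and bails out.
progressed = check_resource(stack, replacement, "t1")

print(old_has_lock, got_lock, progressed)
print(stack.log)
```

In this model neither the new update nor the replacement makes progress, and nothing is left running that would re-trigger `t2`.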
Changed in heat:
assignee: nobody → Anant Patil (ananta)
Changed in heat:
status: New → In Progress
tags: added: newton-rc-potential
Changed in heat:
importance: Undecided → High
milestone: none → newton-rc2
Reviewed: https://review.openstack.org/371572
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=99b055b42357e2fae6006fe150c3c47c30dab1c0
Submitter: Jenkins
Branch: master
commit 99b055b42357e2fae6006fe150c3c47c30dab1c0
Author: Anant Patil <email address hidden>
Date: Fri Sep 16 14:13:57 2016 +0000
Re-trigger on update-replace
The interleaving of locks when an update-replace of a resource is
needed was found to be the reason the new traversal is not triggered.
Consider the order of events below:
1. A server is being updated. The worker locks the server resource.
2. A rollback is triggered because someone cancelled the stack.
3. As part of the rollback, a new update using the old template is started.
4. The new update tries to take the lock, but it has already been
acquired in (1). The new update now expects that when the old
resource is done, it will re-trigger the new traversal.
5. The old update decides to create a new resource for replacement.
Creation of the replacement resource is initiated, and a check_resource
RPC call is made for the new resource.
6. A worker, possibly in another engine, receives the call and then
bails out when it finds that a new traversal has been initiated (from
2). There is no progress from here, because it is expected (from 4)
that there will be a re-trigger when the old resource is done.
This change takes care of re-triggering the new traversal from the worker
when it finds that there is a new traversal and an update-replace. Note
that this issue is not seen when there is no update-replace,
because the old resource will finish (either fail or complete) and, in
the same thread, find the new traversal and trigger it.
Closes-Bug: #1625073
Change-Id: Icea5ba498ef8ca45cd85a9721937da2f4ac304e0
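The behaviour described in the commit message can be sketched as a decision made by the worker when a request arrives for a stale traversal. This is a minimal illustration under stated assumptions, not the code from the patch: the function signature, the `is_replacement` flag, the `retrigger` callback, and the returned strings are all hypothetical.

```python
from typing import Callable, List


def check_resource(current_traversal: str,
                   request_traversal: str,
                   is_replacement: bool,
                   retrigger: Callable[[str], None]) -> str:
    """Hypothetical worker decision for one check_resource request."""
    if request_traversal == current_traversal:
        # The request belongs to the live traversal: handle it normally.
        return "processed"
    if is_replacement:
        # Stale traversal and an update-replace: the old resource's thread
        # will not come back to this point, so the worker re-triggers the
        # new traversal itself.
        retrigger(current_traversal)
        return "retriggered"
    # Stale traversal without update-replace: the old resource finishes in
    # its own thread and triggers the new traversal from there.
    return "skipped"


triggered: List[str] = []
outcome = check_resource("t2", "t1", True, triggered.append)
print(outcome, triggered)
```

Passing the re-trigger action as a callback keeps the sketch free of any RPC details; in a real engine this step would go through whatever mechanism starts a traversal.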