senlin

Don't set action to failed if acquire lock failed.

Bug #1648681 reported by Ethan Lynn on 2016-12-09

Affects	Status	Importance	Assigned to	Milestone
senlin	Fix Released	Critical	Ethan Lynn

Bug Description

Currently, when an action tries to lock a cluster, it retries several times. If the lock still cannot be acquired after that, the action is reported as failed.

But our engine can now scan the db, pick up an action and execute it. A failed action will not be picked up again, even if it failed only because the lock could not be acquired.

To address this, we can just ignore the acquire lock failed error and leave the action at READY status in db, waiting for the next engine to pick it up and execute it.

There are two places that need to be changed:
1. Let the engine pick up a random READY action instead of the first READY action.
2. Let the acquire lock failed error pass and ignore it.

Thoughts are welcome.

Ethan Lynn (ethanlynn) on 2016-12-09

Changed in senlin:
assignee:	nobody → Ethan Lynn (ethanlynn)

Yanyan Hu (yanyanhu) on 2016-12-09

Changed in senlin:
status:	New → Triaged
importance:	Undecided → Critical

XueFeng Liu (jonnary-liu) wrote on 2016-12-11:

"To address this, we can just ignore the acquire lock failed error and leave the action at READY"

This may cause a problem: will an action then have no timeout at the db layer? I think some actions need a timeout.

Ethan Lynn (ethanlynn) wrote on 2016-12-12:

@xuefeng, yes, we might need to add more info to the db to address the timeout problem, such as a retry count. But I haven't figured out in how many cases an action will never be able to acquire a lock.
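
To make the timeout concern concrete, here is a minimal sketch of how a lock failure could be bounded once actions are left at READY. It is not Senlin code: the function name, the dict layout and the `start_time`, `timeout` and `retries` fields are all hypothetical, chosen only to illustrate the "retry count plus timeout" idea.

```python
import time

READY, FAILED = "READY", "FAILED"


def status_after_lock_failure(action, now=None):
    """Decide what happens to an action that could not acquire the lock.

    `action` is a hypothetical dict with 'start_time' and 'timeout'
    (both in seconds); it is not the real Senlin action object.
    """
    now = time.time() if now is None else now
    # Record the attempt so the retry count can be inspected later.
    action["retries"] = action.get("retries", 0) + 1
    if now - action["start_time"] >= action["timeout"]:
        # The action has waited too long: give up instead of retrying forever.
        action["status"] = FAILED
        action["status_reason"] = "Timed out waiting for cluster lock"
    else:
        # Still within its time budget: leave it for the next engine.
        action["status"] = READY
    return action["status"]
```

With a check like this, an action only stays at READY while it is inside its own time budget, so a lock that can never be acquired ends in FAILED rather than an endless loop.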

Actually the case for this issue is:
When multiple actions try to lock a cluster at the same time, some of them will fail and won't be executed again.
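
The contention case can be reproduced with a small deterministic simulation. This is only an illustrative sketch, not Senlin code: `run_round`, `simulate` and the dict-based actions are hypothetical, and each "round" stands for one pass in which every READY action tries to lock the same cluster at once.

```python
READY, SUCCEEDED, FAILED = "READY", "SUCCEEDED", "FAILED"


def run_round(actions, fail_on_lock_error):
    """One round: all READY actions contend for a single cluster lock."""
    lock_owner = None
    for action in actions:
        if action["status"] != READY:
            continue
        if lock_owner is None:
            # The first contender gets the lock and completes its work.
            lock_owner = action["id"]
            action["status"] = SUCCEEDED
        elif fail_on_lock_error:
            # Behavior described in the report: a lock failure is fatal.
            action["status"] = FAILED
        # Otherwise the action simply stays READY for a later round.
    return actions


def simulate(fail_on_lock_error, n_actions=3, max_rounds=10):
    """Run rounds until no READY action is left or max_rounds is hit."""
    actions = [{"id": i, "status": READY} for i in range(n_actions)]
    rounds = 0
    while any(a["status"] == READY for a in actions) and rounds < max_rounds:
        run_round(actions, fail_on_lock_error)
        rounds += 1
    return [a["status"] for a in actions]
```

In this model `simulate(True)` leaves every action except the first in FAILED, while `simulate(False)` lets each action finish in a later round, which is the outcome the proposal is after.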

OpenStack Infra (hudson-openstack) wrote on 2016-12-12: Fix proposed to senlin (master)

Fix proposed to branch: master
Review: https://review.openstack.org/409805

Changed in senlin:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-12-13:

Fix proposed to branch: master
Review: https://review.openstack.org/410095

OpenStack Infra (hudson-openstack) wrote on 2016-12-14: Fix merged to senlin (master)

Reviewed: https://review.openstack.org/409805
Committed: https://git.openstack.org/cgit/openstack/senlin/commit/?id=f84a1a07dd3c7d33acd04a7869e96ab29a949849
Submitter: Jenkins
Branch: master

commit f84a1a07dd3c7d33acd04a7869e96ab29a949849
Author: Ethan Lynn <email address hidden>
Date: Mon Dec 12 21:51:13 2016 +0800

Lookup a random action to execute

This patch changes the behavior of the scheduler to pick up a random
'READY' action instead of the first 'READY' action.

Change-Id: I470c0aa3776f78273f4a4d623d0140c99b92214f
Partial-Bug: #1648681
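
For illustration, random pickup of a READY action can be expressed as a single query. The sketch below uses the standard-library `sqlite3` module and a hypothetical two-column `action` table; it is an assumption made for the example and does not reproduce the query or schema used in the actual patch.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE action (id TEXT PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO action (id, status) VALUES (?, ?)",
        [("a1", "READY"), ("a2", "RUNNING"), ("a3", "READY")],
    )
    return conn


def pick_random_ready_action(conn):
    """Return the id of one READY action chosen at random, or None."""
    row = conn.execute(
        "SELECT id FROM action WHERE status = 'READY' "
        "ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

Picking at random rather than always taking the first READY row means that an action which keeps losing the lock does not block every other READY action queued behind it.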

OpenStack Infra (hudson-openstack) wrote on 2016-12-14:

Reviewed: https://review.openstack.org/410095
Committed: https://git.openstack.org/cgit/openstack/senlin/commit/?id=b6e8d758a0e33547a3fd78c434337e80b0a826fa
Submitter: Jenkins
Branch: master

commit b6e8d758a0e33547a3fd78c434337e80b0a826fa
Author: Ethan Lynn <email address hidden>
Date: Tue Dec 13 16:40:09 2016 +0800

Remove retry logic from lock_acquire

No need to retry, just wait for engine to pick action up again.

    The workflow is:
    ActionProc -> action.execute() -> return RES_RETRY
    -> action.set_status -> ignore RES_RETRY and continue

Change-Id: Ic622a79b754131171cb940aa9f31ec5aef11ee47
Closes-Bug: #1648681
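
The workflow in the commit message can be sketched roughly as follows. Only the names `ActionProc`, `action.execute()`, `set_status` and `RES_RETRY` come from the message; the class layout, the other constants and the dict used as a lock table are assumptions made for this example, not the real Senlin implementation.

```python
RES_OK, RES_ERROR, RES_RETRY = "OK", "ERROR", "RETRY"
READY, SUCCEEDED, FAILED = "READY", "SUCCEEDED", "FAILED"


class Action:
    """Hypothetical stand-in for an action that needs a cluster lock."""

    def __init__(self, action_id, cluster_id, cluster_locks):
        self.id = action_id
        self.cluster_id = cluster_id
        self.cluster_locks = cluster_locks  # maps cluster id -> owner id
        self.status = READY

    def execute(self):
        # Single attempt, no retry loop: report RES_RETRY if the lock is held.
        if self.cluster_locks.get(self.cluster_id) is not None:
            return RES_RETRY
        self.cluster_locks[self.cluster_id] = self.id
        try:
            return RES_OK  # the real work would happen here
        finally:
            del self.cluster_locks[self.cluster_id]

    def set_status(self, result):
        if result == RES_RETRY:
            # Ignore the result: the status is untouched, so the action
            # stays READY and an engine can pick it up again.
            return
        self.status = SUCCEEDED if result == RES_OK else FAILED


def action_proc(action):
    """Mirror of: ActionProc -> action.execute() -> action.set_status."""
    result = action.execute()
    action.set_status(result)
    return result
```

Calling `action_proc` while another owner holds the lock returns `RES_RETRY` and leaves the action at READY; calling it again after the lock is released lets the same action run to completion.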

Changed in senlin:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2016-12-15: Fix included in openstack/senlin 3.0.0.0b2

This issue was fixed in the openstack/senlin 3.0.0.0b2 development milestone.
