Don't set action to failed if acquire lock failed.
Bug #1648681 reported by Ethan Lynn
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
senlin | Fix Released | Critical | Ethan Lynn |
Bug Description
Currently, when an action tries to lock a cluster, it retries several times. If it still cannot acquire the lock after that, it is reported as failed.
But our engine can now scan the db, pick up an action, and execute it. A failed action will not be picked up again, even if it failed only because the lock could not be acquired.
To address this, we can just ignore the acquire-lock failure and leave the action in READY status in the db, waiting for the next engine to pick it up and execute it.
There are two places that need to be changed:
1. Let the engine pick up a random action instead of the first ready action.
2. Let the acquire-lock failure pass and ignore it.
Thoughts are welcome.
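A rough sketch of the two changes, using an in-memory list in place of the db. All names here (`LockError`, `pick_random_ready`, `run_once`, the dict fields) are made up for illustration and are not senlin's actual API:

```python
import random

READY, RUNNING, SUCCEEDED = "READY", "RUNNING", "SUCCEEDED"


class LockError(Exception):
    """Hypothetical error raised when the cluster lock cannot be acquired."""


def pick_random_ready(actions, rng=random):
    """Change 1: return a random READY action, or None if there is none."""
    ready = [a for a in actions if a["status"] == READY]
    return rng.choice(ready) if ready else None


def run_once(actions, locked_clusters, rng=random):
    """Pick one READY action and try to execute it."""
    action = pick_random_ready(actions, rng)
    if action is None:
        return None
    action["status"] = RUNNING
    try:
        # Stand-in for the real lock attempt (after its retries).
        if action["cluster"] in locked_clusters:
            raise LockError(action["cluster"])
        # ... the real work of the action would happen here ...
        action["status"] = SUCCEEDED
    except LockError:
        # Change 2: do not mark the action as failed; put it back to
        # READY so that another engine can pick it up later.
        action["status"] = READY
    return action
```

Picking at random matters here because an action that keeps failing to get the lock would otherwise always be the first READY one and could block the actions behind it.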
Changed in senlin:
assignee: nobody → Ethan Lynn (ethanlynn)
Changed in senlin:
status: New → Triaged
importance: Undecided → Critical
"To address this, we can just ignore the acquire lock failed error and leave the action at READY"
This may cause a problem: would an action then have no timeout at the db layer? I think some actions need a timeout.
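One possible way to keep a timeout while still leaving the action in READY is to check the action's age whenever the lock attempt fails. This is only a sketch of that idea, not something proposed in the bug report; the function name and the `created_at` and `timeout` parameters are assumptions:

```python
import time


def status_after_lock_failure(created_at, timeout, now=None):
    """Decide the status of an action whose lock attempt just failed.

    created_at and now are seconds since the epoch; timeout is in
    seconds, or None for an action that has no timeout.
    """
    now = time.time() if now is None else now
    if timeout is not None and now - created_at >= timeout:
        # The action has waited too long; give up on it.
        return "FAILED"
    # Still within its timeout: leave it for the next engine.
    return "READY"
```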