The scheduled operation does not trigger when the Operation Engine runs on multiple nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Karbor | Invalid | Undecided | Unassigned |
Bug Description
- Description:
After creating a ScheduledOperation with a date-time at which it should run, the operation does not trigger if the node that picks up the trigger task is not the node where the operation was created.
The function that executes the operation reads it from a set (_operation_ids) kept in memory, so only the node where the operation was created can execute it. On the other nodes the set (_operation_ids) is empty, so they cannot execute the operation.
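The failure mode can be sketched in plain Python. The `Node` class below is hypothetical (it is not Karbor code); it only illustrates that each Operation Engine process keeps its registered operation IDs in process-local memory, so a trigger that fires on a process which never registered the operation finds an empty set and runs nothing.

```python
# Minimal simulation of the bug: each Operation Engine process keeps its
# registered operation IDs in process-local memory, so only the node that
# created the operation can ever execute it. Hypothetical classes, not
# Karbor code.

class Node:
    def __init__(self, name):
        self.name = name
        self._operation_ids = set()  # in-memory, never shared across nodes

    def register_operation(self, operation_id):
        self._operation_ids.add(operation_id)

    def trigger(self):
        # Mirrors _trigger_operations: iterate the process-local set.
        return sorted(self._operation_ids)

node2 = Node("Node2")
node3 = Node("Node3")

# The scheduled operation is created on Node2 only.
node2.register_operation("b5f4401c")

# When the trigger time arrives, Node3 happens to take the task:
print(node2.trigger())  # ['b5f4401c'] -- would run the operation
print(node3.trigger())  # []           -- empty set, nothing runs
```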
- code:
(The excerpt below is truncated in the original report; the elided pieces are filled in for readability.)
```
from datetime import datetime, timedelta

@classmethod
def _trigger_operations(cls, trigger_id, expect_run_time, window):
    """Trigger operations once"""
    # The executor execute_operation may have I/O operations.
    # If it does, this green thread will be switched out while looping
    # over operation_ids. To avoid self._operation_ids changing while
    # the green thread is switched out, copy self._operation_ids as
    # the iterable object.
    trigger = cls._triggers.get(trigger_id)
    if not trigger:
        return

    # ######## set in memory -> trigger._operation_ids: only the node
    # that created the operation has entries here, so on every other
    # node the copy below is empty and the loop body never runs.
    operations_ids = trigger._operation_ids.copy()
    sent_ops = set()
    end_time = expect_run_time + timedelta(seconds=window)

    for operation_id in operations_ids:
        if operation_id not in trigger._operation_ids:
            # Maybe, when traversing this operation_id, it has been
            # removed by self.unregister_operation
            continue

        now = datetime.utcnow()
        if now >= end_time:
            break

        try:
            cls._executor.execute_operation(
                operation_id, now, expect_run_time, window)
        except Exception:
            LOG.exception("Submitting operation %s to the executor "
                          "failed", operation_id)
        sent_ops.add(operation_id)
```
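The comment about copying `self._operation_ids` is worth illustrating on its own: mutating a Python set while iterating it raises `RuntimeError`, whereas iterating a snapshot and re-checking membership is safe. A stand-alone sketch (plain Python; the "concurrent unregister" is simulated with a direct call rather than a real greenthread):

```python
# Why the loop iterates over a *copy* of the set: changing a set's size
# while iterating it raises RuntimeError. In the real code the mutation
# would come from unregister_operation running in another greenthread
# while execute_operation yields on I/O.

operation_ids = {"op-1", "op-2", "op-3"}

# Unsafe: removing during direct iteration fails.
try:
    for op in operation_ids:
        operation_ids.discard("op-2")  # simulated concurrent unregister
except RuntimeError as exc:
    print("direct iteration failed:", exc)

# Safe: iterate a snapshot, then re-check membership (as the code does).
operation_ids = {"op-1", "op-2", "op-3"}
for op in operation_ids.copy():
    if op not in operation_ids:
        continue  # removed concurrently; skip it
    operation_ids.discard("op-2")  # simulated concurrent unregister
print("survivors:", sorted(operation_ids))
```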
- Environment condition:
* karbor version: master (I think all versions have this problem)
* Operation Engine service running on 3 nodes
- Steps to reproduce:
Create a Karbor plan and trigger:
karbor plan-create "myplan001" ac9b986e-
+------
| Property | Value |
+------
| description | None |
| id | df1c0584-
| name | myplan001 |
| parameters | {} |
| provider_id | ac9b986e-
| resources | [ |
| | { |
| | "id": "1a1441d3-
| | "name": "karbor_keep_test", |
| | "type": "OS::Nova::Server" |
| | } |
| | ] |
| status | suspended |
+------
karbor trigger-create 'mytrigger001' 'time' "pattern"
+------
| Property | Value |
+------
| id | ab103b0e-
| name | mytrigger001 |
| properties | { |
| | "format": "calendar", |
| | "pattern": "BEGIN:
| | "start_time": "2019-09-19 17:20:00" |
| | } |
| type | time |
+------
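The pattern value is truncated in the output above. For context, Karbor's "calendar" trigger format takes an iCalendar VEVENT snippet containing a recurrence rule; the pattern below is only an illustrative example of that shape, not the one used in this report.

```python
# Illustrative iCalendar pattern for a Karbor "calendar"-format time
# trigger. The exact pattern used in this bug report is truncated above;
# this RRULE (daily at 17:20) only shows the expected shape.
pattern = (
    "BEGIN:VEVENT\n"
    "RRULE:FREQ=DAILY;BYHOUR=17;BYMINUTE=20;BYSECOND=0\n"
    "END:VEVENT"
)
print(pattern)
```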
Monitoring the operation engine log (debug mode), every node shows the same output; they are all in the same polling loop:
2019-09-19 17:08:47.092 8 DEBUG karbor.
2019-09-19 17:09:02.092 8 DEBUG karbor.
Create the schedule operation based on the created plan and trigger
karbor scheduledoperat
+------
| Property | Value |
+------
| description | None |
| enabled | True |
| id | b5f4401c-
| name | myprotection_
| operation_
| | "plan_id": "df1c0584-
| | "provider_id": "ac9b986e-
| | } |
| operation_type | retention_protect |
| trigger_id | ab103b0e-
+------
Continuing to monitor the logs: the operation was created on Node2 (a simple label for one of the nodes).
2019-09-19 17:13:55.041 8 DEBUG karbor.
2019-09-19 17:13:59.769 8 DEBUG karbor.
2019-09-19 17:14:10.039 8 DEBUG karbor.
2019-09-19 17:14:25.039 8 DEBUG karbor.
Continue monitoring the operation engine service logs. All nodes remain in the polling loop. Waiting for the time set in the trigger to see the operation execute...
2019-09-19 17:17:47.106 8 DEBUG karbor.
2019-09-19 17:18:02.107 8 DEBUG karbor.
At 2019-09-19 17:20:02, Node3 picked up the created trigger to start the scheduled operation. In the following log you can see that the set _operation_ids is empty, because the scheduled operation was created on a different node (Node2).
NOTE: I added some lines in the code to log the trace and understand this behavior
2019-09-19 17:19:47.113 8 DEBUG karbor.
2019-09-19 17:20:02.116 8 DEBUG karbor.
2019-09-19 17:20:02.118 8 DEBUG karbor.
2019-09-19 17:20:02.129 8 DEBUG karbor.
2019-09-19 17:20:02.131 8 INFO karbor.
2019-09-19 17:20:02.131 8 INFO karbor.
2019-09-19 17:20:02.132 8 INFO karbor.
2019-09-19 17:20:02.144 8 DEBUG karbor.
2019-09-19 17:20:17.113 8 DEBUG karbor.
- Workaround:
Enable the scheduled operation service on only one node and stop it on the others.
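A sketch of the workaround, assuming the Operation Engine runs as a systemd service. The unit name `openstack-karbor-operationengine` is an assumption here and varies by distribution; check your deployment for the actual name.

```shell
# On every node except the one that should own scheduled operations,
# stop and disable the Operation Engine service. The unit name
# "openstack-karbor-operationengine" is an assumption; adjust to
# whatever your deployment uses.
sudo systemctl stop openstack-karbor-operationengine
sudo systemctl disable openstack-karbor-operationengine

# On the single remaining node, make sure the service is running:
sudo systemctl status openstack-karbor-operationengine
```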
Changed in karbor:
status: New → Invalid