Bulk operation leaves nodes in inconsistent state
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Graham Binns |
Bug Description
I tested this in the UI, but I think the same applies to the API: a bulk operation only succeeds if all of its sub-operations succeed. This leads to:
- poor user experience: if I perform 4 operations and one of them fails, all the other operations are rolled back
- possibly leaving nodes in an inconsistent state: if only one of 4 operations fails, the power actions of the other 3 operations are still performed even though the DB side of things is rolled back. (This is because the RPC operations, the power actions in this instance, are not part of the transaction and thus don't get rolled back.)
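To illustrate the second point, here is a minimal sketch, not MAAS code. It assumes an in-memory SQLite table standing in for the node table, a hypothetical `power_off` function standing in for the RPC power action, and illustrative status strings. It only shows that a side effect issued inside a transaction block survives the rollback.

```python
import sqlite3

# Hypothetical stand-in for the RPC power action: it only records the call.
# Like a real power action, it cannot be undone by a database rollback.
power_log = []

def power_off(node_id):
    power_log.append(node_id)

def abort_all_or_nothing(conn, node_ids, locked):
    """Abort commissioning for every node inside ONE transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for node_id in node_ids:
                if node_id in locked:
                    raise RuntimeError(f"node {node_id} is locked")
                conn.execute(
                    "UPDATE node SET status = 'READY' WHERE id = ?", (node_id,)
                )
                power_off(node_id)  # side effect outside the transaction
    except RuntimeError:
        return False
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO node VALUES (?, 'COMMISSIONING')", [(1,), (2,)])
conn.commit()

ok = abort_all_or_nothing(conn, [1, 2], locked={2})
statuses = dict(conn.execute("SELECT id, status FROM node"))
```

After the call, node 1 has been powered off (`power_log` contains it) while its row still says it is commissioning, which is the mismatch described above.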
= How to reproduce =
- Get two nodes 'READY'
- Commission one node
- Wait 20 seconds: enough so that the power operation is done (and the lock removed) but not enough to get the commissioning operation done
- Commission the second node
- Quickly after this (i.e. while the lock of the second node is still being held): abort the commissioning operation of the two nodes
- The bulk operation will fail (because of the lock held on the second node), but the first node will still be powered down.
=> the first node is now in an inconsistent state: it's still marked as commissioning but it has been powered down.
Related branches
- Raphaël Badin (community): Approve
- Gavin Panella (community): Approve
- Diff: 508 lines (+368/-17), 2 files modified:
  src/maasserver/models/node.py (+76/-13)
  src/maasserver/models/tests/test_node.py (+292/-4)
Changed in maas:
milestone: none → next
Changed in maas:
status: Triaged → Fix Committed
assignee: nobody → Graham Binns (gmb)
milestone: next → 1.7.0
Changed in maas:
status: Fix Committed → Fix Released
I think the semantics of a bulk operation should be similar to what Twisted's DeferredList does (with consumeErrors=True): a bulk operation should always succeed and report back the number of sub-operations that succeeded and failed.
Under the hood, we can probably achieve this by using sub-transactions.
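A rough sketch of what that could look like, under stated assumptions: it uses SQLite savepoints from the standard library rather than the database layer MAAS actually uses, and `bulk_apply`, `abort_commissioning`, the `locked` column and the status strings are all hypothetical names chosen for illustration. Each sub-operation runs in its own savepoint, so one failure undoes only that node's changes, and the caller gets a report instead of an exception.

```python
import sqlite3

def bulk_apply(conn, node_ids, operation):
    """Run `operation` for each node in its own savepoint; never raise."""
    succeeded, failed = [], {}
    for node_id in node_ids:
        conn.execute("SAVEPOINT bulk_item")
        try:
            operation(conn, node_id)
        except Exception as exc:
            # Undo only this node's changes, then discard the savepoint.
            conn.execute("ROLLBACK TO SAVEPOINT bulk_item")
            conn.execute("RELEASE SAVEPOINT bulk_item")
            failed[node_id] = str(exc)
        else:
            conn.execute("RELEASE SAVEPOINT bulk_item")
            succeeded.append(node_id)
    return succeeded, failed

def abort_commissioning(conn, node_id):
    # Hypothetical sub-operation: change the status, then fail if locked.
    conn.execute("UPDATE node SET status = 'READY' WHERE id = ?", (node_id,))
    (locked,) = conn.execute(
        "SELECT locked FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    if locked:
        raise RuntimeError("node is locked")

# isolation_level=None leaves transaction control to the explicit SQL above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, status TEXT, locked INTEGER)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, 'COMMISSIONING', ?)", [(1, 0), (2, 1), (3, 0)]
)

succeeded, failed = bulk_apply(conn, [1, 2, 3], abort_commissioning)
statuses = dict(conn.execute("SELECT id, status FROM node"))
```

Here nodes 1 and 3 are updated and node 2 is left untouched and reported as failed. In a real implementation the power action would presumably be issued only once a node's database change is known to have succeeded, so that the two cannot diverge.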