Intermittent DB failure when creating VM pods via POST /MAAS/api/2.0/machines/?op=allocate
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Expired | High | Unassigned | | |
Bug Description
MAAS: 2.6.0 (7802-g59416a86
Whenever I boot multiple VM pods at the same time using the "allocate" operation, some VMs fail, and Juju reports the following message:
3 down pending bionic suitable availability zone for machine 3 not found
Looking into the MAAS source code, I can see that the decision on which pod receives the corresponding VM is made in this method:
https:/
I added more logging to the "except" statement and found that an OperationalError is being raised. I believe it corresponds to resource contention in the database, since the error message is "could not serialize access due to concurrent update", which is also documented by Postgres: https:/
Full traceback of OperationalError:
Traceback (most recent call last):
File "/usr/lib/
return self.cursor.
psycopg2.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/
creation_
File "/usr/lib/
return create_
File "/usr/lib/
self.
File "/usr/lib/
hints.save()
File "/usr/lib/
return super(CleanSave, self).save(*args, **kwargs)
File "/usr/lib/
force_
File "/usr/lib/
updated = self._save_
File "/usr/lib/
update_
File "/usr/lib/
forced_update)
File "/usr/lib/
return filtered.
File "/usr/lib/
return query.get_
File "/usr/lib/
cursor = super(SQLUpdate
File "/usr/lib/
raise original_exception
File "/usr/lib/
cursor.
File "/usr/lib/
return super()
File "/usr/lib/
return self.cursor.
File "/usr/lib/
six.
File "/usr/lib/
raise value.with_
File "/usr/lib/
return self.cursor.
django.
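A common way to handle "could not serialize access due to concurrent update" is to re-run the whole failed transaction, since Postgres aborts the transaction when it reports this error. The following is only a minimal sketch of that idea and not the MAAS fix: `SerializationFailure`, `retry_on_serialization_failure` and `save_hints` are hypothetical names used for illustration, and the exception class stands in for a database error carrying SQLSTATE 40001.

```python
import random
import time
from functools import wraps


class SerializationFailure(Exception):
    """Hypothetical stand-in for a database error with SQLSTATE 40001."""


def retry_on_serialization_failure(attempts=5, base_delay=0.01):
    """Re-run the wrapped function when it raises SerializationFailure.

    The wrapped function is assumed to contain the whole transaction,
    so every retry starts again from the beginning.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except SerializationFailure:
                    if attempt == attempts:
                        # Out of attempts: let the caller see the error.
                        raise
                    # Exponential backoff with jitter, so that concurrent
                    # callers are less likely to collide again.
                    delay = base_delay * (2 ** (attempt - 1))
                    time.sleep(delay * random.random())
        return wrapper
    return decorator


# Simulated workload: the first two attempts lose the race, the third wins.
calls = {"count": 0}


@retry_on_serialization_failure(attempts=5, base_delay=0.001)
def save_hints():
    calls["count"] += 1
    if calls["count"] < 3:
        raise SerializationFailure(
            "could not serialize access due to concurrent update")
    return "saved"


result = save_hints()
```

In a real Django application the decorator would need to wrap the function that opens the transaction, not the individual `save()` call, and it would need to check the actual database error code before retrying.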
Related branches
- MAAS Lander: Needs Fixing
- Newell Jensen (community): Approve
- Diff: 306 lines (+121/-19), 2 files modified:
  src/maasserver/forms/pods.py (+46/-7)
  src/maasserver/forms/tests/test_pods.py (+75/-12)
description: | updated |
Changed in maas: | |
status: | New → In Progress |
importance: | Undecided → High |
milestone: | none → 2.7.0alpha1 |
assignee: | nobody → Blake Rouse (blake-rouse) |
Changed in maas: | |
milestone: | 2.7.0b1 → 2.7.0b2 |
Changed in maas: | |
milestone: | 2.7.0b2 → none |
Changed in maas: | |
assignee: | Blake Rouse (blake-rouse) → nobody |
status: | In Progress → Triaged |
I can reproduce this issue using Juju with the following bundle: https://pastebin.ubuntu.com/p/NywWq5SPTm/ (requires creating an OAM network with the space name oam-space)