Comment 4 for bug 2034014

Jacopo Rota (r00ta) wrote :

I've managed to debug the issue, and I can confirm that the transaction takes a very long time to commit. This is why the lock is not released and the other "allocate" calls are stuck waiting to acquire it.

2023-09-05 14:23:59 maasserver.api.machines: [info] TRYING TO GET LOCK
2023-09-05 14:23:59 maasserver.api.machines: [info] LOCK ACQUIRED
2023-09-05 14:23:59 maasserver.api.machines: [info] TRYING TO GET LOCK
2023-09-05 14:23:59 maasserver.api.machines: [info] TRYING TO GET LOCK
2023-09-05 14:23:59 maasserver.api.machines: [info] TRYING TO GET LOCK
2023-09-05 14:23:59 maasserver.utils.dblocks: [info] MAAS BUSINESS DONE, TRANSACTION TO BE COMMITTED
... SOME HTTP CALLS IN THE MEANWHILE
2023-09-05 14:24:22 maasserver.utils.views: [info] TRANSACTION COMMITTED
2023-09-05 14:24:22 maasserver.api.machines: [info] LOCK ACQUIRED
....

As you can see, it takes more than 20 seconds to commit the transaction for the first request (from 14:23:59 to 14:24:22). All the retries hit the same issue: only one request at a time holds the lock, and each one commits its transaction very slowly. When the timeout (90 seconds) expires, all the remaining requests get a 409.
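To make the arithmetic concrete, here is a rough sketch of the serialization effect. It is not MAAS code: the function name and the model are hypothetical, and it assumes that every request arrives at the same moment, that each lock holder needs the same time to commit (23 seconds, taken from the log above), and that a request fails with 409 if it has not acquired the lock before the 90 second timeout.

```python
def simulate_allocations(n_requests, commit_seconds, timeout_seconds):
    """Simplified model (assumption, not MAAS internals): all requests
    arrive at t=0 and take the lock one at a time, each holding it
    until its transaction has committed."""
    results = []
    lock_free_at = 0.0  # time at which the lock next becomes available
    for _ in range(n_requests):
        acquired_at = lock_free_at
        if acquired_at >= timeout_seconds:
            # Still waiting for the lock when the timeout expires.
            results.append(409)
            continue
        # The lock is held for the whole (slow) commit.
        lock_free_at = acquired_at + commit_seconds
        results.append(200)
    return results


# Six concurrent "allocate" calls, 23 s per commit, 90 s timeout.
print(simulate_allocations(6, 23, 90))
# [200, 200, 200, 200, 409, 409]
```

Under these assumptions only about four requests can get through within the timeout window, no matter how many are sent at once.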

Unfortunately, there is not much that we can do about this bug right now: the performance issue will be addressed properly in the upcoming roadmap cycles. Once the performance improvements are in place, your current hardware should be enough to handle your machines. Until then, I think the only options are to upgrade to a more powerful machine and/or add a region controller and set up HAProxy.