Orphaned LXD VM left behind after creation failure, requires manual cleanup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Status tracked in 3.6 | | |
3.5 | Won't Fix | Medium | Unassigned |
3.6 | Triaged | Medium | Unassigned |
Bug Description
When MAAS fails to create an LXD VM (currently because Juju Bug #1983084 causes the root-disk size to be requested as 0), an orphaned LXD instance is left behind with no corresponding machine in MAAS. The LXD VM then has to be cleaned up manually.
How to reproduce
1. Deploy MAAS (v3.4)
2. Register an LXD VM host to MAAS
3. Bootstrap a juju controller against the MAAS (Juju v3.3.1)
4. Deploy a charm with storage support without specifying a "root-disk" constraint, which triggers Bug #1983084 and requests a VM with 0 storage:
juju add-model test1
juju deploy ceph-osd -n1 --channel quincy/stable --storage osd-devices=maas,8G
I can't seem to easily recreate this failure using the MAAS CLI alone. For example, "maas admin vm-host compose 1 hostname=test3 cores=3 memory=2048 storage=0.0" manages to create and boot an LXD VM with no disk, rather than trying to create a disk with size 0. Perhaps someone can figure out an easier way to reproduce the failure without the juju complexity.
From a brief look at the code, it seems maybe there is no cleanup error handling at all? There's no catching of an error during creation and I don't see any other obvious cleanup code. But I only looked briefly and may have missed something at a higher level:
https:/
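To illustrate the kind of handling that appears to be missing, here is a minimal sketch of a compose step that deletes the half-created instance when a later step fails. This is not MAAS code: `FakeLXD`, `compose_vm`, and `ComposeError` are hypothetical names invented for the illustration, and the zero-size check merely stands in for whatever actually fails.

```python
class ComposeError(Exception):
    """Raised when a VM cannot be composed (hypothetical)."""


class FakeLXD:
    """In-memory stand-in for an LXD client; not the real LXD API."""

    def __init__(self):
        self.instances = set()

    def create(self, name):
        self.instances.add(name)

    def attach_disk(self, name, size_gb):
        # Stand-in failure: reject a zero-sized disk.
        if size_gb <= 0:
            raise ComposeError(f"invalid disk size for {name}: {size_gb}")

    def delete(self, name):
        self.instances.discard(name)


def compose_vm(lxd, name, root_disk_gb):
    """Create a VM and remove it again if any later step fails."""
    lxd.create(name)
    try:
        lxd.attach_disk(name, root_disk_gb)
    except Exception:
        # Roll back the LXD side so no orphan is left behind,
        # then let the caller see the original error.
        lxd.delete(name)
        raise
    return name
```

With something along these lines, a failed compose would surface the same error to the caller but leave nothing behind on the VM host.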
The secondary storage volume is also left attached. Juju also retries 10 times so every time this happens you end up with 10 VMs left behind.
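Until cleanup is automatic, the orphans have to be found by comparing what LXD has with what MAAS knows about. A rough sketch, assuming both lists have already been exported as JSON (for example from `lxc list --format json` and `maas admin machines read`). The helper name `find_orphans` is made up, and the `name` and `hostname` keys are assumptions about those outputs that should be checked against your versions.

```python
import json


def find_orphans(lxd_json, maas_json):
    """Return LXD instance names that have no matching MAAS hostname."""
    # Assumed shape: a JSON list of objects with a "name" key.
    lxd_names = {inst["name"] for inst in json.loads(lxd_json)}
    # Assumed shape: a JSON list of objects with a "hostname" key.
    maas_names = {m["hostname"] for m in json.loads(maas_json)}
    return sorted(lxd_names - maas_names)
```

The result is only a candidate list: any LXD instance on the host that was never managed by MAAS will also show up, so it should be reviewed by hand before deleting anything.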
Looking at this a bit further, the LXD VM is actually created but perhaps is not functional or fails to start? There is no logging from the LXD client with snap debugging turned on, and I can't see any obvious specific error or cause for why the transaction is ultimately rolled back.
Possibly a variant of Bug #2028284?
2024-02-28 07:39:27 django.db.backends: [debug] (0.000) ROLLBACK TO SAVEPOINT "s140124733044288_x26"; args=None
2024-02-28 07:39:27 django.db.backends: [debug] (0.000) RELEASE SAVEPOINT "s140124733044288_x26"; args=None
2024-02-28 07:39:27 maasserver: [error] ################################ Exception: No available machine matches constraints: [('agent_name', ['20d71182-29d5-4b3e-8125-7789ad7a60b8']), ('arch', ['amd64']), ('interfaces', ['1:space=1']), ('storage', ['root:0,0:8']), ('zone', ['default'])] (resolved to "arch=amd64/generic interfaces=1:space=1 storage=root:0,0:8 zone=default") ################################
2024-02-28 07:39:27 maasserver: [error] Traceback (most recent call last):
  File "/snap/maas/32469/usr/lib/python3/dist-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/snap/maas/32469/lib/python3.10/site-packages/maasserver/utils/views.py", line 298, in view_atomic_with_post_commit_savepoint
    return view_atomic(*args, **kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/snap/maas/32469/lib/python3.10/site-packages/maasserver/api/support.py", line 62, in __call__
    response = super().__call__(request, *args, **kwargs)
  File "/snap/maas/32469/usr/lib/python3/dist-packages/django/views/decorators/vary.py", line 20, in inner_func
    response = func(*args, **kwargs)
  File "/snap/maas/32469/usr/lib/python3.10/dist-packages/piston3/resource.py", line 197, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/snap/maas/32469/usr/lib/python3.10/dist-packages/piston3/resource.py", line 195, in __call__
    result = meth(request, *args, **kwargs)
  File "/snap/maas/32469/lib/python3.10/site-packages/maasserver/api/support.py", line 371, in dispatch
    return function(self, request, *args, **kwargs)
  File "/snap/maas/32469/lib/python3.10/site-packages/maasserver/api/machines.py", line 2608, in allocate
    raise NodesNotAvailable(message)
maasserver.exceptions.NodesNotAvailable: No available machine matches constraints: [('agent_name', ['20d71182-29d5-4b3e-8125-7789ad7a60b8']), ('arch', ['amd64']), ('interfaces', ['1:space=1']), ('storage', ['root:0,0:8']), ('zone', ['default'])] (resolved to "arch=amd64/generic interfaces=1:space=1 storage=root:0,0:8 zone=default")