Selecting LXD KVM Causes the machine to get stuck in Deploying Reboot

Bug #2052480 reported by Alan Baghumian
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
| MAAS | Invalid | High | Stamatis Katsaounis | |
| lxd | Fix Released | Unknown | | |

Bug Description

Hello MAAS Team,

While deploying a new MAAS cluster, we noticed that our nodes get stuck in Deploying/Reboot if LXD is selected as the KVM pod type.

MAAS Version: 3.3.5 Snaps from 3.3/stable channel (3.3.5-13222-g.78dd996c0)

Deployed OS: We tested both 22.04 (Jammy) and 20.04 (Focal); both produced the same result.

Workaround:

(1) From MAAS WebUI, go to KVM then click "Add LXD host".
(2) Complete the form and capture the generated certificate.
(3) SSH to the deploying machine and import the certificate.
(4) Back on the MAAS WebUI, complete the setup.
(5) This sets the machine to "Deployed" and enables creating new VM(s) using the WebUI.
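
For step (3), a minimal sketch of the import command, written as a small Python helper that only builds the command line and does not run it. It assumes the certificate captured in step (2) was copied to the machine as a file; the path /tmp/maas.crt and the helper name trust_add_command are hypothetical, and `lxc config trust add` is the only LXD command relied on.

```python
import shlex


def trust_add_command(cert_path: str, use_sudo: bool = True) -> list[str]:
    """Build the command that adds a client certificate to the LXD trust store."""
    cmd = ["lxc", "config", "trust", "add", cert_path]
    return ["sudo", *cmd] if use_sudo else cmd


if __name__ == "__main__":
    # /tmp/maas.crt is a hypothetical location for the certificate
    # generated by the "Add LXD host" form in the MAAS WebUI.
    print(shlex.join(trust_add_command("/tmp/maas.crt")))
```

Running the printed command on the deploying machine, and then `sudo lxc config trust ls`, should show the new entry before continuing with step (4).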

More Details:

(A) An LXD certificate does appear to have been created successfully during the deployment:

ubuntu@os-node-1:~$ sudo lxc config trust ls
+--------+---------+-------------+--------------+-----------------------------+-----------------------------+
| TYPE | NAME | COMMON NAME | FINGERPRINT | ISSUE DATE | EXPIRY DATE |
+--------+---------+-------------+--------------+-----------------------------+-----------------------------+
| client | lxd.crt | Home-Lab | d3cf1ca2149f | Feb 5, 2024 at 5:01pm (UTC) | Feb 2, 2034 at 5:01pm (UTC) |
+--------+---------+-------------+--------------+-----------------------------+-----------------------------+
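
To check the trust store from a script rather than by eye, the table output above can be parsed into rows. This is a rough sketch that assumes the table layout shown here (border lines starting with "+", data lines starting and ending with "|"); the function name parse_lxc_table is made up for illustration.

```python
def parse_lxc_table(text: str) -> list[dict[str, str]]:
    """Turn an lxc ASCII table into a list of {column name: value} dicts."""
    header: list[str] | None = None
    rows: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the +----+ border lines.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first data line carries the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


SAMPLE = """\
+--------+---------+-------------+--------------+
| TYPE | NAME | COMMON NAME | FINGERPRINT |
+--------+---------+-------------+--------------+
| client | lxd.crt | Home-Lab | d3cf1ca2149f |
+--------+---------+-------------+--------------+
"""

if __name__ == "__main__":
    for row in parse_lxc_table(SAMPLE):
        print(row["TYPE"], row["NAME"], row["FINGERPRINT"])
```

Note that the table does not show whether a certificate is restricted, so this only confirms that an entry exists.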

(B) regiond.log shows the following trace (both Jammy and Focal deployments produce the same one):

2024-02-05 17:02:34 metadataserver.api_twisted: [critical] Failed to process status message instantly.
 Traceback (most recent call last):
   File "/usr/lib/python3.10/threading.py", line 953, in run
     self._target(*self._args, **self._kwargs)
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 822, in worker
     return target()
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/_threads/_threadworker.py", line 47, in work
     task()
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/_threads/_team.py", line 182, in doWork
     task()
 --- <exception caught here> ---
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
     result = inContext.theWork() # type: ignore[attr-defined]
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
     inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/snap/maas/32636/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
     return func(*args, **kw)
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 857, in callInContext
     return func(*args, **kwargs)
   File "/snap/maas/32636/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
     result = func(*args, **kwargs)
   File "/snap/maas/32636/lib/python3.10/site-packages/metadataserver/api_twisted.py", line 593, in _processMessageNow
     self._processMessage(node, message)
   File "/snap/maas/32636/lib/python3.10/site-packages/maasserver/utils/orm.py", line 771, in call_within_transaction
     return func_outside_txn(*args, **kwargs)
   File "/snap/maas/32636/lib/python3.10/site-packages/maasserver/utils/orm.py", line 574, in retrier
     return func(*args, **kwargs)
   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
     return func(*args, **kwds)
   File "/snap/maas/32636/lib/python3.10/site-packages/metadataserver/api_twisted.py", line 501, in _processMessage
     _create_vmhost_for_deployment(node)
   File "/snap/maas/32636/lib/python3.10/site-packages/metadataserver/api_twisted.py", line 264, in _create_vmhost_for_deployment
     discover_and_sync_vmhost(pod, node.owner)
   File "/snap/maas/32636/lib/python3.10/site-packages/maasserver/vmhost.py", line 88, in discover_and_sync_vmhost
     raise PodProblem(str(error)) from error
 maasserver.exceptions.PodProblem: Failed talking to pod: Certificate is restricted
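
For anyone comparing logs across nodes, a small sketch for pulling the final exception out of a trace like the one above. It assumes the trace text ends with a "Name: message" line, as Python tracebacks do; the function name final_exception is hypothetical and is not part of MAAS.

```python
def final_exception(traceback_text: str) -> tuple[str, str]:
    """Return (exception name, message) from the last line of a traceback."""
    lines = [ln.strip() for ln in traceback_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty traceback text")
    # Split only on the first ": " so the message keeps any later colons.
    name, sep, message = lines[-1].partition(": ")
    if not sep:
        raise ValueError("last line is not of the form 'Name: message'")
    return name, message


TRACE_TAIL = """\
   File "/snap/maas/32636/lib/python3.10/site-packages/maasserver/vmhost.py", line 88, in discover_and_sync_vmhost
     raise PodProblem(str(error)) from error
 maasserver.exceptions.PodProblem: Failed talking to pod: Certificate is restricted
"""

if __name__ == "__main__":
    name, message = final_exception(TRACE_TAIL)
    print(name)
    print(message)
```

Since the raise statement in the trace passes str(error) into PodProblem, the text after "Failed talking to pod: " should be the message of the underlying error.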

Please review and let me know if you need additional testing or logs.

Best,
Alan

Alan Baghumian (alanbach) wrote :
Changed in maas:
status: New → Triaged
Changed in maas:
importance: Undecided → High
Changed in lxd:
status: Unknown → Fix Released
Changed in maas:
assignee: nobody → Stamatis Katsaounis (skatsaounis)
status: Triaged → In Progress
Pablo F Ordonez (pordonez) wrote :

I updated LXD to 5.20-f3dd836 and it solved the "Certificate is restricted" problem. However, when I try to create an LXD VM I get another issue:

From rackd.log:

2024-02-09 17:27:47 provisioningserver.rpc.pods: [critical] server4: Failed to compose machine: RequestedMachine(hostname='nasrin', architecture='amd64/generic', cores=4, memory=20480, block_devices=[RequestedMachineBlockDevice(size=195000000000, tags=['pool1'])], interfaces=[RequestedMachineInterface(ifname=None, attach_name=None, attach_type=None, attach_options=None, attach_vlan=None, requested_ips=[], ip_mode=None)], cpu_speed=None, known_host_interfaces=[], pinned_cores=[], hugepages_backed=False)
 Traceback (most recent call last):
   File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
     self.run()
   File "/usr/lib/python3.10/threading.py", line 953, in run
     self._target(*self._args, **self._kwargs)
   File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/_threads/_threadworker.py", line 47, in work
     task()
   File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/_threads/_team.py", line 182, in doWork
     task()
 --- <exception caught here> ---
   File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
     result = inContext.theWork() # type: ignore[attr-defined]
   File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
     inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
   File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
     return self.currentContext().callWithContext(ctx, func, *args, **kw)
   File "/snap/maas/32469/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
     return func(*args, **kw)
   File "/snap/maas/32469/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
     result = func(*args, **kwargs)
   File "/snap/maas/32469/lib/python3.10/site-packages/provisioningserver/drivers/pod/lxd.py", line 440, in compose
     **self._get_machine_nics(request),
   File "/snap/maas/32469/lib/python3.10/site-packages/provisioningserver/drivers/pod/lxd.py", line 526, in _get_machine_nics
     raise LXDPodError("No host network to attach VM interfaces to")
 provisioningserver.drivers.pod.lxd.LXDPodError: No host network to attach VM interfaces to

Additional Info

lxc network ls
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| NAME | TYPE | MANAGED | IPV4 | IPV6 | DESCRIPTION | USED BY | STATE |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| br-eno1 | bridge | NO | | | | 2 | |
+----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| br-eno2 | bridge | NO | | | | 1 | |
+----------+----------+-------...


Stamatis Katsaounis (skatsaounis) wrote :

Hi Pablo, your issue is a different one. I suggest raising a separate bug report and continuing the discussion there. As for this bug, it was a side effect of the linked LXD issue. Now that the LXD fix has been released, the MAAS issue is gone. As such, I am setting this one to Invalid.

Changed in maas:
status: In Progress → Invalid