lxd Virtual Machines are not deployed sometimes.

Bug #1979568 reported by Diego Mascialino
This bug affects 1 person
Affects: MAAS
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

We are running our system tests using lxd VMs.

Several (but not all) executions fail at the deploying step.
The machine times out with state:

 'status_name': 'Deploying',
 'status_message': 'Rebooting',
 'power_state': 'off'
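
The state above can be checked for in a test harness. The following is a minimal sketch, assuming the machine record is available as a plain dictionary with the three fields shown; the function name `looks_stuck_after_reboot` is hypothetical and is not part of MAAS or the system tests.

```python
def looks_stuck_after_reboot(machine: dict) -> bool:
    """Return True when a machine record matches the reported failure state.

    The three keys and values are taken from the bug description; any other
    fields in the record are ignored.
    """
    return (
        machine.get("status_name") == "Deploying"
        and machine.get("status_message") == "Rebooting"
        and machine.get("power_state") == "off"
    )


if __name__ == "__main__":
    reported = {
        "status_name": "Deploying",
        "status_message": "Rebooting",
        "power_state": "off",
    }
    print(looks_stuck_after_reboot(reported))
```

A check like this only flags the symptom (the VM was powered off during the reboot and never came back); it does not say why the restart failed.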

Changed in maas:
status: New → Triaged
importance: Undecided → High
Alexsander de Souza (alexsander-souza) wrote :

Could you please attach the MAAS logs from one failed execution?

Changed in maas:
status: Triaged → Incomplete
Diego Mascialino (dmascialino) wrote :

Well, we have several failed executions.
Please find attached the SOS report of maas-system-tests/1341/

Changed in maas:
status: Incomplete → Triaged
Diego Mascialino (dmascialino) wrote :

I found these lxd logs in our jenkins server:

```
141 ubuntu@jenkins-slave-2:~/diego/system-tests$ journalctl -u snap.lxd.daemon.service --since "7 days ago" | grep virtual-machines
Jun 20 23:05:28 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-20T23:05:28Z" level=error msg="Failed to cleanly stop instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm2': device or resource busy" instance=vm2 instanceType=virtual-machine project=default
Jun 20 23:05:28 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-20T23:05:28Z" level=error msg="Failed to restart instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm2': device or resource busy" instance=vm2 instanceType=virtual-machine project=default
Jun 21 01:59:01 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-21T01:59:01Z" level=error msg="Failed to cleanly stop instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1': device or resource busy" instance=vm1 instanceType=virtual-machine project=default
Jun 21 01:59:01 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-21T01:59:01Z" level=error msg="Failed to restart instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1': device or resource busy" instance=vm1 instanceType=virtual-machine project=default
Jun 21 15:38:54 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-21T15:38:54Z" level=error msg="Failed to cleanly stop instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1': device or resource busy" instance=vm1 instanceType=virtual-machine project=default
Jun 21 15:38:54 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-21T15:38:54Z" level=error msg="Failed to restart instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1': device or resource busy" instance=vm1 instanceType=virtual-machine project=default
Jun 22 23:56:42 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-22T23:56:42Z" level=error msg="Failed to cleanly stop instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1': device or resource busy" instance=vm1 instanceType=virtual-machine project=default
Jun 22 23:56:42 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-22T23:56:42Z" level=error msg="Failed to restart instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1': device or resource busy" instance=vm1 instanceType=virtual-machine project=default
Jun 22 23:56:52 jenkins-slave-2 lxd.daemon[3200651]: time="2022-06-22T23:56:52Z" level=error msg="Failed to restart instance" err="Failed unmounting instance: Failed to unmount '/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1': device or resource busy" instance=vm1 instanceType=virtual-machine project=default
Jun 23 23:02:28 j...
```
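
To see at a glance which instances hit the unmount error, the journal lines can be summarised with a small script. This is a rough sketch, assuming the `key=value` / `key="quoted value"` layout visible in the log lines above; the helper names `parse_fields` and `count_unmount_failures` are made up for illustration.

```python
import re
from collections import Counter

# key=value or key="quoted value", as seen in the LXD journal lines.
FIELD_RE = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')


def parse_fields(line: str) -> dict:
    """Extract the key/value fields from one journal line."""
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        # Exactly one of the two value groups is non-empty for a match.
        fields[key] = quoted if quoted else bare
    return fields


def count_unmount_failures(lines) -> Counter:
    """Count error lines per instance whose err mentions a failed unmount."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if (
            fields.get("level") == "error"
            and "Failed unmounting instance" in fields.get("err", "")
        ):
            counts[fields.get("instance", "<unknown>")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: journalctl -u snap.lxd.daemon.service | python3 this_script.py
    for instance, n in count_unmount_failures(sys.stdin).most_common():
        print(instance, n)
```

Note that each failed restart produces two or more matching lines in the log above ("Failed to cleanly stop instance" and "Failed to restart instance"), so the counts are log lines, not distinct incidents.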

Jerzy Husakowski (jhusakowski) wrote :

We've seen something like this recently: the VM was stuck in the EFI shell and LXD was unable to stop it using normal methods. Adding the `--force` parameter kills the stuck VM. The system test has been updated.
Is the issue reproducible after this update?
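
For illustration, a graceful stop with a forced fallback could look roughly like the sketch below. It assumes the VM is stopped through the `lxc` command-line client (`lxc stop` with its `--timeout` and `--force` options); the actual change made to the system tests is not shown in this report, and `stop_instance` and `run_command` are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def stop_instance(
    name: str,
    graceful_timeout: int = 30,
    run: Callable[[Sequence[str]], int] = run_command,
) -> str:
    """Try a clean shutdown first, then fall back to `lxc stop --force`.

    Returns "stopped" for a clean shutdown and "forced" when the fallback
    was needed. Raises RuntimeError if both attempts fail.
    """
    # Clean shutdown; a VM stuck in the EFI shell would not respond to this.
    if run(["lxc", "stop", name, "--timeout", str(graceful_timeout)]) == 0:
        return "stopped"
    # Forced stop kills the instance without waiting for the guest.
    if run(["lxc", "stop", name, "--force"]) == 0:
        return "forced"
    raise RuntimeError(f"could not stop LXD instance {name!r}")
```

The `run` parameter exists only so the logic can be exercised without a real LXD host, for example by passing a function that records the commands and returns a chosen exit code.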

Changed in maas:
importance: High → Medium
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired