Comment 2 for bug 1810972

Mike Pontillo (mpontillo) wrote : Re: [2.5] Pod VMs stuck on commissioning

First I looked at the logs from `landscapesql-1`, which, according to your logs, appears to be in the Ready state. Indeed, it looks like cloud-init runs and commissions the machine successfully:

2019-01-08T14:53:01+00:00 landscapesql-1 cloud-init[925]: All scripts successfully ran
2019-01-08T14:53:01+00:00 landscapesql-1 cloud-init[925]: Cloud-init v. 18.4-0ubuntu1~18.04.1 finished at Tue, 08 Jan 2019 14:53:01 +0000. Datasource DataSourceMAAS [http://10-244-40-0--21.maas-internal:5248/MAAS/metadata/]. Up 176.97 seconds
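If many pod VMs are involved, the same check can be scripted. This is a minimal Python sketch, assuming the rsyslog-forwarded lines have exactly the format shown above. The helpers `cloud_init_finished` and `hosts_without_finish` are hypothetical, not part of MAAS or cloud-init. The sketch lists expected hosts that never logged the final cloud-init "finished" line.

```python
import re
from typing import Dict, Iterable, List

# Assumed format (taken from the excerpt above):
# "<timestamp> <host> cloud-init[<pid>]: Cloud-init v. <ver> finished at ... Up <secs> seconds"
FINISHED_RE = re.compile(
    r"^\S+ (?P<host>\S+) cloud-init\[\d+\]: Cloud-init v\. \S+ finished at .* Up (?P<up>[\d.]+) seconds"
)


def cloud_init_finished(lines: Iterable[str]) -> Dict[str, float]:
    """Map each host that logged a cloud-init 'finished' line to its reported uptime."""
    finished: Dict[str, float] = {}
    for line in lines:
        m = FINISHED_RE.match(line)
        if m:
            finished[m.group("host")] = float(m.group("up"))
    return finished


def hosts_without_finish(expected: Iterable[str], lines: Iterable[str]) -> List[str]:
    """Return the expected hosts (sorted) that never reported cloud-init completion."""
    done = cloud_init_finished(lines)
    return sorted(h for h in expected if h not in done)
```

For example, given only the two lines above and the expected hosts `landscapesql-1` and `landscapeha-1`, it would flag only `landscapeha-1`.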

Then I looked at the logs for `landscapeha-1`, which still appears to be in the Commissioning state. In contrast, there is no rsyslog output, and libvirtd terminates the machine less than a second (about 350 ms) after attempting startup:

2019-01-08 14:49:22.811+0000: starting up libvirt version: 4.0.0, package: 1ubuntu8.6 (Christian Ehrhardt <email address hidden> Fri, 09 Nov 2018 07:42:01 +0100), qemu version: 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.9), hostname: leafeon
...
2019-01-08 14:49:23.162+0000: shutting down, reason=destroyed

However, later in the logs I see a startup message with no corresponding shutdown message, so I can't tell whether the machine actually booted and is attempting to commission.
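To separate short-lived runs from ones that may still be running, the start/stop pairs in the per-domain QEMU log (usually `/var/log/libvirt/qemu/<domain>.log`) can be matched up. This is a minimal Python sketch, assuming the timestamp and message format shown in the excerpt above. `domain_runs` is a hypothetical helper. A start with no following shutdown only means the log does not record one; it does not prove the guest booted.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Assumed lifecycle lines (format taken from the excerpt above):
# "2019-01-08 14:49:22.811+0000: starting up libvirt version: ..."
# "2019-01-08 14:49:23.162+0000: shutting down, reason=destroyed"
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+[+-]\d{4}): "
    r"(?P<event>starting up|shutting down)(?:, reason=(?P<reason>\S+))?"
)

# (start time, stop time or None, shutdown reason or None)
Run = Tuple[datetime, Optional[datetime], Optional[str]]


def domain_runs(lines: Iterable[str]) -> List[Run]:
    """Pair each 'starting up' line with the next 'shutting down' line.

    stop and reason are None when no shutdown is logged before the next
    start or before the end of the log.
    """
    runs: List[Run] = []
    start: Optional[datetime] = None
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # command lines, QEMU warnings, etc.
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f%z")
        if m.group("event") == "starting up":
            if start is not None:
                # Two starts in a row: the earlier one has no logged shutdown.
                runs.append((start, None, None))
            start = ts
        elif start is not None:
            runs.append((start, ts, m.group("reason")))
            start = None
    if start is not None:
        runs.append((start, None, None))
    return runs
```

Subtracting start from stop for each completed run gives the VM lifetime, which makes sub-second `reason=destroyed` runs easy to spot across many domains.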

Can you see any pattern regarding which hosts failed to commission, or is it random every time? Do these machines have a unique networking or storage configuration? (For example, maybe MAAS has created machines based on interface constraints without a PXE network. I thought we had a separate open bug on that, but I can't find it.)

Are these hosts still in the Commissioning state according to the MAAS database? What happens if you try to commission them again? Can you use a tool such as virt-manager to browse the hypervisor and determine whether anything is suspicious about the VM configuration, such as missing or incorrectly attached NICs, a duplicate MAC address, etc.?
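Part of that configuration check can also be done offline. This is a minimal Python sketch, assuming you have collected each domain's XML (for example with `virsh dumpxml <domain>`) into a dictionary keyed by domain name. `audit_nics` is a hypothetical helper. It flags MAC addresses shared by more than one interface and domains with no `<interface>` at all. It does not check which network or bridge a NIC is attached to, so a NIC on a non-PXE network would still need a manual look.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple


def audit_nics(domain_xml: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Given {domain name: libvirt domain XML}, return (duplicate MACs, domains with no NIC).

    The first result maps each MAC address found on more than one interface
    to the domains using it (MACs compared case-insensitively).
    """
    owners: Dict[str, List[str]] = defaultdict(list)
    no_nic: List[str] = []
    for name, xml_text in domain_xml.items():
        root = ET.fromstring(xml_text)
        # <domain><devices><interface><mac address='...'/></interface></devices></domain>
        macs = [m.get("address", "").lower() for m in root.findall("./devices/interface/mac")]
        if not macs:
            no_nic.append(name)
        for mac in macs:
            owners[mac].append(name)
    dups = {mac: names for mac, names in owners.items() if len(names) > 1}
    return dups, sorted(no_nic)
```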