[2.5] Registering a KVM host provides insufficient debugging information on failure

Bug #1800573 reported by Mike Pontillo
This bug affects 3 people
Affects: MAAS
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (not listed)

Bug Description

After attempting to deploy a KVM host with MAAS, I was presented with a "Failed deployment" message. In the regiond.log, I found the following exception:

https://paste.ubuntu.com/p/4FdFS7W9Yh/

This does not provide enough information to diagnose the problem. We should consider the following:

(1) Is it possible to add additional retries to the KVM host deployment process?
(2) Can we add additional logging in failure cases, such as which IP address was chosen as the pod URL, and the IP addresses configured on the machine?
(3) Should we try additional URLs in case of firewall issues?
(4) Should we consider more opinionated ways to designate network spaces? For example, if we know that a particular space is for network management rather than data plane traffic, that's likely to be a better space to select for network management purposes.

I didn't expect any retries to be necessary here, since we already have a "sleep 10" at the end of the array of "runcmd" options in the cloud-init vendor-data we use for deploying KVM hosts. It may be possible that a different race condition is interfering with the completed deployment, though.
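Proposals (1) to (3) above could be sketched roughly as follows. This is only an illustration, not MAAS code: `register_with_retries`, the `probe` callable, and the example `qemu+ssh://` URLs are hypothetical names chosen for this sketch, and the retry count and delay are arbitrary defaults.

```python
import logging
import time
from typing import Callable, Iterable, Optional

log = logging.getLogger("kvm_host_registration")


def register_with_retries(
    candidate_urls: Iterable[str],
    machine_ips: Iterable[str],
    probe: Callable[[str], bool],
    attempts: int = 3,
    delay: float = 10.0,
) -> Optional[str]:
    """Return the first candidate URL for which probe() succeeds, else None.

    probe is a hypothetical callable that tries to reach the pod at the
    given URL; it returns True on success and may raise OSError.
    """
    urls = list(candidate_urls)
    ips = ", ".join(machine_ips) or "none"
    for attempt in range(1, attempts + 1):
        for url in urls:
            # Log which URL was chosen and which IPs the machine has, so a
            # failure can be diagnosed from the log alone.
            log.info(
                "attempt %d/%d: trying pod URL %s (machine IPs: %s)",
                attempt, attempts, url, ips,
            )
            try:
                if probe(url):
                    return url
            except OSError as exc:
                log.warning("pod URL %s failed: %s", url, exc)
        if attempt < attempts:
            time.sleep(delay)
    log.error("no pod URL reachable after %d attempts (tried: %s)",
              attempts, ", ".join(urls))
    return None
```

Passing several candidate URLs covers the firewall case in (3): each address configured on the machine is tried in turn before the next retry round.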

Tags: track
description: updated
Changed in maas:
milestone: 2.5.0rc1 → 2.5.0rc2
Thiago Martins (martinx) wrote :

I just installed MAAS 2.5 beta 4. I can deploy Ubuntu 18.04 machines normally, but when I choose to deploy a machine "As a KVM Host", it fails and I can't easily find out why.

Andres Rodriguez (andreserl) wrote :

Debugging information should be available in:

1. On the machine itself: /var/log/cloud-init.log and /var/log/cloud-init-output.log
2. On MAAS: /var/log/maas/rsyslog/<machine-name>/<date>/messages
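As a rough aid for going through those logs, the following sketch filters log lines for likely problem markers. The pattern list (`Traceback`, `ERROR`, `WARNING`, `failed`) is an assumption about what is worth looking at, not something these logs guarantee, and `suspicious_lines` and `scan_file` are hypothetical helper names.

```python
import re
from typing import Iterable, List

# Assumed markers of trouble; adjust to taste.
SUSPECT = re.compile(r"Traceback|ERROR|WARNING|failed", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any of the assumed problem markers."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


def scan_file(path: str) -> List[str]:
    """Apply suspicious_lines() to a log file on disk."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)
```

For example, `scan_file("/var/log/cloud-init-output.log")` on the deployed machine would return only the lines worth reading first.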

Thiago Martins (martinx) wrote :

I just found the problem!

https://paste.ubuntu.com/p/kWQd6Jwvrz/ - line 22

The generated Netplan YAML file is broken: there are no IPs!

---
    bridges:
        br-eno1:
            addresses:
            - None/22
            gateway4: 192.168.4.1
---

I'll post the logs soon!
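A check along these lines would catch the `None/22` entry above before the configuration reaches the machine. It is a minimal sketch that assumes the Netplan `network` section has already been parsed into a plain dict; `find_invalid_addresses` is a hypothetical helper, not part of MAAS or Netplan.

```python
import ipaddress
from typing import Any, Dict, List, Tuple


def find_invalid_addresses(network: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (interface name, address) pairs that are not valid IP/prefix."""
    bad = []
    for section in ("ethernets", "bonds", "bridges", "vlans"):
        for name, cfg in (network.get(section) or {}).items():
            for addr in (cfg or {}).get("addresses") or []:
                try:
                    # Accepts forms such as "192.168.4.10/22"; rejects "None/22".
                    ipaddress.ip_interface(str(addr))
                except ValueError:
                    bad.append((name, str(addr)))
    return bad


# The fragment from the paste, written as a dict.
config = {
    "bridges": {
        "br-eno1": {"addresses": ["None/22"], "gateway4": "192.168.4.1"},
    }
}
print(find_invalid_addresses(config))  # prints [('br-eno1', 'None/22')]
```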

Thiago Martins (martinx) wrote :

Netplan file

Thiago Martins (martinx) wrote :

The cloud-init.log

Thiago Martins (martinx) wrote :

And the cloud-init-output.log.

Thiago Martins (martinx) wrote :

It works with a basic network topology (i.e., no BOND). However, when there is a BOND channel, the deployment fails again, but at a different stage.

So, here is what I'm trying to do (worth mentioning that KVM Host deployments work without a BOND and with a static IP on the PXE network).

First try, with a BOND configured manually via the MAAS Interfaces page, worked:
https://imgur.com/a/d2woB5f

Now, with the machine released, I tried deploying it as a KVM Host on top of the same BOND that just worked (above), but it failed later on:
https://imgur.com/a/sX0vD0g

It failed! But it looks like the machine was deployed correctly:
https://imgur.com/a/92OJeFq

Weird...

Thiago Martins (martinx) wrote :

New Netplan with BOND (works without KVM Host selected, as per first screenshot above):

Thiago Martins (martinx) wrote :

The cloud-init.log with BOND (failed deployment):

Thiago Martins (martinx) wrote :

The cloud-init-output.log (failed deployment):

Changed in maas:
milestone: 2.5.0rc2 → 2.6.0
Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Triaged → Invalid