Comment 16 for bug 1768870

Jason Hobbs (jason-hobbs) wrote on 2018-05-04: Re: [Bug 1768870] Re: node failed commissioning - HTTP Error 400: {'boot_interface': ["Must be one of the node's interfaces."]}

How could a machine that's PXE booted possibly have no interfaces?
There is something wrong in how that's being determined.

On Fri, May 4, 2018 at 2:32 AM, Andres Rodriguez
<email address hidden> wrote:
> So I've dug through the dump and found a few interesting things:
>
> 1. A 'Failed Commissioning' machine (landscapeamqp-2) failed because one
> script took longer to run. Judging from the rsyslog, it is not obvious
> why it would have been the case because the rsyslog truncates at a
> certain point which doesn't show the whole commissioning process it
> would have normally followed to get to the point it got. This to me
> implies that there could have been some sort of network issue.
>
> 2. Other machines in 'Commissioning' state have NO interfaces. Since
> these machines come from a Pod, they should each have had one
> interface attached to them, but they didn't. Since we see the error in
> comment #3, I think what could be happening is that there could be VMs
> with duplicated MAC addresses across the different pods.
>
> As such, Jason:
>
> 1. This would imply a network issue, so I would recommend you explore the possibility that there was some network breakage at some point.
> 2. We need logs from libvirt. Could you please start gathering /var/log/libvirt/?
>
> ** Changed in: maas
> Status: New => Incomplete
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1768870
>
> Title:
> node failed commissioning - HTTP Error 400: {'boot_interface': ["Must
> be one of the node's interfaces."]}
>
> Status in MAAS:
> Incomplete
>
> Bug description:
> Several pod VMs failed to commission on a deploy of FCB.
>
> In rsyslog output, I see errors like this:
>
> May 3 06:29:00 nagios-1 cloud-init[1057]: request to
> http://10.244.40.33/MAAS/metadata//2012-03-01/ failed. sleeping 1.:
> HTTP Error 400: BAD REQUEST
>
> In regiond.log, there is this traceback that appears to be associated
> with the error:
>
> http://paste.ubuntu.com/p/3CBVCfRVzG/
>
> Some other VMs in this deploy successfully commissioned.
>
> This is with 2.3.3-6492-ge999a54-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1768870/+subscriptions
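
The duplicated-MAC hypothesis in point 2 of the quoted message could be checked offline once the libvirt domain XML for each pod VM is in hand (for example, as printed by `virsh dumpxml`). A minimal sketch, assuming the XML has already been collected into a dict keyed by VM name; the function name and the sample VM names and MAC addresses are hypothetical and not taken from this deployment:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_macs(domain_xml_by_vm):
    """Return {mac: [vm names]} for every MAC used by more than one VM.

    domain_xml_by_vm is an assumed input shape: a dict mapping a VM label
    (e.g. "pod/vm") to its libvirt domain XML as a string.
    """
    owners = defaultdict(set)
    for vm_name, xml_text in domain_xml_by_vm.items():
        root = ET.fromstring(xml_text)
        # libvirt records each NIC as <interface><mac address='..'/></interface>
        for mac in root.iter("mac"):
            address = mac.get("address")
            if address:
                # MAC addresses are case-insensitive, so normalise first
                owners[address.lower()].add(vm_name)
    return {mac: sorted(vms) for mac, vms in owners.items() if len(vms) > 1}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    template = "<domain><devices><interface type='network'><mac address='{}'/></interface></devices></domain>"
    sample = {
        "pod-a/vm-1": template.format("52:54:00:aa:bb:cc"),
        "pod-b/vm-7": template.format("52:54:00:AA:BB:CC"),
        "pod-b/vm-8": template.format("52:54:00:11:22:33"),
    }
    print(find_duplicate_macs(sample))
    # → {'52:54:00:aa:bb:cc': ['pod-a/vm-1', 'pod-b/vm-7']}
```

An empty result would not rule out the theory on its own, since it only covers the VMs whose XML was collected.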
