How could a machine that's PXE booted possibly have no interfaces?
There is something wrong in how that's being determined.
On Fri, May 4, 2018 at 2:32 AM, Andres Rodriguez
<email address hidden> wrote:
> So I've dug through the dump and found a few interesting things:
>
> 1. A 'Failed Commissioning' machine (landscapeamqp-2) failed because one
> script took longer to run than expected. Judging from the rsyslog, it is
> not obvious why, because the rsyslog is truncated at a point that doesn't
> show the whole commissioning process the machine would normally have
> followed to get to the point it got. This to me implies that there could
> have been some sort of network issue.
>
> 2. Other machines in 'Commissioning' state have NO interfaces. Since
> these machines come from a Pod, they should each have had one
> interface attached, but they didn't. Since we see the error in
> comment #3, I think what could be happening is that there are VMs
> with duplicated MAC addresses across the different pods.
>
> As such, Jason:
>
> 1. This would imply a network issue, so I would recommend you explore the possibility that there was some network breakage at some point.
> 2. We need logs from libvirt. Could you please start gathering /var/log/libvirt/?
>
> ** Changed in: maas
> Status: New => Incomplete
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1768870
>
> Title:
> node failed commissioning - HTTP Error 400: {'boot_interface': ["Must
> be one of the node's interfaces."]}
>
> Status in MAAS:
> Incomplete
>
> Bug description:
> Several pod VMs failed to commission on a deploy of FCB.
>
> In rsyslog output, I see errors like this:
>
> May 3 06:29:00 nagios-1 cloud-init[1057]: request to
> http://10.244.40.33/MAAS/metadata//2012-03-01/ failed. sleeping 1.:
> HTTP Error 400: BAD REQUEST
>
> In regiond.log, there is this traceback that appears to be associated
> with the error:
>
> http://paste.ubuntu.com/p/3CBVCfRVzG/
>
> Some other VMs in this deploy successfully commissioned.
>
> This is with 2.3.3-6492-ge999a54-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1768870/+subscriptions
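
For what it's worth, the duplicated-MAC theory in point 2 of the quoted message could be checked independently of MAAS. The following is only a rough sketch, not anything MAAS does itself: it assumes you have already collected the libvirt domain XML for each VM (for example with `virsh dumpxml`) and grouped it by pod. The function name `find_duplicate_macs` and the input layout are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_macs(pod_domains):
    """Return {mac: [(pod, vm_name), ...]} for MACs used by more than one NIC.

    pod_domains is a hypothetical input layout: it maps a pod name to a
    list of libvirt domain XML strings, one per VM.
    """
    seen = defaultdict(list)
    for pod, xml_docs in pod_domains.items():
        for xml_doc in xml_docs:
            root = ET.fromstring(xml_doc)
            vm_name = root.findtext("name", default="<unnamed>")
            # libvirt domain XML lists NICs as <devices><interface><mac address=.../>
            for mac in root.findall("./devices/interface/mac"):
                address = mac.get("address")
                if address:
                    # Normalise case so aa:bb and AA:BB compare as equal.
                    seen[address.lower()].append((pod, vm_name))
    # Keep only the addresses that show up more than once.
    return {mac: owners for mac, owners in seen.items() if len(owners) > 1}
```

Any MAC that comes back with owners in two different pods would support the theory; an empty result would point back at how the interfaces are being determined.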