Comment 22 for bug 1367482

Ryan Harper (raharper) wrote : Re: [Bug 1367482] Re: virtual nodes don't always PXE boot on the same NIC

On Tue, Dec 16, 2014 at 11:08 AM, David Britton <email address hidden> wrote:

> On Tue, Dec 16, 2014 at 04:50:18PM -0000, Ryan Harper wrote:
> > I spent quite a bit of time seeing whether there was something we could
> > do. The current state of pxe control in QEMU isn't ideal. You can disable
> > loading of the option rom, which prevents any nic of that *type* from pxe
> > booting. However, there is no per-nic control.
>
> @rharper -- didn't you also say something about a bridge timeout?
>

Yes. I haven't done the benchmarking to determine whether this resolves the nic
timeout with MAAS, but here's some info on the bridge forward_delay setting,
its defaults, and how to modify it.

On systems with libvirt installed, the virbr0 bridge has its forward_delay
set to 2.0 seconds:

% brctl showstp virbr0
virbr0
 bridge id 8000.000000000000
 designated root 8000.000000000000
 root port 0 path cost 0
 max age 20.00 bridge max age 20.00
 hello time 2.00 bridge hello time 2.00
 forward delay 2.00 bridge forward delay 2.00
 ageing time 300.00
 hello timer 0.73 tcn timer 0.00
 topology change timer 0.00 gc timer 243.26
 flags
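The same value can be checked without brctl. This is a minimal sketch, assuming the kernel exposes it at /sys/class/net/<bridge>/bridge/forward_delay in centiseconds (USER_HZ = 100), so the 2.00 shown above would read as 200. The function name bridge_forward_delay and the SYSFS_NET override are hypothetical, added only so the function can be exercised against a fake directory.

```shell
#!/bin/sh
# Print a bridge's forward delay in seconds, read from sysfs.
# Assumption: the sysfs value is in centiseconds (USER_HZ=100), e.g. 200 -> 2.00.
# SYSFS_NET is a hypothetical override for testing; it defaults to /sys/class/net.
bridge_forward_delay() {
    br="$1"
    f="${SYSFS_NET:-/sys/class/net}/$br/bridge/forward_delay"
    if [ ! -r "$f" ]; then
        echo "bridge_forward_delay: no bridge named '$br'" >&2
        return 1
    fi
    cs=$(cat "$f")
    # Split centiseconds into whole seconds and a two-digit fraction.
    printf '%d.%02d\n' "$((cs / 100))" "$((cs % 100))"
}

# Example: bridge_forward_delay virbr0
```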

However, the default for a newly created bridge is much higher (15 seconds):

% sudo brctl addbr testbr
% sudo brctl showstp testbr
testbr
 bridge id 8000.000000000000
 designated root 8000.000000000000
 root port 0 path cost 0
 max age 20.00 bridge max age 20.00
 hello time 2.00 bridge hello time 2.00
 forward delay 15.00 bridge forward delay 15.00
 ageing time 300.00
 hello timer 0.00 tcn timer 0.00
 topology change timer 0.00 gc timer 0.00
 flags

You can lower this value to 2 or 0. For reference, the forwarding delay[1]
is the time spent in each of the Listening and Learning states before
the Forwarding state is entered. The delay exists so that a new bridge
joining a busy network observes some traffic before it starts forwarding.
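Since the delay applies once in Listening and again in Learning, a rough upper bound on how long a newly attached port withholds traffic is 2 × forward_delay. The sketch below just does that arithmetic; it assumes the port really passes through both states (the 802.1D model described in [1]), which on a Linux bridge may also depend on whether STP is enabled, so treat the numbers as worst-case estimates. The function name port_blocking_seconds is hypothetical.

```shell
#!/bin/sh
# Worst-case seconds before a newly attached bridge port starts forwarding,
# assuming it spends forward_delay in Listening and again in Learning.
# Input is whole seconds, as passed to `brctl setfd`.
port_blocking_seconds() {
    fd="$1"
    echo $((2 * fd))
}

# Compare the new-bridge default, the virbr0 value, and zero.
for fd in 15 2 0; do
    echo "forward_delay=${fd}s -> port forwards after ~$(port_blocking_seconds "$fd")s"
done
```

With the 15 s default, that is up to about 30 s during which a freshly started VM's frames may not be forwarded, which could plausibly overlap its first PXE DHCP attempts; at virbr0's 2 s the window is about 4 s.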

% sudo brctl setfd testbr 0
% sudo brctl showstp testbr
testbr
 bridge id 8000.000000000000
 designated root 8000.000000000000
 root port 0 path cost 0
 max age 20.00 bridge max age 20.00
 hello time 2.00 bridge hello time 2.00
 forward delay 0.00 bridge forward delay 0.00
 ageing time 300.00
 hello timer 0.00 tcn timer 0.00
 topology change timer 0.00 gc timer 0.00
 flags
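`brctl setfd` is one way to make this change. A sketch of an alternative that writes the bridge's sysfs attribute directly, and only when the current value is higher than the target, might look like the following. It assumes the same centisecond unit as the sysfs value, that it runs as root, and that the kernel accepts the requested value (it may reject values outside its allowed range, in which case the function returns non-zero). The function name lower_forward_delay and the SYSFS_NET override are hypothetical.

```shell
#!/bin/sh
# Lower a bridge's forward delay through sysfs, writing only if needed.
# Assumptions: the value is in centiseconds (USER_HZ=100), writing needs root,
# and SYSFS_NET is a hypothetical override used for testing.
lower_forward_delay() {
    br="$1"
    target_s="$2"   # desired delay in whole seconds, like `brctl setfd BR N`
    f="${SYSFS_NET:-/sys/class/net}/$br/bridge/forward_delay"
    [ -r "$f" ] || { echo "lower_forward_delay: no bridge '$br'" >&2; return 1; }
    current=$(cat "$f")
    target=$((target_s * 100))
    if [ "$current" -gt "$target" ]; then
        # A failed write (e.g. value out of range) propagates as non-zero.
        echo "$target" > "$f" || return 1
        echo "$br: forward_delay ${current}cs -> ${target}cs"
    else
        echo "$br: forward_delay already ${current}cs, unchanged"
    fi
}

# Example (as root): lower_forward_delay testbr 0
```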

If the bridge the KVM VM is attached to is also connected to a public
network, lowering this value could cause problems[2].

[1] http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge#Forwarding_delay
[2] http://www.microhowto.info/troubleshooting/troubleshooting_ethernet_bridging_on_linux.html#idp212224

> >
> > The changes switch the first nic to virtio (instead of rtl8139;
> > unrelated, but a better, faster choice), switch the second nic to
> > e1000, and add the <rom file=''> directive, which disables pxe
> > booting on all e1000 nics. The VM will now pxe boot only from the
> > virtio nic. If the bridge is slow or maas is busy, the VM may not
> > pxe boot successfully. This may or may not be more desirable from
> > an Orange Box perspective.
> >
>
> It should put us into a 'fast-fail' situation and remove a
> false positive (i.e., the node looks fine in MAAS but is not reachable
> via the name or IP listed in MAAS).
>
> If we are using juju, the problem is eventually detected as a timeout
> (also not ideal). After this change, it would be detected earlier, as a
> 409 and a 'failed deployment' in the MAAS GUI.
>
> All in all, I think it's a good change for the Orange Box, as it will
> let us detect more precisely when the problem occurs.
>
> Thanks for this nice write-up, btw.
>
> --
> David Britton <email address hidden>
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1367482
>
> Title:
> virtual nodes don't always PXE boot on the same NIC
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/libvirt/+bug/1367482/+subscriptions
>
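The NIC layout described in the quoted message (virtio on the first nic, e1000 with an empty ROM file on the second) could be expressed in libvirt domain XML roughly as below. This is a sketch, not the actual Orange Box configuration: the helper name pxe_nic_interfaces and the default network name `default` are assumptions, and the printed `<interface>` elements would be pasted into the domain definition with `virsh edit`.

```shell
#!/bin/sh
# Print libvirt <interface> elements for the two-nic layout described above:
# nic 1 is virtio and keeps its PXE option ROM; nic 2 is e1000 with an empty
# ROM file, which per the message above disables pxe booting on e1000 nics.
# Assumption: both nics attach to the libvirt network named by $1.
pxe_nic_interfaces() {
    net="${1:-default}"
    cat <<EOF
<interface type='network'>
  <source network='${net}'/>
  <model type='virtio'/>
</interface>
<interface type='network'>
  <source network='${net}'/>
  <model type='e1000'/>
  <rom file=''/>
</interface>
EOF
}

# Example: pxe_nic_interfaces default
```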