STP enabled on bridge results in unreliable PXE boot of guests

Bug #924446 reported by Adam Gandelman
This bug affects 1 person
Affects: libvirt (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: -

Bug Description

When STP is enabled on a bridge interface used by a libvirt network, DHCP requests fail 75% of the time during PXE boot. If I drop to the iPXE prompt and send a few more DHCP requests, I'll often get a reply by the 3rd or 4th attempt. I've verified via dnsmasq logging + tcpdump that the requests never actually make it to the server. Disabling STP on the corresponding bridge fixes the problem.

Serge Hallyn seems to think it's due to the upstream commit referenced at https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=605953

libvirt-bin 0.9.8-2ubuntu7
qemu-kvm 1.0+noroms-0ubuntu4
kvm-pxe 5.4.5

<network>
  <name>cobbler-devnet</name>
  <uuid>7976ddf2-4e92-e15a-7b4d-2948542a3349</uuid>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0' />
  <mac address='52:54:00:C2:5D:94'/>
  <ip address='192.168.123.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.123.2' end='192.168.123.254' />
      <host mac='00:16:3e:3e:aa:02' name='node02' ip='192.168.123.102' />
      <host mac='00:16:3e:3e:aa:03' name='node03' ip='192.168.123.103' />
      <host mac='00:16:3e:3e:aa:01' name='node01' ip='192.168.123.101' />
      <host mac='00:16:3e:3e:a9:1a' name='cobbler' ip='192.168.123.2' />
      <bootp file='pxelinux.0' server='192.168.123.2' />
    </dhcp>
  </ip>
</network>
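
For anyone comparing network definitions, here is a minimal Python sketch that pulls the bridge name, STP flag, and forward delay out of a definition like the one above. The function name `bridge_settings` and the trimmed `SAMPLE` string are made up for illustration. The fallback values (STP on, delay 0 when the attribute is absent) follow my reading of the libvirt network XML documentation and should be treated as an assumption.

```python
import xml.etree.ElementTree as ET


def bridge_settings(network_xml: str) -> dict:
    """Return the bridge name, STP flag, and forward delay (seconds).

    Hypothetical helper, not part of libvirt. Missing attributes fall back
    to stp='on' and delay='0', which is assumed to match libvirt's defaults.
    """
    root = ET.fromstring(network_xml)
    bridge = root.find("bridge")
    if bridge is None:
        raise ValueError("network definition has no <bridge> element")
    return {
        "name": bridge.get("name"),
        "stp": bridge.get("stp", "on") == "on",
        "delay": int(bridge.get("delay", "0")),
    }


# Trimmed copy of the network from this report, for illustration only.
SAMPLE = """<network>
  <name>cobbler-devnet</name>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0' />
</network>"""

if __name__ == "__main__":
    print(bridge_settings(SAMPLE))
```

On a live host, the same function could be fed the output of `virsh net-dumpxml cobbler-devnet`.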

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Marking confirmed as I've seen other mentions of this.

Your bridge is virbr1. What does virbr0 look like? Can you show your full network in context (/etc/network/interfaces, brctl show, ifconfig -a, netstat -nr)?

Changed in libvirt (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Note that Adam has confirmed that changing to stp off fixes the pxe boot problem for him.

I'm tempted to make 'stp off' explicitly the default, just for the default virbr0 we ship.

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

So the issue is not STP itself, but the forward delay (FD) set on the bridges created by libvirt. Libvirt on oneiric creates bridges with STP enabled but an FD of 0. Precise has STP enabled but an FD of 15s. With STP enabled, guests could obtain an IP during PXE/DHCP on the first try with an FD minimum of 8. I'd be hesitant to turn STP off by default, but perhaps either lower/remove the default FD if it's safe OR document in the server guide/elsewhere that PXE booting KVM guests requires a minimal FD. At this point I'm unsure whether the default FD change has happened in libvirt, bridge-utils or elsewhere (the previously linked bugzilla+commit mentions libvirt defaults to 0).
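
To check what forward delay the kernel actually applied to a bridge, regardless of what the network XML asks for, something like the Python sketch below can read it from sysfs. The function name `read_bridge_state` and the `sysfs_root` parameter are made up for illustration. As far as I know, `forward_delay` is reported in hundredths of a second and `stp_state` is 0 when STP is off, so treat the unit conversion as an assumption to verify on your kernel.

```python
from pathlib import Path


def read_bridge_state(bridge: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Read the STP state and forward delay the kernel reports for a bridge.

    Hypothetical helper. Assumes forward_delay is in hundredths of a second
    and that any non-zero stp_state means STP is enabled.
    """
    base = Path(sysfs_root) / bridge / "bridge"
    stp_state = int((base / "stp_state").read_text().strip())
    forward_delay_cs = int((base / "forward_delay").read_text().strip())
    return {
        "stp": stp_state != 0,
        "forward_delay_s": forward_delay_cs / 100,
    }


if __name__ == "__main__":
    # virbr1 is the bridge from this report; adjust for your host.
    print(read_bridge_state("virbr1"))
```

If the value printed here differs from the `delay` attribute in the network definition, the configured delay is not reaching the kernel.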

Another relevant bugzilla ticket: https://bugzilla.redhat.com/show_bug.cgi?id=533684

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks, Adam. It looks like this is actually set by the kernel. See commit bb900b27a2f49b37bc38c08e656ea13048fee13b.

So it looks like we should simply specify a lower default timeout in the default network xml we ship. What do you think would be a good timeout?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Looks like this was fixed upstream, by commit 2d5046d31f4f5c961fc4aa6b415a00bb9eadae2b. I'll cherrypick the fix.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.9.8-2ubuntu8

---------------
libvirt (0.9.8-2ubuntu8) precise; urgency=low

  * ubuntu/fix-bridge-fd.patch: cherrypick commit
    2d5046d31f4f5c961fc4aa6b415a00bb9eadae2b from upstream to write the
    bridge delay to the right file. (LP: #924446)
 -- Serge Hallyn <email address hidden> Wed, 01 Feb 2012 11:13:23 -0600

Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released