Broken pxe-e1000.bin

Bug #617316 reported by rfehren
This bug affects 1 person
Affects              Status        Importance  Assigned to   Milestone
etherboot (Fedora)   Fix Released  Medium
etherboot (Ubuntu)   Fix Released  Medium      Serge Hallyn
  Nominated for Lucid by Serge Hallyn

Bug Description

Binary package hint: etherboot

This affects the package kvm-pxe (5.4.4-1ubuntu1) in Lucid and was already mentioned in Bug #570870.

Trying a KVM PXE boot with model=e1000 results in the following error message:

Searching for server (DHCP)....No IP address

The message is repeated infinitely. Using a different nic model, an IP address is obtained, and booting works.

The Debian unstable ROM (e1000-82540em.rom) in etherboot-qemu_5.4.4-6_all.deb works, and the fix is described
in https://bugzilla.redhat.com/show_bug.cgi?id=507391

SRU Information:

Impact: users cannot PXE-boot KVM VMs that use the e1000 NIC model.
Fix: The bug was addressed by including a patch from Fedora, which
      marks the on-stack Ethernet header struct in the e1000 transmit
      path as 'volatile'. The fix can be seen in
      https://code.launchpad.net/~serge-hallyn/ubuntu/maverick/etherboot/e1000fix
Test Case: Set up a KVM VM with an e1000 NIC, and try to PXE-boot it.
Regression Potential: This minimal patch is included in the maverick
      source as well as Fedora's, and appears unlikely to have any
      other effects, sunspots notwithstanding.

Gilboa (gilboa-redhat-bugs) wrote:

Created attachment 348934
DSL VM configuration

Description of problem:
I've upgraded my first KVM host to F11.
I'm trying to boot DSL (Damn Small Linux) using bootpxe.
This test works just fine under F9 and F10.

Version-Release number of selected component (if applicable):
qemu-0.10.4-4.fc11.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Setup a private bridge. (Configuration attached.)
2. Setup a qemu empty VM. (Configuration attached.)
3. Boot.

Actual results:
Client fails to receive an IP. Host sees invalid packets. (pcap attached)

Expected results:
boot.

Gilboa (gilboa-redhat-bugs) wrote:

Created attachment 348935
Private bridge configuration. (Bridge running in promisc mode)

Gilboa (gilboa-redhat-bugs) wrote:

Created attachment 348936
tap42 wireshark recording.

Gilboa (gilboa-redhat-bugs) wrote:

P.S. DHCP works just fine once the OS actually boots.

Mark (mark-redhat-bugs) wrote:

What version of etherboot is this? Does etherboot-5.4.4-15.fc11 help?

  https://admin.fedoraproject.org/updates/etherboot-5.4.4-15.fc11

I doubt it - those frames are pretty messed up. Does it work with e.g. rtl8139, virtio, ne2k_pci or pcnet?

Gilboa (gilboa-redhat-bugs) wrote:

Works just fine with rtl8139 with etherboot-5.4.4-13.
I'm still getting trashed 0xff frames with etherboot-5.4.4-15.

- Gilboa

Mark (mark-redhat-bugs) wrote:

Okay, so the packet dump shows the type field in the ethernet header is (incorrectly) zero.
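For anyone checking a capture by hand: the type (EtherType) field is the two bytes immediately after the destination and source MAC addresses, i.e. frame offsets 12 and 13, stored big-endian. A minimal sketch of that check follows; the helper names are hypothetical and not part of etherboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6
#define ETH_HLEN 14              /* dst MAC + src MAC + 2-byte type */
#define ETHERTYPE_IPV4 0x0800    /* DHCP runs over UDP over IPv4 */

/* Return the EtherType of a raw Ethernet II frame, or -1 if the frame is
 * too short to hold a full header. The field follows the two MAC
 * addresses and is stored in network (big-endian) byte order. */
static int eth_frame_type(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < ETH_HLEN)
        return -1;
    return (frame[2 * ETH_ALEN] << 8) | frame[2 * ETH_ALEN + 1];
}

/* A DHCP request from the PXE ROM should be tagged as IPv4; the frames
 * from the broken e1000 ROM had a type of zero instead. */
static int looks_like_ipv4(const uint8_t *frame, size_t len)
{
    return eth_frame_type(frame, len) == ETHERTYPE_IPV4;
}
```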

Enabling debugging in etherboot-5.4.4/drivers/net/e1000.c made the problem go away, which was the first clue.

The code is as follows:

    struct eth_hdr {
        unsigned char dst_addr[ETH_ALEN];
        unsigned char src_addr[ETH_ALEN];
        unsigned short type;
    } hdr;
    ...
    hdr.type = htons (type);
    txhd = tx_base + tx_tail;
    tx_tail = (tx_tail + 1) % 8;
    ...
    txhd->buffer_addr = virt_to_bus (&hdr);
    ...
    E1000_WRITE_REG (&hw, TDT, tx_tail);

i.e. we're setting the type in the header on the stack, setting up a tx descriptor to point to the header on the stack and then writing the new tail descriptor index to the device's TDT register.
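A simplified model of that transmit bookkeeping might look like the sketch below. It assumes a ring of 8 descriptors (matching the `% 8` above); `struct tx_ring`, `tx_enqueue()` and the `tdt` field are hypothetical stand-ins, and the real driver writes TDT with E1000_WRITE_REG rather than a plain volatile variable. The point is the ordering: once the new tail is written, the NIC owns the descriptor and may fetch the buffer it points to.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_DESC 8   /* ring size implied by tx_tail = (tx_tail + 1) % 8 */

/* Simplified legacy e1000 transmit descriptor; the real one also has
 * length, command and status fields. */
struct tx_desc {
    uint64_t buffer_addr;   /* bus address the NIC will DMA from */
};

struct tx_ring {
    struct tx_desc desc[NUM_TX_DESC];
    unsigned tail;          /* software copy of the tail index */
    volatile uint32_t tdt;  /* hypothetical stand-in for the TDT register */
};

/* Queue one buffer: fill the descriptor at the current tail, advance the
 * tail with wrap-around, then publish the new tail to the device. Writing
 * TDT hands the descriptor (and the memory it points to) to the NIC, so
 * that memory must be fully written before this store. */
static unsigned tx_enqueue(struct tx_ring *r, uint64_t bus_addr)
{
    unsigned slot = r->tail;

    assert(slot < NUM_TX_DESC);
    r->desc[slot].buffer_addr = bus_addr;
    r->tail = (r->tail + 1) % NUM_TX_DESC;
    r->tdt = r->tail;
    return slot;
}
```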

Looking at the assembly, I see:

     36d:   8b 4c 24 38             mov    0x38(%esp),%ecx
     371:   86 cd                   xchg   %cl,%ch
     ...
     3fb:   89 90 18 38 00 00       mov    %edx,0x3818(%eax)
     ...
     407:   66 89 4c 24 1e          mov    %cx,0x1e(%esp)

i.e. we don't actually store the result of the htons() into the header on the stack until after we've set the TDT register (offset 0x3818). At that point the packet has already been sent.
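For reference, on little-endian x86 htons() is just a 16-bit byte swap, which is what the `xchg %cl,%ch` in the listing does. A portable sketch follows; `swap16()` and `store_be16()` are hypothetical helper names, not etherboot code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-swap a 16-bit value. On a little-endian CPU this is all htons()
 * does, and it matches `xchg %cl,%ch` (swap the low and high bytes of
 * %cx). */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Endianness-independent way to get the same on-the-wire layout: write
 * the most significant byte first. */
static void store_be16(uint8_t *out, uint16_t x)
{
    assert(out != NULL);
    out[0] = (uint8_t)(x >> 8);
    out[1] = (uint8_t)(x & 0xff);
}
```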

The problem is that the compiler has no way of knowing that this memory will be read by the device (via DMA) as a result of us writing to the register. So, if we do:

- struct eth_hdr {
+ volatile struct eth_hdr {

we see:

     36c:   8b 44 24 38             mov    0x38(%esp),%eax
     370:   86 c4                   xchg   %al,%ah
     372:   66 89 44 24 1e          mov    %ax,0x1e(%esp)
     ...
     400:   89 90 18 38 00 00       mov    %edx,0x3818(%eax)

This fixes the problem.
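A hosted-C sketch of the patched pattern follows. It assumes a hypothetical `fake_tdt` variable in place of the real TDT register and a hypothetical `fake_nic_dma()` that copies out what the device would read, and it uses POSIX htons() from <arpa/inet.h>. Because the "device read" here is an ordinary function call, this program cannot reproduce the miscompilation; it only shows where the volatile qualifier goes and what the NIC is expected to see.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons(): POSIX, used here as a hosted stand-in */

#define ETH_ALEN 6
#define ETH_HLEN 14      /* 6 + 6 + 2 bytes, no padding expected */

struct eth_hdr {
    unsigned char dst_addr[ETH_ALEN];
    unsigned char src_addr[ETH_ALEN];
    unsigned short type;
};

/* Hypothetical stand-in for the TDT register (an MMIO write in the driver). */
static volatile uint32_t fake_tdt;

/* Hypothetical "NIC": copies out the header bytes it would DMA once the
 * tail register has moved past the descriptor. */
static void fake_nic_dma(const volatile struct eth_hdr *h,
                         unsigned char out[ETH_HLEN])
{
    const volatile unsigned char *p = (const volatile unsigned char *)h;

    assert(sizeof *h == ETH_HLEN);  /* byte copy assumes no struct padding */
    for (int i = 0; i < ETH_HLEN; i++)
        out[i] = p[i];
}

static void send_header(const unsigned char dst[ETH_ALEN],
                        const unsigned char src[ETH_ALEN],
                        uint16_t type, uint32_t new_tail,
                        unsigned char wire[ETH_HLEN])
{
    /* As in the Fedora patch, the on-stack header is volatile, so each
     * store below must be emitted before the volatile store to fake_tdt. */
    volatile struct eth_hdr hdr;

    for (int i = 0; i < ETH_ALEN; i++) {
        hdr.dst_addr[i] = dst[i];
        hdr.src_addr[i] = src[i];
    }
    hdr.type = htons(type);     /* big-endian in memory on any host */

    fake_tdt = new_tail;        /* hand the descriptor to the "NIC" */
    fake_nic_dma(&hdr, wire);   /* which then reads the header */
}
```

An alternative that the Fedora patch does not use would be a compiler barrier such as GCC's `__asm__ __volatile__("" ::: "memory")` immediately before the register write. Note that volatile only constrains the compiler; the Linux e1000 driver, for example, issues wmb() before writing the tail register so the stores are also ordered at the CPU level.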

Mark (mark-redhat-bugs) wrote:

* Tue Jun 23 2009 Mark McLoughlin <email address hidden> - 5.4.4-16
- Fix e1000 PXE boot - caused by compiler optimization (bug #507391)

Mark (mark-redhat-bugs) wrote:

*** Bug 494541 has been marked as a duplicate of this bug. ***

Fedora (fedora-redhat-bugs) wrote:

etherboot-5.4.4-16.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/etherboot-5.4.4-16.fc11

Fedora (fedora-redhat-bugs) wrote:

etherboot-5.4.4-16.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with

  su -c 'yum --enablerepo=updates-testing update etherboot'

You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7024

Gilboa (gilboa-redhat-bugs) wrote:

etherboot-5.4.4-16.fc11.noarch seems to solve the problem.

- Gilboa

Kari (kari-redhat-bugs) wrote:

etherboot-5.4.4-16.fc11 also works for me and solves the no-IP problem (bug #494541).

Mark (mark-redhat-bugs) wrote:

Gilboa and Kari, thanks for testing - I'll push to stable now

Note, in future, if you go to the update url:

  https://admin.fedoraproject.org/updates/F11/FEDORA-2009-7024

you can log in and add a comment; this increases the update's 'karma', and if enough people comment, the update gets pushed automatically.

Gilboa (gilboa-redhat-bugs) wrote:

Thanks. Will do.

- Gilboa

Fedora (fedora-redhat-bugs) wrote:

etherboot-5.4.4-16.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.

Serge Hallyn (serge-hallyn) wrote:

The Red Hat bugs suggest PXE_DHCP_STRICT fixed it for them. Can you
try the etherboot package in ppa:serge-hallyn/test?

(add-apt-repository ppa:serge-hallyn/test; apt-get update; apt-get upgrade)

Changed in etherboot (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Serge Hallyn (serge-hallyn)
status: New → Incomplete

rfehren (rf) wrote:

Hmm, this doesn't fix it.

https://bugzilla.redhat.com/show_bug.cgi?id=507391 says it was an issue with compiler optimization. You might also check how it was fixed in the above mentioned Debian package.

Changed in etherboot (Ubuntu):
status: Incomplete → Confirmed

Serge Hallyn (serge-hallyn) wrote:

Could you try the package in ppa:serge-hallyn/etherboot-e1000 ?

rfehren (rf) wrote: [Bug 617316] Re: Broken pxe-e1000.bin

>>>>> "Serge" == Serge Hallyn <email address hidden> writes:

    Serge> Could you try the package in ppa:serge-hallyn/etherboot-e1000?

There is no new version there:

# apt-cache policy kvm-pxe
kvm-pxe:
  Installed: 5.4.4-1ubuntu2
  Candidate: 5.4.4-1ubuntu2
  Version table:
 *** 5.4.4-1ubuntu2 0
        500 http://ppa.launchpad.net/serge-hallyn/etherboot-e1000/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status
     5.4.4-1ubuntu1 0
        500 http://de.archive.ubuntu.com/ubuntu/ lucid/universe Packages

Does your new package work for you?

Serge Hallyn (serge-hallyn) wrote:

Sorry, I'd forgotten I'd already asked you to test one before. Could you

 rm /etc/apt/sources.list.d/serge-hallyn-test*
 apt-get update
 dpkg -r etherboot
 apt-get install etherboot

If that still doesn't work, then I'll dput a package with bumped
version number.

Serge Hallyn (serge-hallyn) wrote:

(I've gone ahead and also pushed a package with a bumped version
number, but it can take the build system a while to get around to it,
so if you're ready to test immediately, the steps above will still be useful.)

rfehren (rf) wrote:

>>>>> "Serge" == Serge Hallyn <email address hidden> writes:

    Serge> Sorry, I'd forgotten I'd already asked you to test one
    Serge> before. Could you

    Serge> rm /etc/apt/sources.list.d/serge-hallyn-test*
    Serge> apt-get update
    Serge> ...

OK. The new version works for me.

Serge Hallyn (serge-hallyn) wrote:

Thanks rfehren - went ahead and proposed
https://code.launchpad.net/~serge-hallyn/ubuntu/maverick/etherboot/e1000fix
for merge.

Changed in etherboot (Ubuntu):
status: Confirmed → Fix Committed

Launchpad Janitor (janitor) wrote:

This bug was fixed in the package etherboot - 5.4.4-1ubuntu3

---------------
etherboot (5.4.4-1ubuntu3) maverick; urgency=low

  [ Serge Hallyn ]
  * Add patch etherboot-5.4.4-e1000-tx-volatile.patch from Fedora
    Taken from https://bugzilla.redhat.com/show_bug.cgi?id=507391
    (LP: #617316)

  [ Stefano Rivera ]
  * Build-Depend on syslinux instead of syslinux-common | syslinux. We need
    /usr/bin/syslinux during build.

 -- Serge Hallyn <email address hidden>  Fri, 20 Aug 2010 17:40:16 -0500

Changed in etherboot (Ubuntu):
status: Fix Committed → Fix Released
description: updated

Sergey Svishchev (svs) wrote:

I seem to recall that the virtio ROM has the same problem, but I haven't verified that yet.

Changed in etherboot (Fedora):
importance: Unknown → Medium
status: Unknown → Fix Released