MAAS

iPXE ignores vlan 0 traffic

Bug #1805920 reported by Vern Hart on 2018-11-30

This bug affects 1 person

	Status	Importance	Assigned to
MAAS	Invalid	Undecided	Unassigned
ipxe (Ubuntu)	Fix Released	Undecided	Unassigned
Trusty	Won't Fix	Undecided	Unassigned
Xenial	Won't Fix	Undecided	Unassigned
Bionic	Fix Released	Undecided	Unassigned
Cosmic	Fix Released	Undecided	Unassigned
Disco	Fix Released	Undecided	Unassigned
ipxe-qemu-256k-compat (Ubuntu)	Invalid	Undecided	Unassigned
linux (Ubuntu)	Invalid	Undecided	Unassigned

Bug Description

[Impact]

* VLAN 0 is special (for QoS actually, not a real VLAN)
* Some components in the stack accidentally strip it, so does ipxe in
   this case.
* Fix by porting a fix that is carried by other distributions as upstream
   didn't follow the suggestion but it is needed for the use case affected
   by the bug here (Thanks Andres)

[Test Case]

* Comment #42 contains a virtual test setup to understand the case but it
   does NOT trigger the issue. That requires special switch HW that adds
   VLAN 0 tags for QoS. Therefore Vern (reporter) will test that on a
   customer site with such hardware being affected by this issue.

[Regression Potential]

* The only reference to VLAN tags on iPXE boot that we found was on iBFT
   boot for SCSI, we tested that in comment #34 and it still worked fine.
* We didn't see such cases on review, but there might be use cases that
   made some unexpected use of the headers which are now stripped. But
   that seems wrong.

[Other Info]

* n/a

---

I have three MAAS rack/region nodes which are blades in a Cisco UCS chassis. This is an FCE deployment where MAAS has two DHCP servers, infra1 is the primary and infra3 is the secondary. The pod VMs on infra1 and infra3 PXE boot fine but the pod VMs on infra2 fail to PXE boot. If I reconfigure the subnet to provide DHCP on infra2 (either as primary or secondary) then the pod VMs on infra2 will PXE boot but the pod VMs on the demoted infra node (that no longer serves DHCP) now fail to PXE boot.

While commissioning a pod VM on infra2 I captured network traffic with tcpdump on the vnet interface.

Here is the dump when the PXE boot fails (no dhcp server on infra2):
https://pastebin.canonical.com/p/THW2gTSv4S/

Here is the dump when PXE boot succeeds (when infra2 is serving dhcp):
https://pastebin.canonical.com/p/HH3XvZtTGG/

The only difference I can see is that in the unsuccessful scenario, the reply is an 802.1q packet -- it's got a vlan tag for vlan 0. Normally vlan 0 traffic is passed as if it is not tagged and indeed, I can ping between the blades with no problem. Outgoing packets are untagged but incoming packets are tagged vlan 0 -- but the ping works. It seems vlan 0 is used as a part of 802.1p to set priority of packets. This is separate from vlan, it just happens to use that ethertype to do the priority tagging.
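To make the distinction above concrete, here is a minimal Python sketch that decodes the 802.1Q tag of a raw Ethernet frame and classifies it as untagged, priority-tagged (VID 0), or VLAN-tagged. The function name `classify_frame` and the sample frames are hypothetical and only for illustration; they are not taken from the captures linked above.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID value that marks an 802.1Q tag


def classify_frame(frame: bytes) -> dict:
    """Classify an Ethernet frame as untagged, priority-tagged, or VLAN-tagged."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_8021Q:
        return {"kind": "untagged", "ethertype": ethertype}
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    tci, inner = struct.unpack("!HH", frame[14:18])
    pcp = tci >> 13          # 3-bit priority code point (802.1p)
    dei = (tci >> 12) & 0x1  # drop eligible indicator
    vid = tci & 0x0FFF       # 12-bit VLAN identifier
    kind = "priority-tagged" if vid == 0 else "vlan-tagged"
    return {"kind": kind, "pcp": pcp, "dei": dei, "vid": vid, "ethertype": inner}


# Hypothetical sample frames: broadcast destination, made-up source MAC.
header = bytes.fromhex("ffffffffffff" "525400123456")
plain = header + struct.pack("!H", 0x0800) + b"\x00" * 46
prio = header + struct.pack("!HHH", ETH_P_8021Q, (5 << 13) | 0, 0x0800) + b"\x00" * 46

print(classify_frame(plain)["kind"])
print(classify_frame(prio))
```

A frame whose tag has VID 0 carries only priority information, which is why most receivers treat it like untagged traffic.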

Someone confirmed to me that, in the iPXE source, it drops all packets if they are vlan tagged.
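The alternative to dropping such frames is to strip the tag when its VID is 0 and hand the rest of the frame on as if it were untagged. The sketch below illustrates that idea in Python; it is not the iPXE code or the patch discussed in this bug, and `strip_priority_tag` is a hypothetical name.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID value that marks an 802.1Q tag


def strip_priority_tag(frame: bytes) -> bytes:
    """Return the frame without its 802.1Q tag if, and only if, the VID is 0."""
    if len(frame) < 18:
        return frame  # too short to carry a tag plus an inner EtherType
    ethertype, tci = struct.unpack("!HH", frame[12:16])
    if ethertype == ETH_P_8021Q and (tci & 0x0FFF) == 0:
        # Keep both MAC addresses and drop the 4-byte tag (TPID + TCI).
        return frame[:12] + frame[16:]
    return frame  # untagged, or tagged with a real VLAN ID: leave untouched


# Hypothetical frame with priority 5 and VID 0 in front of an IPv4 payload.
macs = bytes(12)
tagged = macs + struct.pack("!HHH", ETH_P_8021Q, 5 << 13, 0x0800) + b"payload"
print(strip_priority_tag(tagged) == macs + struct.pack("!H", 0x0800) + b"payload")
```

Frames tagged with a non-zero VID are deliberately left alone, so real VLAN traffic keeps its existing handling.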

The customer is unable to figure out why the packets between blades are getting vlan tagged, so we either need to figure out how to allow iPXE to accept vlan 0 or the customer will need to use different equipment for the MAAS nodes.

I found a conversation on the ipxe-devel mailing list that suggested a commit was submitted and signed off but that was from 2016 so I'm not sure what became of it. Notable messages in the thread:

http://lists.ipxe.org/pipermail/ipxe-devel/2016-April/004916.html
http://lists.ipxe.org/pipermail/ipxe-devel/2016-July/005099.html

Would it be possible to install a local patch as part of the FCE deployment? I suspect the patch(es) mentioned in the above thread would require some modification to apply properly.

Related branches

~paelzer/ubuntu/+source/ipxe:merge-1.21-upstream-IMPISH

Merged into ubuntu/+source/ipxe:ubuntu/impish-devel at revision 4d3eeea1e970cfe63960c8ee758fe6d74188c08b

Lucas Kanashiro (community): Approve on 2021-05-05

Canonical Server: Pending requested 2021-05-05

Canonical Server packageset reviewers: Pending requested 2021-05-05

Diff: 71595 lines (+51215/-3955)

402 files modified

.github/workflows/build.yml (+71/-0)
.github/workflows/coverity.yml (+37/-0)
contrib/cloud/aws-import (+100/-0)
contrib/coverity/model.c (+8/-0)
contrib/vm/bochsrc.txt (+542/-368)
debian/changelog (+15/-0)
debian/copyright (+126/-3)
debian/patches/0005-strip-802.1Q-VLAN-0-priority-tags.patch (+44/-22)
debian/patches/series (+0/-3)
debian/rules (+0/-1)
dev/null (+0/-157)
src/Makefile (+30/-3)
src/Makefile.efi (+12/-4)
src/Makefile.housekeeping (+134/-111)
src/Makefile.linux (+51/-0)
src/arch/arm/include/ipxe/arm_io.h (+59/-18)
src/arch/i386/Makefile (+0/-16)
src/arch/i386/Makefile.linux (+9/-1)
src/arch/i386/include/bits/compiler.h (+1/-1)
src/arch/x86/Makefile.linux (+7/-10)
src/arch/x86/Makefile.pcbios (+8/-46)
src/arch/x86/core/pcidirect.c (+1/-0)
src/arch/x86/core/runtime.c (+5/-42)
src/arch/x86/core/stack.S (+1/-1)
src/arch/x86/core/stack16.S (+1/-1)
src/arch/x86/core/x86_string.c (+14/-0)
src/arch/x86/drivers/hyperv/hyperv.c (+6/-6)
src/arch/x86/drivers/xen/hvm.c (+6/-4)
src/arch/x86/drivers/xen/hvm.h (+2/-0)
src/arch/x86/image/com32.c (+1/-1)
src/arch/x86/image/initrd.c (+6/-6)
src/arch/x86/include/bits/bigint.h (+8/-17)
src/arch/x86/include/bits/bitops.h (+4/-4)
src/arch/x86/include/ipxe/cpuid.h (+3/-0)
src/arch/x86/include/ipxe/pcibios.h (+13/-0)
src/arch/x86/include/ipxe/pcidirect.h (+15/-2)
src/arch/x86/include/ipxe/rsdp.h (+13/-0)
src/arch/x86/interface/pcbios/bios_cachedhcp.c (+76/-0)
src/arch/x86/interface/pcbios/bios_console.c (+1/-1)
src/arch/x86/interface/pcbios/e820mangler.S (+1/-1)
src/arch/x86/interface/pcbios/int13.c (+3/-3)
src/arch/x86/interface/pcbios/memtop_umalloc.c (+4/-4)
src/arch/x86/interface/pcbios/pcibios.c (+1/-0)
src/arch/x86/interface/pcbios/rsdp.c (+1/-0)
src/arch/x86/interface/pxe/pxe_entry.S (+2/-2)
src/arch/x86/interface/syslinux/comboot_call.c (+3/-3)
src/arch/x86/prefix/exeprefix.S (+1/-1)
src/arch/x86/prefix/mromprefix.S (+1/-1)
src/arch/x86/prefix/rawprefix.S (+53/-0)
src/arch/x86/prefix/romprefix.S (+4/-4)
src/arch/x86/prefix/unlzma.S (+1/-1)
src/arch/x86/prefix/usbdisk.S (+54/-10)
src/arch/x86/scripts/pcbios.lds (+49/-14)
src/arch/x86/scripts/prefixonly.lds (+29/-0)
src/arch/x86/transitions/liba20.S (+1/-1)
src/arch/x86/transitions/librm.S (+20/-9)
src/arch/x86/transitions/librm_mgmt.c (+17/-1)
src/arch/x86/transitions/librm_test.c (+2/-1)
src/arch/x86_64/Makefile.linux (+6/-2)
src/arch/x86_64/include/bits/compiler.h (+1/-1)
src/config/cloud/console.h (+5/-0)
src/config/cloud/general.h (+9/-0)
src/config/cloud/ioapi.h (+7/-0)
src/config/config.c (+3/-0)
src/config/config_crypto.c (+50/-0)
src/config/config_ethernet.c (+3/-0)
src/config/config_fdt.c (+41/-0)
src/config/config_usb.c (+3/-0)
src/config/crypto.h (+16/-13)
src/config/defaults/efi.h (+6/-0)
src/config/defaults/linux.h (+2/-0)
src/config/defaults/pcbios.h (+2/-0)
src/config/dhcp.h (+1/-1)
src/config/fdt.h (+16/-0)
src/config/general.h (+3/-1)
src/config/ioapi.h (+3/-0)
src/config/qemu/ioapi.h (+0/-0)
src/config/rpi/colour.h (+0/-0)
src/config/rpi/console.h (+0/-0)
src/config/rpi/crypto.h (+0/-0)
src/config/rpi/general.h (+0/-0)
src/config/rpi/ioapi.h (+0/-0)
src/config/rpi/serial.h (+0/-0)
src/config/rpi/settings.h (+0/-0)
src/config/rpi/sideband.h (+0/-0)
src/config/rpi/usb.h (+13/-0)
src/config/usb.h (+1/-0)
src/config/vbox/ioapi.h (+0/-0)
src/core/acpi.c (+15/-20)
src/core/blocktrans.c (+1/-3)
src/core/cachedhcp.c (+16/-37)
src/core/dma.c (+179/-0)
src/core/fdt.c (+486/-0)
src/core/image.c (+68/-0)
src/core/interface.c (+22/-0)
src/core/iobuf.c (+51/-6)
src/core/malloc.c (+2/-2)
src/core/null_acpi.c (+1/-1)
src/core/open.c (+1/-3)
src/core/parseopt.c (+1/-1)
src/core/settings.c (+7/-2)
src/core/string.c (+21/-6)
src/core/uri.c (+2/-2)
src/crypto/bigint.c (+29/-0)
src/crypto/certstore.c (+2/-2)
src/crypto/cms.c (+1/-1)
src/crypto/deflate.c (+1/-1)
src/crypto/md4.c (+0/-11)
src/crypto/md5.c (+0/-11)
src/crypto/mishmash/oid_md4.c (+37/-0)
src/crypto/mishmash/oid_md5.c (+37/-0)
src/crypto/mishmash/oid_rsa.c (+38/-0)
src/crypto/mishmash/oid_sha1.c (+37/-0)
src/crypto/mishmash/oid_sha224.c (+37/-0)
src/crypto/mishmash/oid_sha256.c (+37/-0)
src/crypto/mishmash/oid_sha384.c (+37/-0)
src/crypto/mishmash/oid_sha512.c (+37/-0)
src/crypto/mishmash/oid_sha512_224.c (+37/-0)
src/crypto/mishmash/oid_sha512_256.c (+37/-0)
src/crypto/mishmash/rsa_md5.c (+1/-1)
src/crypto/mishmash/rsa_sha1.c (+1/-1)
src/crypto/mishmash/rsa_sha224.c (+1/-1)
src/crypto/mishmash/rsa_sha256.c (+1/-1)
src/crypto/mishmash/rsa_sha384.c (+1/-1)
src/crypto/mishmash/rsa_sha512.c (+1/-1)
src/crypto/ocsp.c (+46/-30)
src/crypto/privkey.c (+26/-9)
src/crypto/rootcert.c (+1/-0)
src/crypto/rsa.c (+6/-11)
src/crypto/sha1.c (+0/-11)
src/crypto/sha224.c (+0/-11)
src/crypto/sha256.c (+0/-11)
src/crypto/sha384.c (+0/-11)
src/crypto/sha512.c (+0/-11)
src/crypto/sha512_224.c (+0/-11)
src/crypto/sha512_256.c (+0/-11)
src/crypto/x509.c (+72/-23)
src/drivers/block/ibft.c (+6/-2)
src/drivers/bus/isa.c (+2/-1)
src/drivers/bus/pci.c (+25/-2)
src/drivers/bus/pcimsix.c (+251/-0)
src/drivers/bus/usb.c (+66/-38)
src/drivers/bus/virtio-pci.c (+1/-1)
src/drivers/infiniband/MT25218_PRM.h (+88/-88)
src/drivers/infiniband/MT25408_PRM.h (+132/-132)
src/drivers/infiniband/arbel.c (+41/-37)
src/drivers/infiniband/flexboot_nodnic.c (+8/-5)
src/drivers/infiniband/golan.c (+39/-32)
src/drivers/infiniband/hermon.c (+327/-97)
src/drivers/infiniband/hermon.h (+34/-3)
src/drivers/infiniband/linda.c (+2430/-0)
src/drivers/infiniband/linda.h (+281/-0)
src/drivers/infiniband/linda_fw.c (+1069/-0)
src/drivers/infiniband/mlx_utils_flexboot/src/mlx_memory_priv.c (+2/-2)
src/drivers/infiniband/mlx_utils_flexboot/src/mlx_pci_priv.c (+1/-1)
src/drivers/infiniband/nodnic_prm.h (+2/-2)
src/drivers/infiniband/qib7322.c (+10/-9)
src/drivers/linux/af_packet.c (+1/-1)
src/drivers/linux/linux.c (+37/-13)
src/drivers/linux/slirp.c (+552/-0)
src/drivers/linux/tap.c (+1/-1)
src/drivers/net/3c90x.c (+4/-4)
src/drivers/net/amd8111e.c (+1/-1)
src/drivers/net/ath/ath5k/ath5k.c (+4/-4)
src/drivers/net/ath/ath5k/ath5k_eeprom.c (+1/-0)
src/drivers/net/ath/ath9k/ath9k.c (+1/-1)
src/drivers/net/ath/ath9k/ath9k_init.c (+3/-3)
src/drivers/net/atl1e.c (+2/-2)
src/drivers/net/axge.c (+45/-22)
src/drivers/net/axge.h (+5/-0)
src/drivers/net/b44.c (+7/-7)
src/drivers/net/bnx2.c (+2694/-0)
src/drivers/net/bnx2.h (+4598/-0)
src/drivers/net/bnx2_fw.h (+3494/-0)
src/drivers/net/bnxt/bnxt.c (+2170/-0)
src/drivers/net/bnxt/bnxt.h (+1006/-0)
src/drivers/net/bnxt/bnxt_dbg.h (+677/-0)
src/drivers/net/bnxt/bnxt_hsi.h (+10337/-0)
src/drivers/net/eepro100.c (+9/-9)
src/drivers/net/efi/nii.c (+30/-12)
src/drivers/net/efi/snpnet.c (+54/-8)
src/drivers/net/ena.c (+45/-21)
src/drivers/net/ena.h (+1/-1)
src/drivers/net/etherfabric.c (+3/-3)
src/drivers/net/exanic.c (+5/-5)
src/drivers/net/forcedeth.c (+3/-3)
src/drivers/net/icplus.c (+4/-4)
src/drivers/net/igbvf/igbvf_main.c (+6/-6)
src/drivers/net/intel.c (+35/-24)
src/drivers/net/intel.h (+5/-0)
src/drivers/net/intelvf.c (+10/-8)
src/drivers/net/intelvf.h (+7/-1)
src/drivers/net/intelx.c (+8/-2)
src/drivers/net/intelxl.c (+581/-255)
src/drivers/net/intelxl.h (+363/-36)
src/drivers/net/intelxlvf.c (+728/-0)
src/drivers/net/intelxlvf.h (+86/-0)
src/drivers/net/intelxvf.c (+7/-1)
src/drivers/net/iphone.c (+2268/-0)
src/drivers/net/iphone.h (+291/-0)
src/drivers/net/jme.c (+5/-5)
src/drivers/net/lan78xx.c (+12/-0)
src/drivers/net/lan78xx.h (+6/-0)
src/drivers/net/myri10ge.c (+4/-4)
src/drivers/net/myson.c (+4/-4)
src/drivers/net/natsemi.c (+4/-4)
src/drivers/net/ncm.c (+8/-5)
src/drivers/net/netfront.c (+138/-56)
src/drivers/net/netfront.h (+15/-1)
src/drivers/net/pcnet32.c (+4/-4)
src/drivers/net/phantom/phantom.c (+9/-9)
src/drivers/net/prism2_pci.c (+1/-1)
src/drivers/net/realtek.c (+63/-72)
src/drivers/net/realtek.h (+15/-3)
src/drivers/net/rhine.c (+3/-3)
src/drivers/net/rtl818x/rtl818x.c (+6/-6)
src/drivers/net/sfc/efx_common.c (+4/-3)
src/drivers/net/sfc/efx_hunt.c (+5/-4)
src/drivers/net/sfc/efx_hunt.h (+3/-2)
src/drivers/net/sfc/mcdi.h (+4/-2)
src/drivers/net/sfc/sfc_hunt.c (+5/-2)
src/drivers/net/sis190.c (+5/-5)
src/drivers/net/skeleton.c (+1/-1)
src/drivers/net/skge.c (+5/-4)
src/drivers/net/sky2.c (+8/-8)
src/drivers/net/smsc95xx.c (+4/-0)
src/drivers/net/smscusb.c (+34/-0)
src/drivers/net/smscusb.h (+1/-0)
src/drivers/net/tg3/tg3.c (+11/-11)
src/drivers/net/thunderx.c (+8/-8)
src/drivers/net/velocity.c (+8/-6)
src/drivers/net/vmxnet3.c (+8/-7)
src/drivers/net/vxge/vxge_config.c (+6/-6)
src/drivers/net/vxge/vxge_main.c (+1/-1)
src/drivers/usb/ehci.c (+17/-23)
src/drivers/usb/uhci.c (+17/-22)
src/drivers/usb/usbblk.c (+912/-0)
src/drivers/usb/usbblk.h (+121/-0)
src/drivers/usb/usbhub.c (+8/-2)
src/drivers/usb/usbio.c (+17/-4)
src/drivers/usb/xhci.c (+190/-99)
src/drivers/usb/xhci.h (+36/-9)
src/hci/commands/ifmgmt_cmd.c (+63/-1)
src/hci/commands/image_mem_cmd.c (+96/-0)
src/hci/commands/nvo_cmd.c (+43/-14)
src/hci/linux_args.c (+3/-17)
src/hci/readline.c (+14/-3)
src/hci/shell.c (+1/-1)
src/image/efi_image.c (+31/-4)
src/image/embedded.c (+3/-0)
src/image/png.c (+6/-6)
src/include/errno.h (+4/-4)
src/include/ipxe/acpi.h (+147/-1)
src/include/ipxe/aoe.h (+31/-0)
src/include/ipxe/asn1.h (+35/-4)
src/include/ipxe/cachedhcp.h (+17/-0)
src/include/ipxe/certstore.h (+2/-1)
src/include/ipxe/dma.h (+480/-0)
src/include/ipxe/eap.h (+69/-0)
src/include/ipxe/eapol.h (+37/-90)
src/include/ipxe/efi/efi.h (+74/-6)
src/include/ipxe/efi/efi_acpi.h (+13/-0)
src/include/ipxe/efi/efi_autoboot.h (+3/-1)
src/include/ipxe/efi/efi_autoexec.h (+16/-0)
src/include/ipxe/efi/efi_cachedhcp.h (+16/-0)
src/include/ipxe/efi/efi_null.h (+33/-0)
src/include/ipxe/efi/efi_path.h (+43/-0)
src/include/ipxe/efi/efi_pci.h (+10/-2)
src/include/ipxe/efi/efi_snp.h (+1/-1)
src/include/ipxe/efi/efi_usb.h (+5/-5)
src/include/ipxe/efi/efi_utils.h (+0/-4)
src/include/ipxe/efi/efi_veto.h (+13/-0)
src/include/ipxe/efi/efi_wrap.h (+1/-0)
src/include/ipxe/errfile.h (+15/-0)
src/include/ipxe/fcp.h (+8/-0)
src/include/ipxe/fdt.h (+102/-0)
src/include/ipxe/http.h (+3/-5)
src/include/ipxe/ib_srp.h (+35/-0)
src/include/ipxe/image.h (+3/-0)
src/include/ipxe/infiniband.h (+2/-1)
src/include/ipxe/interface.h (+17/-0)
src/include/ipxe/iobuf.h (+69/-0)
src/include/ipxe/linux/linux_acpi.h (+18/-0)
src/include/ipxe/linux/linux_pci.h (+24/-0)
src/include/ipxe/linux_api.h (+106/-0)
src/include/ipxe/linux_sysfs.h (+16/-0)
src/include/ipxe/malloc.h (+14/-18)
src/include/ipxe/netdevice.h (+12/-0)
src/include/ipxe/null_acpi.h (+4/-2)
src/include/ipxe/ocsp.h (+2/-2)
src/include/ipxe/open.h (+0/-2)
src/include/ipxe/pci.h (+17/-0)
src/include/ipxe/pci_io.h (+11/-0)
src/include/ipxe/pcimsix.h (+77/-0)
src/include/ipxe/peerblk.h (+24/-0)
src/include/ipxe/privkey.h (+54/-1)
src/include/ipxe/process.h (+13/-5)
src/include/ipxe/rotate.h (+26/-8)
src/include/ipxe/slirp.h (+155/-0)
src/include/ipxe/smbios.h (+48/-4)
src/include/ipxe/tables.h (+0/-71)
src/include/ipxe/tls.h (+56/-5)
src/include/ipxe/usb.h (+22/-1)
src/include/ipxe/validator.h (+2/-1)
src/include/ipxe/vlan.h (+6/-2)
src/include/ipxe/x509.h (+33/-15)
src/include/ipxe/xengrant.h (+4/-3)
src/include/readline/readline.h (+2/-1)
src/include/string.h (+2/-0)
src/include/usr/ifmgmt.h (+4/-2)
src/include/usr/imgmgmt.h (+1/-0)
src/interface/efi/efi_acpi.c (+1/-0)
src/interface/efi/efi_autoboot.c (+17/-9)
src/interface/efi/efi_autoexec.c (+206/-0)
src/interface/efi/efi_block.c (+74/-76)
src/interface/efi/efi_bofm.c (+8/-7)
src/interface/efi/efi_cachedhcp.c (+94/-0)
src/interface/efi/efi_console.c (+66/-11)
src/interface/efi/efi_debug.c (+29/-21)
src/interface/efi/efi_download.c (+4/-1)
src/interface/efi/efi_driver.c (+42/-17)
src/interface/efi/efi_entropy.c (+14/-2)
src/interface/efi/efi_fdt.c (+70/-0)
src/interface/efi/efi_init.c (+155/-0)
src/interface/efi/efi_local.c (+20/-6)
src/interface/efi/efi_null.c (+672/-0)
src/interface/efi/efi_path.c (+506/-0)
src/interface/efi/efi_pci.c (+479/-50)
src/interface/efi/efi_pxe.c (+33/-7)
src/interface/efi/efi_smbios.c (+28/-16)
src/interface/efi/efi_snp.c (+89/-89)
src/interface/efi/efi_snp_hii.c (+66/-26)
src/interface/efi/efi_timer.c (+12/-3)
src/interface/efi/efi_usb.c (+195/-124)
src/interface/efi/efi_utils.c (+4/-34)
src/interface/efi/efi_veto.c (+609/-0)
src/interface/efi/efi_wrap.c (+652/-23)
src/interface/efi/efidrvprefix.c (+10/-0)
src/interface/efi/efiprefix.c (+34/-3)
src/interface/hyperv/vmbus.c (+3/-3)
src/interface/linux/linux_acpi.c (+148/-0)
src/interface/linux/linux_api.c (+525/-0)
src/interface/linux/linux_console.c (+1/-1)
src/interface/linux/linux_entropy.c (+1/-1)
src/interface/linux/linux_nap.c (+1/-1)
src/interface/linux/linux_pci.c (+1/-1)
src/interface/linux/linux_smbios.c (+77/-65)
src/interface/linux/linux_sysfs.c (+96/-0)
src/interface/linux/linux_time.c (+1/-1)
src/interface/linux/linux_timer.c (+1/-1)
src/interface/linux/linux_umalloc.c (+1/-1)
src/interface/linux/linuxprefix.c (+17/-10)
src/interface/smbios/smbios.c (+17/-2)
src/interface/xen/xenstore.c (+6/-6)
src/libgcc/__divmoddi4.c (+1/-1)
src/net/80211/wpa.c (+17/-13)
src/net/aoe.c (+2/-29)
src/net/eap.c (+142/-0)
src/net/eapol.c (+96/-43)
src/net/eth_slow.c (+11/-0)
src/net/ethernet.c (+1/-1)
src/net/fcp.c (+20/-8)
src/net/infiniband.c (+0/-20)
src/net/infiniband/ib_sma.c (+1/-1)
src/net/infiniband/ib_srp.c (+20/-39)
src/net/ndp.c (+28/-1)
src/net/netdevice.c (+91/-14)
src/net/peerblk.c (+140/-10)
src/net/peerdisc.c (+65/-4)
src/net/peerdist.c (+38/-0)
src/net/ping.c (+2/-10)
src/net/tcp.c (+23/-10)
src/net/tcp/httpconn.c (+6/-7)
src/net/tcp/httpcore.c (+15/-0)
src/net/tcp/https.c (+13/-1)
src/net/tcp/iscsi.c (+71/-21)
src/net/tcp/syslogs.c (+14/-15)
src/net/tls.c (+448/-61)
src/net/udp.c (+2/-10)
src/net/udp/dhcp.c (+34/-13)
src/net/udp/dns.c (+131/-50)
src/net/udp/slam.c (+23/-9)
src/net/udp/tftp.c (+29/-12)
src/net/validator.c (+139/-48)
src/net/vlan.c (+46/-1)
src/scripts/efi.lds (+10/-9)
src/tests/cms_test.c (+2/-0)
src/tests/ocsp_test.c (+2/-1)
src/tests/rsa_test.c (+13/-12)
src/tests/string_test.c (+1/-0)
src/tests/x509_test.c (+7/-0)
src/usr/autoboot.c (+8/-7)
src/usr/certmgmt.c (+1/-1)
src/usr/ifmgmt.c (+13/-7)
src/usr/imgmgmt.c (+21/-0)
src/usr/imgtrust.c (+2/-1)
src/usr/lotest.c (+2/-2)
src/util/.gitignore (+0/-1)
src/util/eficompress.c (+1588/-0)
src/util/efirom.c (+64/-6)
src/util/elf2efi.c (+62/-20)
src/util/genfsimg (+306/-0)

~paelzer/ubuntu/+source/ipxe:fix-lp1805920-vlan0-tag-stripping-bionic

Merged into ubuntu/+source/ipxe:ubuntu/bionic-devel at revision 9247549eb5dba23dd79627b62a063a33b54ed21b

Andreas Hasenack: Approve on 2018-12-14

Andres Rodriguez: Pending requested 2018-12-11

Canonical Server: Pending requested 2018-12-11

git-ubuntu developers: Pending requested 2018-12-11

~paelzer/ubuntu/+source/ipxe:fix-lp1805920-vlan0-tag-stripping-cosmic

Merged into ubuntu/+source/ipxe:ubuntu/cosmic-devel at revision cc3c947c36402576d33e118a7b37a70f4758edf2

Andreas Hasenack: Approve on 2018-12-14

Andres Rodriguez: Pending requested 2018-12-11

Canonical Server: Pending requested 2018-12-11

git-ubuntu developers: Pending requested 2018-12-11

Vern Hart (vern) wrote on 2018-11-30:

failed pxe boot console (17.9 KiB, image/png)

Vern Hart (vern) wrote on 2018-11-30:

Subscribed ~field-critical as this will block managed service handover.

Andres Rodriguez (andreserl) wrote on 2018-11-30:

Hi Vern,

Based on your description, it sounds like this is *not* a MAAS issue; rather, it could be a networking issue, a misconfiguration issue or an iPXE issue.

To start, from the ipxe side:

1. MAAS doesn't support PXE booting over VLANs.
2. MAAS doesn't support iPXE. We rely on the VMs booting via pxelinux emulation; MAAS doesn't pass an iPXE ROM, nor does it configure iPXE at all.

Second, how is your network configuration? While I understand you have 3 machines running MAAS, and only VMs that have DHCP running locally can PXE boot, I'm wondering:

1. Can you confirm that all bridges of the pods on the underlying network are using a /physical/ bridge in the /same/ vlan ?
2. What's the machines' interface configuration? How about the bridge configuration?
3. Which interfaces are the MAAS DHCP servers responding on? (e.g. ps faux | grep dhcpd)
4. Can you confirm that VMs inside pod on infra1 can communicate with VMs inside pod of infra2 or 3?

Lastly, I have a feeling that this is likely related to the network configuration itself, or the configuration of the bridges:
1. For networking, is STP/Portfast enabled? https://docs.maas.io/2.5/en/installconfig-network-stp
2. Is the bridge that the VMs use configured with STP and/or a long forward-delay? https://wiki.libvirt.org/page/PXE_boot_(or_dhcp)_on_guest_failed

Changed in maas:
status:	New → Incomplete

Vern Hart (vern) wrote on 2018-12-01:

I agree this may not be a MAAS bug specifically but I'm not sure where else to seek assistance.

You say MAAS doesn't support PXE booting over VLANs but vlan 0 is special: https://en.wikipedia.org/wiki/IEEE_802.1Q#Frame_format
"The reserved value 0x000 indicates that the frame does not carry a VLAN ID"

Here is the relevant portion of the switch config that the customer has shared with me. The ports are configured to vlan 17 as native (untagged) and to only allow vlan 17 at all. Note that this is not vlan 0.

  interface Vethernet2424
    description server 1/3, VNIC eth0
    switchport mode trunk
    no lldp transmit
    no lldp receive
    no pinning server sticky
    pinning server pinning-failure link-down
    switchport trunk native vlan 17
    switchport trunk allowed vlan 17
    bind interface port-channel1287 channel 2424
    no shutdown

  interface Vethernet2426
    description server 1/2, VNIC eth0
    switchport mode trunk
    no lldp transmit
    no lldp receive
    no pinning server sticky
    pinning server pinning-failure link-down
    switchport trunk native vlan 17
    switchport trunk allowed vlan 17
    bind interface port-channel1286 channel 2426
    no shutdown

  interface Vethernet2428
    description server 1/1, VNIC eth0
    switchport mode trunk
    no lldp transmit
    no lldp receive
    no pinning server sticky
    pinning server pinning-failure link-down
    switchport trunk native vlan 17
    switchport trunk allowed vlan 17
    bind interface port-channel1285 channel 2428
    no shutdown
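The claim that every port uses VLAN 17 both as native and as the only allowed VLAN can be checked mechanically. The following Python sketch parses stanzas in the format shown above; `parse_interfaces` is a hypothetical helper, and it only understands the two `switchport trunk` lines that matter here.

```python
def parse_interfaces(config: str) -> dict:
    """Map each interface name to its native and allowed trunk VLAN settings."""
    interfaces = {}
    current = None
    for raw in config.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            interfaces[current] = {}
        elif current and line.startswith("switchport trunk native vlan "):
            interfaces[current]["native"] = int(line.rsplit(None, 1)[1])
        elif current and line.startswith("switchport trunk allowed vlan "):
            # Kept as a string because this field may also hold a list or range.
            interfaces[current]["allowed"] = line.rsplit(None, 1)[1]
    return interfaces


sample = """
  interface Vethernet2424
    description server 1/3, VNIC eth0
    switchport mode trunk
    switchport trunk native vlan 17
    switchport trunk allowed vlan 17
    bind interface port-channel1287 channel 2424
    no shutdown
"""

for name, vlans in parse_interfaces(sample).items():
    print(name, vlans, str(vlans["native"]) == vlans["allowed"])
```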

1. On all 3 MAAS nodes, the physical interface enp6s0 is the sole member of bondm which is in bridge broam. The vnet interfaces of VMs show up under broam as well. The physical interfaces are not vlan tagged.
2. The netplan on each machine looks like this (with differing addresses and customer specific nameserver info):

  network:
      ethernets:
          enp6s0:
              dhcp4: false
      version: 2
      bonds:
          bondm:
              interfaces: [ enp6s0 ]
              parameters:
                  mode: active-backup
                  primary: enp6s0
      bridges:
          broam:
              addresses: [ 10.17.101.10/22 ]
              gateway4: 10.17.100.1
              interfaces: [ bondm ]
              nameservers:
                  addresses: [ 123.123.123.1, 123.123.123.2 ]
                  search: [ unicloud1.example.net ]

3. The command-line for dhcpd doesn't show an interface:

vernhart@infra1:~$ ps fuax | grep dhcpd
  vernhart 24086  0.0  0.0  13136  1100 pts/8    S+   20:25   0:00              \_ grep --color=auto dhcpd
  dhcpd     8794  0.0  0.0  45964 16976 ?        Ss   Nov29   0:11 dhcpd -user dhcpd -group dhcpd -f -q -4 -pf /run/maas/dhcp/dhcpd.pid -cf /var/lib/maas/dhcpd.conf -lf /var/lib/maas/dhcp/dhcpd.leases broam
  vernhart@infra1:~$ sudo netstat -nlp | grep dhcp
  tcp        0      0 10.17.101.10:647        0.0.0.0:*               LISTEN      8794/dhcpd          
  tcp        0      0 0.0.0.0:7911            0.0.0.0:*               LISTEN      8794/dhcpd          
  udp     5120      0 0.0.0.0:67              0.0.0.0:*                           8794/dhcpd          
  udp        0      0 0.0.0.0:7309            0.0.0.0:*                           8794/dhcpd          
  udp6       0      0 :::27481                :::*                                8794/dhcpd          
  raw        0      0 0.0.0.0:1               0.0.0.0:*               7           8794/dhcpd

4. The VMs inside all the pods can communicate with each other.

root@fce:~/fibernet-fcb# juju machines -m controller
  Machine  State    DNS           Inst id  Series  AZ       Message
  0        started  10.17.101.23  p6aaff   bionic  default  Deployed
  1        started  10.17.101.25  84gxpn   bionic  zone2    Deployed
  2        started  10.17.101.24  bqfy3m   bionic  zone3    Deployed

root@fce:~/fibernet-fcb# juju ssh -m controller 0
  Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-39-generic x86_64)

* Documentation:  https://help.ubuntu.com
   * Management:     https://landscape.canonical.com
   * Support:        https://ubuntu.com/advantage

System information as of Fri Nov 30 22:29:54 UTC 2018

System load:  0.06               Processes:           138
    Usage of /:   13.2% of 91.17GB   Users logged in:     0
    Memory usage: 2%                 IP address for ens6: 10.17.101.23
    Swap usage:   0%

Get cloud support with Ubuntu Advantage Cloud Guest:
      http://www.ubuntu.com/business/services/cloud

* Canonical Livepatch is available for installation.
     - Reduce system reboots and improve kernel security. Activate at:
       https://ubuntu.com/livepatch

33 packages can be updated.
  0 updates are security updates.

Last login: Fri Nov 16 20:17:20 2018 from 10.17.101.10
  ubuntu@juju-1:~$ ping 10.17.101.25 -c 1
  PING 10.17.101.25 (10.17.101.25) 56(84) bytes of data.
  64 bytes from 10.17.101.25: icmp_seq=1 ttl=64 time=0.534 ms

--- 10.17.101.25 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.534/0.534/0.534/0.000 ms
  ubuntu@juju-1:~$ ping 10.17.101.24 -c 1
  PING 10.17.101.24 (10.17.101.24) 56(84) bytes of data.
  64 bytes from 10.17.101.24: icmp_seq=1 ttl=64 time=0.653 ms

--- 10.17.101.24 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.653/0.653/0.653/0.000 ms
  ubuntu@juju-1:~$

Your final two asks:

1. Is STP enabled? I don't believe so but I can't find the response from the customer stating so. I will confirm. I don't think this would be an issue, however, because I can see the DHCP responses coming in a timely manner on the virtual interface when I tcpdump.

2. Is STP or a long forward-delay configured on the bridge?

vernhart@infra2:~$ brctl show
  bridge name     bridge id               STP enabled     interfaces
  broam           8000.fee6cd1cc06b       no              bondm
  vernhart@infra2:~$ brctl showstp broam
  broam
   bridge id              8000.fee6cd1cc06b
   designated root        8000.fee6cd1cc06b
   root port                 0                    path cost                  0
   max age                  20.00                 bridge max age            20.00
   hello time                2.00                 bridge hello time          2.00
   forward delay            15.00                 bridge forward delay      15.00
   ageing time             300.00
   hello timer               0.00                 tcn timer                  0.00
   topology change timer     0.00                 gc timer                 136.28
   flags

bondm (1)
   port id                8001                    state                forwarding
   designated root        8000.fee6cd1cc06b       path cost                100
   designated bridge      8000.fee6cd1cc06b       message age timer          0.00
   designated port        8001                    forward delay timer        0.00
   designated cost           0                    hold timer                 0.00
   flags
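For checking the same thing across several hosts, the `brctl show` table can be parsed with a few lines of Python. This is a sketch only: `parse_brctl_show` is a hypothetical helper, and the `vnet0` continuation line in the sample is an assumption added to show how additional bridge members are listed.

```python
def parse_brctl_show(text: str) -> dict:
    """Parse `brctl show` output into {bridge: {id, stp, interfaces}}."""
    bridges = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[:2] == ["bridge", "name"]:
            continue  # skip blank lines and the header row
        if len(fields) >= 3 and fields[2] in ("yes", "no"):
            current = fields[0]
            bridges[current] = {
                "id": fields[1],
                "stp": fields[2] == "yes",
                "interfaces": fields[3:],
            }
        elif current is not None:
            # Continuation rows list further member interfaces only.
            bridges[current]["interfaces"].extend(fields)
    return bridges


sample = """bridge name     bridge id               STP enabled     interfaces
broam           8000.fee6cd1cc06b       no              bondm
                                                        vnet0
"""

info = parse_brctl_show(sample)
print(info["broam"]["stp"], info["broam"]["interfaces"])
```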

Mike Pontillo (mpontillo) wrote on 2018-12-01:

Given that VID 0 is, as you say, a special value that marks a priority-only (802.1p), otherwise untagged packet, VID 0 is only meaningful to network switching infrastructure, not end nodes. From what I understand, networking hardware should be configured to strip an 802.1Q tag with VID 0 before sending the packet to the end node.

In other words, this sounds like a bug or misconfiguration on the network side.

Mike Pontillo (mpontillo) wrote on 2018-12-01:

I'm not positive this will work, but maybe if you do something like this on the trunk port connected to the server, it will do what you want?

switchport trunk native vlan <vid>

This causes the switch to assume untagged traffic is on <vid>. It might also cause the tag for that VID to be stripped before it egresses the port, but there is some debate on that point, and I'm not enough of a Cisco expert to tell you the behavior for sure. ;-)

Mike Pontillo (mpontillo) wrote on 2018-12-01:

Er, I see you've already done that, so I suppose that isn't the only variable here. I'm not sure how to tell the switch to treat that VLAN as the untagged VLAN, /and/ strip the tag when it egresses that port. That's what you need to do to get this to work, I think.

Jason Hobbs (jason-hobbs) wrote on 2018-12-01:

Why are the switchports in trunk mode, rather than access mode, when you only want to carry traffic for vlan 17 on them?

If that's the case, I think you should configure the ports in access mode and set the access vlan to 17.

Mike Pontillo (mpontillo) wrote on 2018-12-01:

I agree with Jason. I've been trying to find out if iPXE upstream has incorporated the bug fix you mentioned, or similar; so far I've found this, which is similar but not quite the same:

https://github.com/ipxe/ipxe/commit/db3443608fe32fffb4f6ad467bfc035a824bff52

The interesting part here is that perhaps the NIC driver should have stripped the tag, but didn't.

Mike Pontillo (mpontillo) wrote on 2018-12-01:

#10

Another thought: I wonder if you could use `ethtool` or similar to tell the hardware to strip off the unwanted VLAN tags.

Secondly, I found a document[1] that states that if you load the `8021q` driver into the kernel, the hardware will be automatically configured to offload tag stripping. Not sure if that would help in this case.

[1]:
https://www.intel.com/content/www/us/en/support/articles/000005498/network-and-i-o/ethernet-products.html

Andres Rodriguez (andreserl) wrote on 2018-12-01:

#11

Just as a note see [1]. Basically a user just came across this too.

It would seem like the network admins should shed some light on this.

[1]: https://community.cisco.com/t5/unified-computing-system/cisco-ucs-nexus-802-1p-tagging-vlan0-on-traffic-between-blades/m-p/3755502

Andres Rodriguez (andreserl) wrote on 2018-12-01:

#12

I did some digging, and it seems that:

1. Upstream has not accepted a patch from Red Hat to strip the priority tags [1]
2. RHEL (and consequently CentOS) is patching iPXE directly [2], [3].

Based on this, I think this patch could potentially be ported into ipxe in Ubuntu, however, I think there should be some investigation into the network as to why packets are not stripped on egress.

[1]: http://lists.ipxe.org/pipermail/ipxe-devel/2016-July/005099.html
[2]: https://git.centos.org/blob/rpms!ipxe.git/c7/SPECS!ipxe.spec
[3]: https://git.centos.org/blob/rpms!ipxe.git/c7/SOURCES!0009-Strip-802.1Q-VLAN-0-priority-tags.patch

Mike Pontillo (mpontillo) wrote on 2018-12-01:

#13

Marking Invalid for MAAS since this is unrelated to MAAS itself.

Changed in maas:
status:	Incomplete → Opinion
status:	Opinion → Invalid

Vern Hart (vern) wrote on 2018-12-03:

#14

Attachment: Cisco Bug CSCuu29425 - native (untagged) packets on RHEL7 seen as tagged-VLAN 0.pdf (164.0 KiB, application/pdf)

Some interesting observations. The customer deployed a pair of Centos7 machines and confirmed the vlan0 tag issue existed there as well. That wasn't too surprising.

However, they deployed a pair of Centos6 machines and they do NOT have the vlan0 tag issue.

This seems to confirm that the issue is not actually within the switch but within linux itself.

Within a thread on a community discussion on cisco.com, a Cisco employee responded saying it's a Linux bug that should already be patched. The Cisco person's response:

> You will find this behavior in all linux destro. This issue has been documented under-
> https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuu29425/?reffering_site=dumpcr
>
> You may wanna try "net.bridge.bridge-nf-filter-vlan-tagged = 1" but I haven't tested it.

That URL referenced is behind a login page so I've attached a pdf of the page from the customer.

Within that document it mentions a couple of URLS:

https://lists.openwall.net/netdev/2013/09/10/30
https://lists.linuxfoundation.org/pipermail/bridge/2015-July/009630.html

Those are very old. If they are describing the same problem/solution, then this would be a regression.

The net.bridge.bridge-nf-filter-vlan-tagged setting is for filtering vlans with iptables. I feel that is not likely the right direction.

Mike Pontillo (mpontillo) wrote on 2018-12-03:

#15

Interesting developments. I agree that it doesn't seem like `bridge-nf-filter-vlan-tagged` is what we want, unless there is a special case to not filter packets tagged on VID 0. It might be worth trying this out on the bridge used to boot the pod VMs.

The frustrating thing about this bug report is the large number of layers we would need to check for correct behavior. We are trying to verify all of the following points of contact with a "VLAN 0" tagged packet at once:

- The network infrastructure (it would be better for VLAN 0 tags to be stripped before reaching end nodes)
- NIC hardware (some NICs handle VLAN filtering in hardware; I'm not sure if they could automatically discard the VLAN 0 tag before it's handed up the stack)
- Linux NIC driver (would be programming the aforementioned hardware to hopefully do the right thing - or not)
- Linux bridge driver (or whatever is handing the packets from the OS to the pod)
- Hypervisor NIC driver (that is, the virtual hardware which will be booting on the VM)
- iPXE (which would be using a minimal network stack prior to booting the virtual OS)
- Virtual OS Linux driver

I feel like we aren't sure which of these layers might have a problem handling packets tagged on VID 0. We know iPXE has a problem for sure; the other layers would need to be tested in isolation to verify the behavior.

Andres confirmed that RHEL/CentOS are patching iPXE to treat VLAN 0 frames as untagged (his link showed that the patch was on CentOS 7). But there are many layers here that could be handling them improperly. That is, if we know that CentOS 6 works, we can't rule out that CentOS 7 may not be working due to an issue elsewhere in the stack (not in iPXE).
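
To make "treat VLAN 0 frames as untagged" concrete, here is a minimal Python sketch. It is not the CentOS patch (that is C code inside iPXE); the name strip_priority_tag and the constants are made up for illustration, and it assumes a plain Ethernet II frame carrying at most one 802.1Q tag (TPID 0x8100).

```python
import struct

ETH_P_8021Q = 0x8100  # TPID value that marks an 802.1Q tag
VID_MASK = 0x0FFF     # the low 12 bits of the TCI hold the VLAN ID


def strip_priority_tag(frame: bytes) -> bytes:
    """Return the frame without its 802.1Q tag if that tag carries VID 0.

    Frames that are untagged, or tagged with a real VLAN ID, are returned
    unchanged. (Hypothetical helper, for illustration only.)
    """
    if len(frame) < 18:
        return frame  # too short for two MACs + tag + inner EtherType
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid == ETH_P_8021Q and (tci & VID_MASK) == 0:
        # Drop the 4 tag bytes; the inner EtherType moves up to offset 12.
        return frame[:12] + frame[16:]
    return frame
```

Note that stripping discards the priority bits (PCP) as well, which is the trade-off discussed elsewhere in this thread.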

Andres Rodriguez (andreserl) wrote on 2018-12-05:

#16

Hi Vern,

Could you confirm something for us? When you say that the customer deployed centos 7 and was able to see the issue there, did you mean that they saw the issue in the packets themselves? In other words, CentOS7 deployed successfully but the packets still had the VLAN tag?

The reason I ask is because centos7 ipxe includes a patch that strips the vlan tag. So I'm wondering if that allows VMs on a CentOS 7 host to PXE boot without failing?

Anyhow, I've made a version of ipxe for Ubuntu with the centos patch in ppa:andreserl/maas. I've not tested it, but if you could take a look and test if that would fix the issue, then we could push that to Ubuntu.

Anyhow, we need to talk to the kernel folks to figure this one out.

Vern Hart (vern) wrote on 2018-12-05:

#17

The Centos7 deployments were to other blades in the UCS chassis and pings between the two Centos7 machines did not have vlan 0 tags.

I'll add your ppa and update to your ipxe and test pod VM booting.

Vern Hart (vern) wrote on 2018-12-05:

#18

Sorry, I got that wrong. A deployment of a pair of Centos6 hosts to the blades did not have vlan 0 tags. The Centos7 deployments did have vlan 0 tags.

Mike Pontillo (mpontillo) wrote on 2018-12-05:

#19

@vhart, can you clarify that last comment? Do you mean that the CentOS 6 VMs exhibit the issue caused by the VLAN 0 tags and the CentOS 7 VMs do not? Or are you saying the tags are filtered elsewhere in the stack on CentOS 6 (such as the NIC driver, 8021q driver, or virtual bridge)?

Vern Hart (vern) wrote on 2018-12-05:

#20

I have not tried Centos6 VMs.

The customer reported that he deployed Centos7 to the baremetal blades and saw, in ping traffic, that incoming packets were vlan 0 tagged.

Then he deployed Centos6 to two baremetal blades and, in a ping test, did not see any vlan tags.

If this is true, it suggests Ubuntu 18.04 and Centos 7 either add the vlan 0 tag or fail to remove it (if it's added by something else). Conversely, Centos 6 either doesn't add the tag or succeeds in removing it.

I did not ask the customer to replicate the scenario of a kvm instance PXE booting on top of the Centos deployment from a DHCP server on a different blade. And I do not have access to test that scenario myself. Another test I'd like to see the results of is a ping test between Ubuntu 18.04 and Centos 6 -- to see which side, if any, sees the vlan 0 tag.
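
When reading captures from such a ping test, it can help to decode the tag rather than eyeball it. A rough sketch, assuming raw Ethernet II frames are available as bytes (for example exported from a capture); the helper names parse_vlan_tag and describe are hypothetical:

```python
import struct
from typing import NamedTuple, Optional


class VlanTag(NamedTuple):
    pcp: int  # 3-bit priority code point (the QoS part)
    dei: int  # 1-bit drop eligible indicator
    vid: int  # 12-bit VLAN ID; 0 means "priority tag only"


def parse_vlan_tag(frame: bytes) -> Optional[VlanTag]:
    """Return the 802.1Q tag of a frame, or None if the frame is untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != 0x8100:
        return None
    return VlanTag(pcp=tci >> 13, dei=(tci >> 12) & 1, vid=tci & 0x0FFF)


def describe(frame: bytes) -> str:
    """Classify a frame as untagged, priority-tagged (VID 0) or VLAN-tagged."""
    tag = parse_vlan_tag(frame)
    if tag is None:
        return "untagged"
    if tag.vid == 0:
        return f"priority-tagged (VLAN 0, PCP {tag.pcp})"
    return f"VLAN {tag.vid} (PCP {tag.pcp})"
```

Running describe on the same ping as captured on each blade would show which side sees the VLAN 0 tag.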

Mike Pontillo (mpontillo) wrote on 2018-12-05:

#21

I think the issue is that removing the tag could be seen as a bug, not a feature, since by removing it, you would be potentially stripping off a priority tag that could be used further up the stack. (For example, if the machine was deployed with a virtual bridge that was capable of manipulating said tags.) That's why most software will now handle the tag, though unfortunately the iPXE in Ubuntu still doesn't.

In other words, failure to remove the tag isn't the bug. The bug is that whether or not to remove the tag should be a configuration choice at some level, and right now that choice is too opaque (or simply doesn't exist). Because as an end node, you shouldn't /need/ to care about the tags. But if you end up deploying a server for use in a software-defined network, you /would/ care about the tags, because you might even have permission to manipulate them.

Andres Rodriguez (andreserl) wrote on 2018-12-05:

#22

It seems users of other distros are experiencing the same, and a kernel update fixes the issue? https://serverfault.com/questions/497391/

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2018-12-06: Missing required logs.

#23

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1805920

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Mike Pontillo (mpontillo) wrote on 2018-12-06:

#24

Setting to 'Confirmed' in the kernel, although it's not clear what the actual fix would entail. It would certainly be nice to be able to tell a Linux virtual bridge to transparently strip off priority tags before L2 forwarding occurs. That would prevent the issue with iPXE.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed
Changed in ipxe (Ubuntu):
status:	New → Confirmed

Andres Rodriguez (andreserl) wrote on 2018-12-06:

#25

FWIW, I patched ipxe with the CentOS patch as a test, which Vern was going to test:
https://launchpadlibrarian.net/400355423/ipxe_1.0.0+git-20180124.fbe8c52d-0ubuntu2_1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1.diff.gz

Vern Hart (vern) wrote on 2018-12-06:

#26

Success.
I installed ipxe-qemu from andreserl's ppa and was able to PXE boot a Pod VM from the infra node that wasn't running dhcpd.

# add-apt-repository ppa:andreserl/maas
# apt update
# apt install ipxe-qemu
# virsh list --all
# virsh start elastic-3

I watched the console of the VM and it succeeded in getting DHCP and PXE booting.

Subsequently, I used MAAS to deploy to VMs on all three infra nodes, just to be sure. All succeeded.

Christian Ehrhardt  (paelzer) wrote on 2018-12-06:

#27

I'm glad that the ipxe in the PPA seems to make it work.

I now read the discussions and all questions that came up for me while doing so were asked and clarified already in later comments.

Therefore I just reviewed the proposed change and it looks good to me (other than the version string but that was just for the PPA, so that is ok).

Only one question to be sure:
I was only wondering if this might trigger any issues in iSCSI booting since the change in src/net/netdevice.c adds the stripping to the generic net_poll. Now the (old) commit [1] reads as if that would be required to be set. I wonder if there would be any regression in that regard.
I remember the words iSCSI+MAAS being used together, but I'm unsure if the stack these days still uses it. When Vern confirmed that he could deploy with the modified ipxe, did that include an iSCSI boot?

If not could one of you just double-check that iSCSI boot didn't regress due to this change?

[1]: https://git.ipxe.org/ipxe.git/commit/7d64abbc5d0b5dfe4810883f372b905a359f2697

Mike Pontillo (mpontillo) wrote on 2018-12-06:

#28

Unfortunately, a modern MAAS no longer uses iSCSI; it will fetch all necessary data over HTTP.

Christian Ehrhardt  (paelzer) wrote on 2018-12-06: Re: [Bug 1805920] Re: iPXE ignores vlan 0 traffic

#29

> Unfortunately, a modern MAAS no longer uses iSCSI; it will fetch all
> necessary data over HTTP.

Yeah that matches what I heard, used in the past but no more.
But you certainly still have the best knowledge how to set it up from
the past when it did.
Checking that pre/post an upgrade to that PPA would be a great
verification to avoid regressions.

Mike Pontillo (mpontillo) wrote on 2018-12-06:

#30

Yes, MAAS 2.3 (the last revision of MAAS supported on Xenial) can support iSCSI when it is placed in backward compatibility mode; how to do this is documented in the changelog as follows:

maas $PROFILE maas set-config name=http_boot value=False

MAAS 2.2 and earlier (no longer supported) used iSCSI by default.

So you could test this on any version of MAAS on Xenial.

Andres Rodriguez (andreserl) wrote on 2018-12-07:

#31

Just to clarify the above statements, as this has been a source of confusion.

MAAS 2.3+ (which is the latest available in Xenial) no longer uses nor supports iSCSI. While the option to fall back to the old behavior does exist, it is not enabled by default, it is obscure and, given that it is not supported, it is to be used at the user's own risk.

That said, I'm not sure whether this change should be backported all the way to Xenial. It would seem that it should be backported to Bionic only.

Mike Pontillo (mpontillo) wrote on 2018-12-07:

#32

Sorry for the confusion.

To be clear, the idea of using MAAS on Xenial was in order to test if the newly-modified iPXE (on Bionic) can support iSCSI boot.

But come to think of it, I don't think that's a good test. If I remember correctly, MAAS used TFTP to transfer the kernel and initrd, /then/ iSCSI was used in order to mount the rootfs. So iPXE's iSCSI functionality wouldn't be exercised in this case.

Christian Ehrhardt  (paelzer) wrote on 2018-12-10:

#33

On Fri, Dec 7, 2018 at 8:21 PM Mike Pontillo
<email address hidden> wrote:
...

> So iPXE's iSCSI functionality wouldn't be exercised in this case.

So MAAS itself is no good test for that, thanks Mike for the clarification!

So the question is: does anyone have an iSCSI iPXE case for the last bit
of confidence for Andres' upload to take place?

Christian Ehrhardt  (paelzer) wrote on 2018-12-10:

#34

[1] seems reasonable, I'll give it a try with and without the PPA of Andres.
It needs a slight modification, to not conflict with the default portal.

Install libvirt with all else it usually brings (for the bridge and dhcp on the bridge):
$ sudo apt install libvirt-daemon-system

So use these commands:
 $ curl -O  http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
 $ qemu-img convert -O raw cirros-0.3.4-x86_64-disk.img cirros.raw
 $ sudo targetcli /backstores/fileio/ create cirros $PWD/cirros.raw 100M false
 $ sudo targetcli /iscsi create iqn.2016-01.com.example:cirros
 $ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/luns create /backstores/fileio/cirros
 $ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/portals delete 0.0.0.0 ip_port=3260
 $ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/portals create 192.168.122.1

If you do that you'll end up with a targetcli config like this:
$ sudo targetcli
targetcli shell version 2.1.fb43
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> ls
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 0]
  | o- fileio ................................................................................................. [Storage Objects: 1]
  | | o- cirros ........................................................... [/home/ubuntu/cirros.raw (39.2MiB) write-thru activated]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 1]
  | o- iqn.2016-01.com.example:cirros .................................................................................... [TPGs: 1]
  |   o- tpg1 .................................................................................................. [gen-acls, no-auth]
  |     o- acls .......................................................................................................... [ACLs: 0]
  |     o- luns .......................................................................................................... [LUNs: 1]
  |     | o- lun0 ........................................................................ [fileio/cirros (/home/ubuntu/cirros.raw)]
  |     o- portals .................................................................................................... [Portals: 1]
  |       o- 192.168.122.1:3260 ............................................................................................... [OK]
  o- loopback ......................................................................................................... [Targets: 0]
  o- srpt ............................................................................................................. [Targets: 0]
  o- vhost ............................................................................................................ [Targets: 0]

Do that, and then attach to the console early when qemu starts.
To make that easier, use a local curses console instead of VNC:
  $ sudo qemu-system-x86_64 -smp cpus=2 -curses -boot order=n -netdev bridge,br=virbr0,id=virtio0 -device virtio-net-pci,netdev=virtio0

Hit CTRL+B early on boot for ipxe commands

With our virbr0 default setup having the host on 192.168.122.1 that would be

iPXE> ifopen net0
iPXE> dhcp
iPXE> sanboot iscsi:192.168.122.1::::iqn.2016-01.com.example:cirros

Retested with 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1 from the PPA.
It boots just as well. So while this is no perfect test (what if it were in a VLAN-tagged network?), it is better than nothing.

That said - together with all that was discussed before - I think Andres could go ahead and upload it to Disco.
For the SRUs we will need some extra work for [2], but one thing at a time.

Or is the assumption that I drive it from here and you only do verifications on the case?

[1]: https://medium.com/oracledevs/kvm-iscsi-part-i-iscsi-boot-with-ipxe-f533f2666075
[2]: https://packages.ubuntu.com/bionic/ipxe-qemu-256k-compat-efi-roms

Launchpad Janitor (janitor) wrote on 2018-12-11:

#35

This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu5

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu5) disco; urgency=medium

* d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0
priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

-- Andres Rodriguez <email address hidden> Mon, 10 Dec 2018 16:26:42 -0500

Changed in ipxe (Ubuntu):
status:	Confirmed → Fix Released

Christian Ehrhardt  (paelzer) wrote on 2018-12-11:

#36

To keep potential regressions even lower, I'd for now only consider this for >=Bionic.
That also helps because, if someone intentionally spawns an old-type KVM machine (pre-Bionic) on a >=Bionic host, we don't have to care about this too much (machine type, not the release running IN the guest). That lets us ignore ipxe-qemu-256k-compat-efi-roms in regard to this issue.

Changed in linux (Ubuntu):
status:	Confirmed → Invalid
no longer affects:	linux (Ubuntu Trusty)
no longer affects:	linux (Ubuntu Xenial)
no longer affects:	linux (Ubuntu Bionic)
no longer affects:	linux (Ubuntu Cosmic)
no longer affects:	linux (Ubuntu Disco)
Changed in ipxe-qemu-256k-compat (Ubuntu Trusty):
status:	New → Invalid
Changed in ipxe-qemu-256k-compat (Ubuntu Xenial):
status:	New → Invalid
no longer affects:	ipxe-qemu-256k-compat (Ubuntu Trusty)
no longer affects:	ipxe-qemu-256k-compat (Ubuntu Xenial)
no longer affects:	ipxe-qemu-256k-compat (Ubuntu Bionic)
no longer affects:	ipxe-qemu-256k-compat (Ubuntu Cosmic)
no longer affects:	ipxe-qemu-256k-compat (Ubuntu Disco)
Changed in ipxe-qemu-256k-compat (Ubuntu):
status:	New → Won't Fix
Changed in ipxe (Ubuntu Trusty):
status:	New → Won't Fix
Changed in ipxe (Ubuntu Xenial):
status:	New → Won't Fix
Changed in ipxe-qemu-256k-compat (Ubuntu):
status:	Won't Fix → Invalid
Changed in ipxe (Ubuntu Bionic):
status:	New → Triaged
Changed in ipxe (Ubuntu Cosmic):
status:	New → Triaged

Christian Ehrhardt  (paelzer) wrote on 2018-12-11:

#37

Prepped for Bionic and Cosmic in a PPA [2] for Bileto ticket [1].
Dependent autopkgtests are queued.

I'll run usual virtualization regression checks on that over night and into tomorrow.

MPs are up for review at [3][4], but since Andres' change applies on all these as-is there isn't much difference.

The biggest blocker here is the lack of a more clearly outlined testcase.
@Andres/@Vern - can you help to fill the testcase steps in the SRU template?

[1]: https://bileto.ubuntu.com/#/ticket/3560
[2]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3560/+packages
[3]: https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360678
[4]: https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360679

Christian Ehrhardt  (paelzer) on 2018-12-11

description:	updated
description:	updated

Christian Ehrhardt  (paelzer) wrote on 2018-12-12:

#38

FYI: virt stack regression tests started but will still take a while.

To make it very very clear, this is incomplete until some path to test is provided.
Marking the bug that way, waiting on that.

Worst case (and only then) describe the test setup that you have on the customer site and volunteer to be willing and able to verify both Bionic and Cosmic on that setup.
While writing remember the intention is to make the SRU team feel confident about the change and the checks.

Changed in ipxe (Ubuntu Bionic):
status:	Triaged → Incomplete
Changed in ipxe (Ubuntu Cosmic):
status:	Triaged → Incomplete

Vern Hart (vern) wrote on 2018-12-12:

#39

The customer has bionic installed on 3 of the blades and I have installed MAAS 2.4.3 on them using the Foundation Cloud Engine. I don't have access to do the OS install myself. I could request a pair of blades installed with cosmic but I'm unsure if I need all 3 or if I can get by with just 2. Easiest would probably be all 3 so that I can be sure at least one is not running dhcpd.

The test would consist of:
# Install MAAS on 3 nodes
# Install ipxe to be tested on all three nodes
# Configure subnet for PXE booting to have primary dhcp on first node and secondary on second node
# Provision Pods on each maas node
# Create and commission a VM on each pod
# The VMs on first and second will commission successfully
# The VM on the third node will fail DHCP/PXE
# You can use virt-viewer to view the console of the VM to verify PXE failure

Christian Ehrhardt  (paelzer) wrote on 2018-12-13:

#40

virt stack tests are successful on 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 and 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 from the PPA.
This includes various other tests, mostly related to migrations and upgrades between those versions.
Everything is fine on that end ...

Christian Ehrhardt  (paelzer) wrote on 2018-12-13:

#41

Thanks Vern for outlining your testcase.

@Vern
Doesn't that also have to include some component that does QoS management adding the VLAN-0 tag?

I don't have the systems to verify this case on Bionic and Cosmic.
As I read it from Vern he can only test Bionic at the Customer and is unsure he can test Cosmic the same way.

@MAAS team - Would you have a MAAS test environment that you can re-use to verify this?
If so, could you confirm that you can locally trigger the bug as it is today so that we can rely on it for SRU processing?

@Vern - if the above is Nack'ed by the MAAS team, could you ask whether you could get nodes to verify that on Cosmic as well? As I read your test description, I think it would be enough to bump the node that is supposed to start the guest to Cosmic+Proposed; there is no need to update all systems to Cosmic.

Christian Ehrhardt  (paelzer) wrote on 2018-12-13:

#42

TL;DR:
- I really really tried, but failed to recreate your case on a single system
- I need your real setup as the VLAN-Tag-0 addition in your case seems to be different
- That makes my request for one of you to commit to verifying this (after checking that you are able to) even more important - see comment #41

Details (of a failed test approach)

# Simple iPXE (without dhcp/tftp/...)

# get some virtualization that gives us a bridge with dhcp
$ sudo apt install uvtool-libvirt apache2
# re-logon for permissions

# copy host kernel there to boot from
$ sudo cp -v /boot/vmlinuz-$(uname -r) /boot/initrd.img-$(uname -r) /var/www/html/
$ sudo chown www-data:www-data /var/www/html/*

# prep qemu to tap on the libvirt bridge
$ sudo mkdir -p /etc/qemu
$ echo "allow all" | sudo tee /etc/qemu/bridge.conf

# start qemu and right at the start press Ctrl+B to get to the iPXE prompt
$ sudo qemu-system-x86_64 -cpu host -net nic -net bridge,br=virbr0 -m 1024 -enable-kvm -curses -boot n

# in IPXE then
iPXE> dhcp
# check your dhcp config to work on the expected network
iPXE> show ip
# use your IPs and kernel versions for this
iPXE> kernel http://192.168.122.1/vmlinuz-4.15.0-42-generic
iPXE> initrd http://192.168.122.1/initrd.img-4.15.0-42-generic
iPXE> boot

You can do the same with a config, by putting an iPXE config file on your apache server:
$ sudo tee /var/www/html/ipxe.config << EOF
#!ipxe
kernel http://192.168.122.1/vmlinuz-4.15.0-42-generic
initrd http://192.168.122.1/initrd.img-4.15.0-42-generic
boot
EOF

And then boot with chainbooting:
iPXE> dhcp
iPXE> chain http://192.168.122.1/ipxe.config

# Try to use VLANs here
# Note: There would actually be a full VLAN feature which no one ever requested.
# It is off atm, but per https://ipxe.org/cmd/vcreate we could do something like
iPXE> vcreate --tag 42 net0
iPXE> set net0-42/ip 192.168.123.100
iPXE> set net0-42/netmask 255.255.255.0
iPXE> set net0-42/gateway 192.168.123.1

So I wonder about your case: we have a non-VLAN-aware iPXE that gets 0-tagged packets - is that correct? And most other network stacks would shrug the 0-tag off, but iPXE does not and acts as if the packet is not there (unless you'd configure it through vcreate, maybe).
Let's try to "simulate" that ...

# Add a "normal" VLAN tag 0 interface to the host bridge
$ sudo ip link add name virbr0.0 link virbr0 type vlan id 0
$ sudo ip addr add 192.168.124.1/24 broadcast 192.168.124.255 dev virbr0.0

# On boot we configure iPXE to use that IP range, but intentionally ignoring any VLAN tagging
iPXE> set net0/ip 192.168.124.100
iPXE> set net0/netmask 255.255.255.0
iPXE> set net0/gateway 192.168.124.1
iPXE> set net0/dns 192.168.124.1
iPXE> ifopen net0
iPXE> chain http://192.168.124.1/ipxe.config

Without the fix this blocks because the server cannot be reached:
iPXE> chain http://192.168.124.1/ipxe.config
http://192.168.124.1/ipxe.config.................. Connection timed out (http://ipxe.org/4c0a6035)

Installed the 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 from proposed but it fails there as well.
Probably your VLAN-0-TAG case is slightly different to what I had assumed here, but atm I have no way to know where.
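
For reference, the frame layout that the virbr0.0 experiment is trying to produce can be sketched as follows. This only illustrates where an 802.1Q tag with VID 0 sits in an Ethernet II frame; add_vlan_tag is a hypothetical helper, and the sketch says nothing about why the test above failed to reproduce the issue.

```python
import struct


def add_vlan_tag(frame: bytes, vid: int = 0, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the two MAC addresses of an untagged frame.

    vid=0 yields a priority tag: a QoS marking without VLAN membership.
    """
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (pcp << 13) | vid  # DEI bit left at 0
    # Bytes 0-11 are destination and source MAC; the tag goes right after.
    return frame[:12] + struct.pack("!HH", 0x8100, tci) + frame[12:]
```

A receiver that only looks at offset 12 for the EtherType sees 0x8100 instead of, say, 0x0800 (IPv4), which is why a stack without tag handling does not recognize the payload.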

Vern Hart (vern) wrote on 2018-12-13:

#43

It seems an important component to the failure scenario is the hardware. The customer equipment is a Cisco UCS chassis and the MAAS nodes are blades in that chassis. Even though we cannot find anything in configuration that specifically adds the vlan-0 tag (or priority tag), traffic between the blades goes out one node untagged and shows up tagged on the other node.

Some bugs/discussions around vlan-0 and UCS:

  https://quickview.cloudapps.cisco.com/quickview/bug/CSCuu29425
  https://quickview.cloudapps.cisco.com/quickview/bug/CSCuz83183
  https://bugs.launchpad.net/opencontrail/+bug/1457805
  https://arstechnica.com/civis/viewtopic.php?f=10&t=1442797
  https://lists.linuxfoundation.org/pipermail/fds-dev/2017-May/000710.html
  http://lists.openstack.org/pipermail/openstack-operators/2013-April/002777.html
  https://linux.oracle.com/pls/apex/f?p=102:2:::NO::P2_VC_ID,P2_VERSION:606,1.0

As a note, Cisco seems to suggest it's a bug in Linux, citing these two old posts:

https://lists.openwall.net/netdev/2013/09/10/30
https://lists.linuxfoundation.org/pipermail/bridge/2015-July/009630.html

But I'm not convinced they are valid since this vlan-0 tag problem only shows up with this specific Cisco hardware. It seems like there are multiple network related software projects (like ipxe, vpp, probably others) that are forced to deal with the special case of vlan 0 (priority tagging) being added by Cisco UCS switches because Cisco's stance is that they're not adding the tags.
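
For readers unfamiliar with priority tagging: an 802.1Q tag carries a 16-bit TCI field made of a 3-bit priority (PCP), a 1-bit DEI and a 12-bit VLAN ID. A "vlan 0" frame is one whose VLAN ID bits are all zero while the priority bits may be set. A minimal sketch of that decoding follows; the helper name decode_tci and the sample value are illustrative assumptions, not taken from a capture on the affected hardware.

```shell
#!/bin/sh
# Hypothetical helper: split an 802.1Q TCI value into its three fields.
decode_tci() {
    tci=$(( $1 ))                    # accepts decimal or 0x-prefixed hex
    pcp=$(( (tci >> 13) & 0x7 ))     # top 3 bits: priority code point
    dei=$(( (tci >> 12) & 0x1 ))     # next bit: drop eligible indicator
    vid=$(( tci & 0xFFF ))           # low 12 bits: VLAN ID
    printf 'pcp=%d dei=%d vid=%d\n' "$pcp" "$dei" "$vid"
}

# Illustrative value: priority 5 with VLAN ID 0, i.e. a priority-tagged frame.
decode_tci 0xA000
```

A result with vid=0 means the frame belongs to no real VLAN and the tag only conveys priority.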

Andres Rodriguez (andreserl) wrote on 2018-12-13:

#44

FWIW, Cisco documentation here states that priority tagging is enabled by default:

https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html

"Default Settings
VLAN 0 priority tagging is enabled by default."

Christian Ehrhardt  (paelzer) wrote on 2018-12-14:

#45

Thanks for the details, but that means that even the MAAS team might
have enough systems, yet not the right special HW.
Therefore @Vern, could you do the pre-checks with your associated
customer, so that you can verify the bug on their setup before we push it
as an SRU?
As I said, you only need to have Bionic/Cosmic on the target KVM host
that is supposed to spawn the Pod and currently fails.
IMHO all other components can stay as-is.

Please let me know if that would be possible.

Christian Ehrhardt  (paelzer) wrote on 2018-12-18:

#46

I'll upload once I get a confirmation that it will be tested on both releases on the affected HW.

description:	updated

Christian Ehrhardt  (paelzer) wrote on 2019-01-08:

#47

No response yet. We really want to fix this but need some way to verify it.
Sorry to ask once again, but we need to get this unblocked.
@Vern - can you use the setup to verify the two planned uploads for Bionic and Cosmic?

Vern Hart (vern) wrote on 2019-01-09:

#48

I was able to verify the fix works for bionic using maas 2.4.3. Am about to install cosmic on the customer hardware to verify the fix there too.

I have run into a maybe-related issue with maas 2.5.0 filed separately here: https://bugs.launchpad.net/maas/+bug/1811021

Christian Ehrhardt  (paelzer) wrote on 2019-01-09:

#49

Ok, once you know you'll be able to verify it on Cosmic as well (by successfully testing from the PPA) let me know.
We will then upload it as a real SRU which needs verification per [1].

[1]: https://wiki.ubuntu.com/StableReleaseUpdates

Vern Hart (vern) wrote on 2019-01-11:

#50

screenshot of failed boot (422.2 KiB, image/png)

I have run a test on cosmic. The test involved MAAS 2.4.3 installed on bionic on 3 of the blades of the UCS chassis in the customer's data center. I installed cosmic (18.10) on a 4th blade, installed libvirt and qemu-kvm, and defined a VM similar to how MAAS defines VMs, with this XML: https://pastebin.ubuntu.com/p/yCTRGDjx2H/

$ lsb_release -d
Description: Ubuntu 18.10

The ipxe-qemu version installed from dist is:

1.0.0+git-20180124.fbe8c52d-0ubuntu4

The attached screenshot is of the failed pxe boot of the testipxe vm.

Added the ppa:andreserl/maas apt repo and installed ipxe-qemu which gave me version:

1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1

Note that I had to edit /etc/apt/sources.list.d/andreserl-ubuntu-maas-cosmic.list replacing "cosmic" with "bionic" because that repo doesn't have cosmic packages. And then I had to downgrade the ipxe-qemu because the cosmic version is greater than the one in the fix repo:

# apt install ipxe-qemu=1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1

Once I jumped through those hoops, I booted the exact same testipxe vm that failed to pxe boot above and it succeeded in getting an IP and commissioning in MAAS.
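
For anyone retracing the downgrade step, here is a minimal sketch, assuming dpkg is available on the host, of why apt would not pick the PPA build on its own: by Debian version ordering the cosmic archive version sorts higher than the bionic PPA version, so the lower one has to be requested explicitly. The function name newer_than is made up for illustration; the version strings are the two quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 sorts strictly after version $2.
newer_than() {
    dpkg --compare-versions "$1" gt "$2"
}

dist=1.0.0+git-20180124.fbe8c52d-0ubuntu4            # cosmic archive build
ppa=1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1   # build from the PPA

if newer_than "$dist" "$ppa"; then
    echo "archive build sorts higher; the PPA build must be requested by exact version"
fi
```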

Vern Hart (vern) wrote on 2019-01-11:

#51

After realizing there are packages in the ci build [1] I installed the following version from there:

1.0.0+git-20180124.fbe8c52d-0ubuntu4.1

I redefined the testipxe vm from the above test, and it also successfully pxe booted.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3560/+packages

Christian Ehrhardt  (paelzer) wrote on 2019-01-14:

#52

Ok, thanks for the precheck Vern.
Now that we know that you will be able to SRU-verify this I have uploaded it to the SRU queue.

Once accepted by the SRU Team there will be updates here asking for verification. Please do that for Bionic and Cosmic then - if you need any help let us know.
Eventually that will make the fix released into the Ubuntu Archive.

Brian Murray (brian-murray) wrote on 2019-01-15: Please test proposed package

#53

Hello Vern, or anyone else affected,

Accepted ipxe into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ipxe/1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ipxe (Ubuntu Cosmic):
status:	Incomplete → Fix Committed
tags:	added: verification-needed verification-needed-cosmic

Brian Murray (brian-murray) wrote on 2019-01-15:

#54

Hello Vern, or anyone else affected,

Accepted ipxe into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ipxe/1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 in a few hours, and then in the -proposed repository.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ipxe (Ubuntu Bionic):
status:	Incomplete → Fix Committed
tags:	added: verification-needed-bionic

Christian Ehrhardt  (paelzer) wrote on 2019-01-22:

#55

@Vern - if there is any ETA when you will get to the real SRU verifications that were requested by Brian a week ago let us know

Christian Ehrhardt  (paelzer) wrote on 2019-02-12:

#56

@Vern - ping please test to unblock this from bionic-/cosmic-proposed

Vern Hart (vern) wrote on 2019-02-12:

#57

Yes, sorry. I will try to test tonight (in a few hours) when I'm back at the hotel. Not that I only have bionic and xenial to test with. I can try to upgrade one of those to cosmic.

Vern Hart (vern) wrote on 2019-02-12:

#58

s/Not/Note/

Joshua Powers (powersj) wrote on 2019-02-21:

#59

Any update on testing?

Vern Hart (vern) wrote on 2019-02-27:

#60

I have two nodes, bionic and cosmic.
I enabled the proposed repo on each.
I installed ipxe:

apt install ipxe ipxe-qemu grub-ipxe

On bionic, this gave me:

  # apt list --installed ipxe-qemu grub-ipxe ipxe
  Listing... Done
  grub-ipxe/bionic-proposed,bionic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 all [installed,automatic]
  ipxe/bionic-proposed,bionic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 all [installed]
  ipxe-qemu/bionic-proposed,bionic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 all [installed]

On cosmic, this gave me:

  # apt list --installed ipxe ipxe-qemu grub-ipxe
  Listing... Done
  grub-ipxe/cosmic-proposed,cosmic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 all [installed,automatic]
  ipxe-qemu/cosmic-proposed,cosmic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 all [installed]
  ipxe/cosmic-proposed,cosmic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 all [installed]

I launched a VM configured to netboot on each system.

The VM PXE booted properly on both bionic and cosmic.

Vern Hart (vern) wrote on 2019-02-27:

#61

As negative confirmation: I tested a PXE boot with 1.0.0+git-20180124.fbe8c52d-0ubuntu2.1 on bionic and 1.0.0+git-20180124.fbe8c52d-0ubuntu4 on cosmic.
As expected, the VMs failed to successfully PXE boot.

tags:	added: verification-done-bionic verification-done-cosmic removed: verification-needed-bionic verification-needed-cosmic
tags:	added: verification-done removed: verification-needed

Łukasz Zemczak (sil2100) wrote on 2019-03-04: Update Released

#62

The verification of the Stable Release Update for ipxe has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote on 2019-03-04:

#63

This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu4.1) cosmic; urgency=medium

* d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0
priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

-- Andres Rodriguez <email address hidden> Mon, 10 Dec 2018 16:26:42 -0500

Changed in ipxe (Ubuntu Cosmic):
status:	Fix Committed → Fix Released
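
To make the changelog entry "Strip 802.1Q VLAN 0 priority tags" concrete, the following is a conceptual sketch only. It works on a frame written as a hex string and removes the 4-byte 802.1Q header when the VLAN ID is 0, leaving real VLANs and untagged frames alone. The real change lives in the iPXE sources and is not reproduced here; the helper name strip_vlan0 and the sample frame are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: drop an 802.1Q tag whose VLAN ID is 0 from a hex-encoded frame.
strip_vlan0() {
    frame=$1
    tpid=$(printf '%s' "$frame" | cut -c25-28)   # bytes 12-13: EtherType / TPID
    tci=$(printf '%s' "$frame" | cut -c29-32)    # bytes 14-15: tag control information
    if [ "$tpid" = "8100" ] && [ "${#tci}" -eq 4 ] && [ $(( 0x$tci & 0xFFF )) -eq 0 ]; then
        # Keep both MAC addresses, skip the 4-byte tag, keep the rest.
        head=$(printf '%s' "$frame" | cut -c1-24)
        rest=$(printf '%s' "$frame" | cut -c33-)
        printf '%s%s\n' "$head" "$rest"
    else
        printf '%s\n' "$frame"                   # untagged or a real VLAN: unchanged
    fi
}

# Sample frame: broadcast dst, made-up src, tag 8100 a000 (priority 5, VLAN 0), IPv4.
strip_vlan0 ffffffffffff5254001234568100a00008004500
```

After stripping, the inner EtherType (0800 in the sample) sits where an untagged receiver expects it, which is why a stack that does not understand the tag can then process the frame normally.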

Launchpad Janitor (janitor) wrote on 2019-03-04:

#64

This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu2.2) bionic; urgency=medium

* d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0
priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

-- Andres Rodriguez <email address hidden> Mon, 10 Dec 2018 16:26:42 -0500

Changed in ipxe (Ubuntu Bionic):
status:	Fix Committed → Fix Released

Brad Figg (brad-figg) on 2019-07-24

tags:	added: cscc
