iPXE ignores vlan 0 traffic

Bug #1805920 reported by Vern Hart on 2018-11-30
Affects                          Importance  Assigned to
MAAS                             Undecided   Unassigned
ipxe (Ubuntu)                    (status tracked in Disco)
  Trusty                         Undecided   Unassigned
  Xenial                         Undecided   Unassigned
  Bionic                         Undecided   Unassigned
  Cosmic                         Undecided   Unassigned
  Disco                          Undecided   Unassigned
ipxe-qemu-256k-compat (Ubuntu)   Undecided   Unassigned
linux (Ubuntu)                   Undecided   Unassigned

Bug Description

[Impact]

 * VLAN 0 is special: it carries QoS priority information and is not a
   real VLAN.
 * Some components in the stack mishandle it; iPXE, in this case, ignores
   frames that carry a VLAN 0 tag.
 * Fix by porting a patch that other distributions carry. Upstream did not
   adopt it, but it is needed for the use case affected by this bug
   (thanks, Andres).

[Test Case]

 * Comment #42 contains a virtual test setup to understand the case, but it
   does NOT trigger the issue. Triggering it requires special switch HW
   that adds VLAN 0 tags for QoS. Therefore Vern (reporter) will test on a
   customer site that has such hardware and is affected by this issue.

[Regression Potential]

 * The only reference to VLAN tags on iPXE boot that we found was on iBFT
   boot for iSCSI; we tested that in comment #34 and it still worked fine.
 * We didn't see such cases on review, but there might be use cases that
   made some unexpected use of the headers which are now stripped. But
   that seems wrong.

[Other Info]

 * n/a

---

I have three MAAS rack/region nodes which are blades in a Cisco UCS chassis. This is an FCE deployment where MAAS has two DHCP servers, infra1 is the primary and infra3 is the secondary. The pod VMs on infra1 and infra3 PXE boot fine but the pod VMs on infra2 fail to PXE boot. If I reconfigure the subnet to provide DHCP on infra2 (either as primary or secondary) then the pod VMs on infra2 will PXE boot but the pod VMs on the demoted infra node (that no longer serves DHCP) now fail to PXE boot.

While commissioning a pod VM on infra2 I captured network traffic with tcpdump on the vnet interface.

Here is the dump when the PXE boot fails (no dhcp server on infra2):
https://pastebin.canonical.com/p/THW2gTSv4S/

Here is the dump when PXE boot succeeds (when infra2 is serving dhcp):
https://pastebin.canonical.com/p/HH3XvZtTGG/

The only difference I can see is that in the unsuccessful scenario, the reply is an 802.1q packet -- it's got a vlan tag for vlan 0. Normally vlan 0 traffic is passed as if it is not tagged and indeed, I can ping between the blades with no problem. Outgoing packets are untagged but incoming packets are tagged vlan 0 -- but the ping works. It seems vlan 0 is used as a part of 802.1p to set priority of packets. This is separate from vlan, it just happens to use that ethertype to do the priority tagging.
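
For reference, the tag layout described above can be decoded in a few lines of Python. This is only an illustrative sketch under my own assumptions, not code from iPXE or from the captures: the function names are made up for this example, and it only looks at the outermost tag of a raw Ethernet frame. The 802.1Q tag sits right after the two MAC addresses: a 16-bit TPID of 0x8100, then 3 bits of priority (PCP, the 802.1p part), 1 DEI bit and a 12-bit VLAN ID.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def decode_vlan_tag(frame: bytes):
    """Return (pcp, dei, vid) if the frame carries an 802.1Q tag, else None."""
    # 12 bytes of MAC addresses + 4-byte tag + 2-byte inner EtherType
    if len(frame) < 18:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    pcp = tci >> 13          # top 3 bits: priority code point (802.1p)
    dei = (tci >> 12) & 0x1  # 1 bit: drop eligible indicator
    vid = tci & 0x0FFF       # low 12 bits: VLAN identifier
    return pcp, dei, vid


def is_priority_tagged(frame: bytes) -> bool:
    """True when the frame has an 802.1Q tag whose VLAN ID is 0."""
    tag = decode_vlan_tag(frame)
    return tag is not None and tag[2] == 0
```

A frame for which is_priority_tagged() returns True is the kind of reply seen in the failing capture: tagged, but with no real VLAN ID.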

Someone confirmed to me that, in the iPXE source, it drops all packets if they are vlan tagged.

The customer is unable to figure out why the packets between blades are getting vlan tagged, so we either need to figure out how to allow iPXE to accept vlan 0 or the customer will need to use different equipment for the MAAS nodes.

I found a conversation on the ipxe-devel mailing list that suggested a commit was submitted and signed off but that was from 2016 so I'm not sure what became of it. Notable messages in the thread:

http://lists.ipxe.org/pipermail/ipxe-devel/2016-April/004916.html
http://lists.ipxe.org/pipermail/ipxe-devel/2016-July/005099.html

Would it be possible to install a local patch as part of the FCE deployment? I suspect the patch(es) mentioned in the above thread would require some modification to apply properly.

Vern Hart (vhart) wrote :

Subscribed ~field-critical as this will block managed service handover.

Andres Rodriguez (andreserl) wrote :

Hi Vern,

Based on your description, it sounds like this is *not* a MAAS issue; rather, it could be a networking issue, a misconfiguration issue or an ipxe issue.

To start, from the ipxe side:

1. MAAS doesn't support PXE booting over VLANs.
2. MAAS doesn't support ipxe. We rely on the VMs booting via pxelinux emulation; MAAS neither passes an ipxe ROM nor configures ipxe at all.

Second, what does your network configuration look like? While I understand you have 3 machines running MAAS, and only VMs that have DHCP running locally can PXE boot, I'm wondering:

1. Can you confirm that all bridges of the pods on the underlying network are using a /physical/ bridge in the /same/ vlan ?
2. What's the machines interface configuration? How about the bridge configuration?
3. To what interfaces are the MAAS DHCP servers responding to? (e.g. ps faux | grep dhcpd)
4. Can you confirm that VMs inside pod on infra1 can communicate with VMs inside pod of infra2 or 3?

Lastly, I have a feeling that this is likely related to the network configuration itself, or the configuration of the bridges:
 1. For networking, is STP/Portfast enabled? https://docs.maas.io/2.5/en/installconfig-network-stp
 2. Is the bridge that the VMs use configured with STP and/or a long forward-delay? https://wiki.libvirt.org/page/PXE_boot_(or_dhcp)_on_guest_failed

Changed in maas:
status: New → Incomplete
Vern Hart (vhart) wrote :

I agree this may not be a MAAS bug specifically but I'm not sure where else to seek assistance.

You say MAAS doesn't support PXE booting over VLANs but vlan 0 is special: https://en.wikipedia.org/wiki/IEEE_802.1Q#Frame_format
"The reserved value 0x000 indicates that the frame does not carry a VLAN ID"

Here is the relevant portion of the switch config that the customer has shared with me. The ports are configured to vlan 17 as native (untagged) and to only allow vlan 17 at all. Note that this is not vlan 0.

  interface Vethernet2424
    description server 1/3, VNIC eth0
    switchport mode trunk
    no lldp transmit
    no lldp receive
    no pinning server sticky
    pinning server pinning-failure link-down
    switchport trunk native vlan 17
    switchport trunk allowed vlan 17
    bind interface port-channel1287 channel 2424
    no shutdown

  interface Vethernet2426
    description server 1/2, VNIC eth0
    switchport mode trunk
    no lldp transmit
    no lldp receive
    no pinning server sticky
    pinning server pinning-failure link-down
    switchport trunk native vlan 17
    switchport trunk allowed vlan 17
    bind interface port-channel1286 channel 2426
    no shutdown

  interface Vethernet2428
    description server 1/1, VNIC eth0
    switchport mode trunk
    no lldp transmit
    no lldp receive
    no pinning server sticky
    pinning server pinning-failure link-down
    switchport trunk native vlan 17
    switchport trunk allowed vlan 17
    bind interface port-channel1285 channel 2428
    no shutdown

1. On all 3 MAAS nodes, the physical interface enp6s0 is the sole member of bondm, which is in bridge broam. The vnet interfaces of VMs show up under broam as well. The physical interfaces are not vlan tagged.
2. The netplan on each machine looks like this (with differing addresses and customer specific nameserver info):

  network:
      ethernets:
          enp6s0:
              dhcp4: false
      version: 2
      bonds:
          bondm:
              interfaces: [ enp6s0 ]
              parameters:
                  mode: active-backup
                  primary: enp6s0
      bridges:
          broam:
              addresses: [ 10.17.101.10/22 ]
              gateway4: 10.17.100.1
              interfaces: [ bondm ]
              nameservers:
                  addresses: [ 123.123.123.1, 123.123.123.2 ]
                  search: [ unicloud1.example.net ]

3. The command-line for dhcpd doesn't show an interface:

  vernhart@infra1:~$ ps fuax | grep dhcpd
  vernhart 24086 0.0 0.0 13136 1100 pts/8 S+ 20:25 0:00 \_ grep --color=auto dhcpd
  dhcpd 8794 0.0 0.0 45964 16976 ? Ss Nov29 0:11 dhcpd -user dhcpd -group dhcpd -f -q -4 -pf /run/maas/dhcp/dhcpd.pid -cf /var/lib/maas/dhcpd.conf -lf /var/lib/maas/dhcp/dhcpd.leases broam
  vernhart@infra1:~$ sudo netstat -nlp | grep dhcp
  tcp 0 0 10.17.101.10:647 0.0.0.0:* LISTEN 8794/dhcpd
  tcp 0 0 0.0.0.0:7911 0.0.0.0:* LISTEN 8794/dhcpd
  udp 5120 0 0.0.0.0:67 0.0.0.0:* 8794/dhcpd
  udp ...

Mike Pontillo (mpontillo) wrote :

Given that VID 0 is, as you say, a special value that marks a priority-tagged but otherwise untagged frame, VID 0 is only meaningful to network switching infrastructure, not end nodes. From what I understand, networking hardware should be configured to strip an 802.1Q tag with VID 0 before sending the packet to the end node.

In other words, this sounds like a bug or misconfiguration on the network side.

Mike Pontillo (mpontillo) wrote :

I'm not positive this will work, but maybe if you do something like this on the trunk port connected to the server, it will do what you want?

    switchport trunk native vlan <vid>

This causes the switch to assume untagged traffic is on <vid>. It might also cause the tag for that VID to be stripped before it egresses the port, but there is some debate on that point, and I'm not enough of a Cisco expert to tell you the behavior for sure. ;-)

Mike Pontillo (mpontillo) wrote :

Er, I see you've already done that, so I suppose that isn't the only variable here. I'm not sure how to tell the switch to treat that VLAN as the untagged VLAN, /and/ strip the tag when it egresses that port. That's what you need to do to get this to work, I think.

Jason Hobbs (jason-hobbs) wrote :

Why are the switchports in trunk mode, rather than access mode, when you only want to carry traffic for vlan 17 on them?

If that's the case, I think you should configure the ports in access mode and set the access vlan to 17.

Mike Pontillo (mpontillo) wrote :

I agree with Jason. I've been trying to find out if iPXE upstream has incorporated the bug fix you mentioned, or similar; so far I've found this, which is similar but not quite the same:

https://github.com/ipxe/ipxe/commit/db3443608fe32fffb4f6ad467bfc035a824bff52

The interesting part here is that perhaps the NIC driver should have stripped the tag, but didn't.

Mike Pontillo (mpontillo) wrote :

Another thought: I wonder if you could use `ethtool` or similar to tell the hardware to strip off the unwanted VLAN tags.

Secondly, I found a document[1] that states that if you load the `8021q` driver into the kernel, the hardware will be automatically configured to offload tag stripping. Not sure if that would help in this case.

[1]:
https://www.intel.com/content/www/us/en/support/articles/000005498/network-and-i-o/ethernet-products.html

Andres Rodriguez (andreserl) wrote :

Just as a note, see [1]. Basically, a user just came across this too.

It would seem like the network admins should shed some light on this.

[1]: https://community.cisco.com/t5/unified-computing-system/cisco-ucs-nexus-802-1p-tagging-vlan0-on-traffic-between-blades/m-p/3755502

Andres Rodriguez (andreserl) wrote :

I did some digging, and it seems that:

1. Upstream has not accepted a patch from redhat to strip the priority tags [1]
2. RHEL (and consequently centos) are patching ipxe directly [2], [3].

Based on this, I think this patch could potentially be ported into ipxe in Ubuntu, however, I think there should be some investigation into the network as to why packets are not stripped on egress.

[1]: http://lists.ipxe.org/pipermail/ipxe-devel/2016-July/005099.html
[2]: https://git.centos.org/blob/rpms!ipxe.git/c7/SPECS!ipxe.spec
[3]: https://git.centos.org/blob/rpms!ipxe.git/c7/SOURCES!0009-Strip-802.1Q-VLAN-0-priority-tags.patch
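
Conceptually, stripping a priority tag amounts to something like the following. This is a Python sketch of the idea only, not the RHEL/CentOS patch itself (that one is C code inside iPXE); strip_priority_tag is a hypothetical name, and the sketch assumes a raw Ethernet frame with at most one outer tag. Only tags with VLAN ID 0 are removed, so frames on a real VLAN are left alone.

```python
import struct

TPID_8021Q = 0x8100   # EtherType value that marks an 802.1Q tag
MAC_BYTES = 12        # destination MAC (6 bytes) + source MAC (6 bytes)
VLAN_TAG_LEN = 4      # TPID (2 bytes) + TCI (2 bytes)
VID_MASK = 0x0FFF     # low 12 bits of the TCI hold the VLAN ID


def strip_priority_tag(frame: bytes) -> bytes:
    """Remove the outer 802.1Q tag only when its VLAN ID is 0.

    Untagged frames, and frames tagged with a real VLAN ID, are returned
    unchanged.
    """
    # Too short to hold MACs + tag + inner EtherType: nothing to strip.
    if len(frame) < MAC_BYTES + VLAN_TAG_LEN + 2:
        return frame
    tpid, tci = struct.unpack("!HH", frame[MAC_BYTES:MAC_BYTES + VLAN_TAG_LEN])
    if tpid != TPID_8021Q or (tci & VID_MASK) != 0:
        return frame
    # Drop the 4 tag bytes; the inner EtherType moves up to offset 12.
    return frame[:MAC_BYTES] + frame[MAC_BYTES + VLAN_TAG_LEN:]
```

Note that this discards the priority bits along with the tag, which is fine for an end node that only wants to boot but would not be right for something that forwards frames.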

Mike Pontillo (mpontillo) wrote :

Marking Invalid for MAAS since this is unrelated to MAAS itself.

Changed in maas:
status: Incomplete → Opinion
status: Opinion → Invalid
Vern Hart (vhart) wrote :

Some interesting observations. The customer deployed a pair of Centos7 machines and confirmed the vlan0 tag issue existed there as well. That wasn't too surprising.

However, they deployed a pair of Centos6 machines and they do NOT have the vlan0 tag issue.

This seems to confirm that the issue is not actually within the switch but within linux itself.

Within a thread on a community discussion on cisco.com, a Cisco employee responded saying it's a Linux bug that should already be patched. The Cisco person's response:

> You will find this behavior in all linux destro. This issue has been documented under-
> https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuu29425/?reffering_site=dumpcr
>
> You may wanna try "net.bridge.bridge-nf-filter-vlan-tagged = 1" but I haven't tested it.

That URL referenced is behind a login page so I've attached a pdf of the page from the customer.

Within that document it mentions a couple of URLS:

    https://lists.openwall.net/netdev/2013/09/10/30
    https://lists.linuxfoundation.org/pipermail/bridge/2015-July/009630.html

Those are very old. If they are describing the same problem/solution, then this would be a regression.

The net.bridge.bridge-nf-filter-vlan-tagged setting is for filtering vlans with iptables. I feel that is not likely the right direction.

Mike Pontillo (mpontillo) wrote :

Interesting developments. I agree that it doesn't seem like `bridge-nf-filter-vlan-tagged` is what we want, unless there is a special case to not filter packets tagged on VID 0. It might be worth trying this out on the bridge used to boot the pod VMs.

The frustrating thing about this bug report is the large number of layers we would need to check for correct behavior. We are trying to verify all of the following points of contact with a "VLAN 0" tagged packet at once:

 - The network infrastructure
   (it would be better for VLAN 0 tags to be stripped before reaching end nodes)
 - NIC hardware (some NICs handle VLAN filtering in hardware; I'm not sure if they could automatically discard the VLAN 0 tag before it's handed up the stack)
 - Linux NIC driver (would be programming the aforementioned hardware to hopefully do the right thing - or not)
 - Linux bridge driver (or whatever is handing the packets from the OS to the pod)
 - Hypervisor NIC driver (that is, the virtual hardware which will be booting on the VM)
 - iPXE (which would be using a minimal network stack prior to booting the virtual OS)
 - Virtual OS Linux driver

I feel like we aren't sure which of these layers might have a problem handling packets tagged on VID 0. We know iPXE has a problem for sure; the other layers would need to be tested in isolation to verify the behavior.

Andres confirmed that RHEL/CentOS are patching iPXE to treat VLAN 0 frames as untagged (his link showed that the patch was on CentOS 7). But there are many layers here that could be handling them improperly. That is, if we know that CentOS 6 works, we can't rule out that CentOS 7 may not be working due to an issue elsewhere in the stack (not in iPXE).

Andres Rodriguez (andreserl) wrote :

Hi Vern,

Could you confirm something for us? When you say that the customer deployed CentOS 7 and was able to see the issue there, did you mean that they saw the issue in the packets themselves? In other words, CentOS 7 deployed successfully but the packets still had the VLAN tag?

The reason I ask is because centos7 ipxe includes a patch that strips the vlan tag. So I'm wondering if that allows VMs on a CentOS 7 host to PXE boot without failing?

Anyhow, I've made a version of ipxe for Ubuntu with the centos patch in ppa:andreserl/maas. I've not tested it, but if you could take a look and test if that would fix the issue, then we could push that to Ubuntu.

Anyhow, we need to talk to the kernel folks to figure this one out.

Vern Hart (vhart) wrote :

The Centos7 deployments were to other blades in the UCS chassis and pings between the two Centos7 machines did not have vlan 0 tags.

I'll add your ppa and update to your ipxe and test pod VM booting.

Vern Hart (vhart) wrote :

Sorry, I got that wrong. A deployment of a pair of Centos6 hosts to the blades did not have vlan 0 tags. The Centos7 deployments did have vlan 0 tags.

Mike Pontillo (mpontillo) wrote :

@vhart, can you clarify that last comment? Do you mean that the CentOS 6 VMs exhibit the issue caused by the VLAN 0 tags and the CentOS 7 VMs do not? Or are you saying the tags are filtered elsewhere in the stack on CentOS 6 (such as the NIC driver, 8021q driver, or virtual bridge)?

Vern Hart (vhart) wrote :

I have not tried Centos6 VMs.

The customer reported that he deployed Centos7 to the baremetal blades and saw, in ping traffic that incoming packets were vlan 0 tagged.

Then he deployed Centos6 to two baremetal blades and, in a ping test, did not see any vlan tags.

If this is true, it suggests Ubuntu 18.04 and Centos 7 either add the vlan 0 tag or fail to remove it (if it's added by something else). Conversely, Centos 6 either doesn't add the tag or succeeds in removing it.

I did not ask the customer to replicate the scenario of a kvm instance PXE booting on top of the Centos deployment from a DHCP server on a different blade. And I do not have access to test that scenario myself. Another test I'd like to see the results of is a ping test between Ubuntu 18.04 and Centos 6 -- to see which side, if any, sees the vlan 0 tag.

Mike Pontillo (mpontillo) wrote :

I think the issue is that removing the tag could be seen as a bug, not a feature, since by removing it, you would be potentially stripping off a priority tag that could be used further up the stack. (For example, if the machine was deployed with a virtual bridge that was capable of manipulating said tags.) That's why most software will now handle the tag, though unfortunately the iPXE in Ubuntu still doesn't.

In other words, failure to remove the tag isn't the bug. The bug is that whether or not to remove the tag should be a configuration choice at some level, and right now that choice is too opaque (or simply doesn't exist). Because as an end node, you shouldn't /need/ to care about the tags. But if you end up deploying a server for use in a software-defined network, you /would/ care about the tags, because you might even have permission to manipulate them.

Andres Rodriguez (andreserl) wrote :

It seems other users on other distros are experiencing the same, and a kernel update fixes the issue? https://serverfault.com/questions/497391/

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1805920

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Mike Pontillo (mpontillo) wrote :

Setting to 'Confirmed' in the kernel, although it's not clear what the actual fix would entail. It would certainly be nice to be able to tell a Linux virtual bridge to transparently strip off priority tags before L2 forwarding occurs. That would prevent the issue with iPXE.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in ipxe (Ubuntu):
status: New → Confirmed
Vern Hart (vhart) wrote :

Success.
I installed ipxe-qemu from andreserl's ppa and was able to PXE boot a Pod VM from the infra node that wasn't running dhcpd.

# add-apt-repository ppa:andreserl/maas
# apt update
# apt install ipxe-qemu
# virsh list --all
# virsh start elastic-3

I watched the console of the VM and it successfully got a DHCP lease and PXE booted.

Subsequently, I used MAAS to deploy to VMs on all three infra nodes, just to be sure. All succeeded.

I'm glad that the ipxe in the PPA seems to make it work.

I now read the discussions and all questions that came up for me while doing so were asked and clarified already in later comments.

Therefore I just reviewed the proposed change and it looks good to me (other than the version string but that was just for the PPA, so that is ok).

Only one question to be sure:
I was only wondering if this might trigger any issues in iSCSI booting, since the change in src/net/netdevice.c adds the stripping to the generic net_poll. The (old) commit [1] reads as if that would be required to be set. I wonder if there would be any regression in that regard.
I remember the words iSCSI+MAAS being used together, but I'm unsure if the stack these days still uses it. When Vern confirmed that he could deploy with the modified ipxe, did that include an iSCSI boot?

If not could one of you just double-check that iSCSI boot didn't regress due to this change?

[1]: https://git.ipxe.org/ipxe.git/commit/7d64abbc5d0b5dfe4810883f372b905a359f2697

Mike Pontillo (mpontillo) wrote :

Unfortunately, a modern MAAS no longer uses iSCSI; it will fetch all necessary data over HTTP.

> Unfortunately, a modern MAAS no longer uses iSCSI; it will fetch all
> necessary data over HTTP.

Yeah that matches what I heard, used in the past but no more.
But you certainly still have the best knowledge how to set it up from
the past when it did.
Checking that pre/post an upgrade to that PPA would be a great
verification to avoid regressions.

Mike Pontillo (mpontillo) wrote :

Yes, MAAS 2.3 (the last revision of MAAS supported on Xenial) can support iSCSI when it is placed in backward compatibility mode; how to do this is documented in the changelog as follows:

    maas $PROFILE maas set-config name=http_boot value=False

MAAS 2.2 and earlier (no longer supported) used iSCSI by default.

So you could test this on any version of MAAS on Xenial.

Andres Rodriguez (andreserl) wrote :

Just to clarify the above statements as it has been source of confusion.

MAAS 2.3+ (which is the latest available in Xenial) no longer uses nor supports iSCSI. While the option to fall back to the old behavior does exist, it is not enabled by default, it's obscure and, given that it is not supported, it is to be used at the user's risk.

That said, I'm not sure whether this change should be backported all the way to Xenial. It would seem to be that it should be backported to Bionic only.

Mike Pontillo (mpontillo) wrote :

Sorry for the confusion.

To be clear, the idea of using MAAS on Xenial was in order to test if the newly-modified iPXE (on Bionic) can support iSCSI boot.

But come to think of it, I don't think that's a good test. If I remember correctly, MAAS used TFTP to transfer the kernel and initrd, /then/ iSCSI was used in order to mount the rootfs. So iPXE's iSCSI functionality wouldn't be exercised in this case.

On Fri, Dec 7, 2018 at 8:21 PM Mike Pontillo
<email address hidden> wrote:
...

> So iPXE's iSCSI functionality wouldn't be exercised in this case.

So MAAS itself is no good test for that, thanks Mike for the clarification!

So the question is: does anyone have an iSCSI iPXE case, for the last bit
of confidence before Andres' upload takes place?

[1] seems reasonable, I'll give it a try with and without the PPA of Andres.
It needs a slight modification, to not conflict with the default portal.

Install libvirt with all else it usually brings (for the bridge and dhcp on the bridge):
$ sudo apt install libvirt-daemon-system

So use these commands:
 $ curl -O http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
 $ qemu-img convert -O raw cirros-0.3.4-x86_64-disk.img cirros.raw
 $ sudo targetcli /backstores/fileio/ create cirros $PWD/cirros.raw 100M false
 $ sudo targetcli /iscsi create iqn.2016-01.com.example:cirros
 $ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/luns create /backstores/fileio/cirros
 $ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/portals delete 0.0.0.0 ip_port=3260
 $ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/portals create 192.168.122.1

If you do that you'll end up with a targetcli config like this:
$ sudo targetcli
targetcli shell version 2.1.fb43
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> ls
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 0]
  | o- fileio ................................................................................................. [Storage Objects: 1]
  | | o- cirros ........................................................... [/home/ubuntu/cirros.raw (39.2MiB) write-thru activated]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 1]
  | o- iqn.2016-01.com.example:cirros .................................................................................... [TPGs: 1]
  | o- tpg1 .................................................................................................. [gen-acls, no-auth]
  | o- acls .......................................................................................................... [ACLs: 0]
  | o- luns .......................................................................................................... [LUNs: 1]
  | | o- lun0 ........................................................................ [fileio/cirros (/home/ubuntu/cirros.raw)]
  | o- portals .................................................................................................... [Portals: 1]
  | o- 192.168.122.1:3260 ............................................................................................... [OK]
  o- loopback ......................................................................................................... [Tar...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu5

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu5) disco; urgency=medium

  * d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0
    priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

 -- Andres Rodriguez <email address hidden> Mon, 10 Dec 2018 16:26:42 -0500

Changed in ipxe (Ubuntu):
status: Confirmed → Fix Released

To keep potential regressions even lower, I'd for now only consider that for >=Bionic.
That also helps: if someone intentionally spawns an old-type KVM machine (pre-Bionic) on a >=Bionic host, we don't have to care about this too much (machine type, not the release running IN the guest). That makes us able to ignore ipxe-qemu-256k-compat-efi-roms in regard to this issue.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
no longer affects: linux (Ubuntu Trusty)
no longer affects: linux (Ubuntu Xenial)
no longer affects: linux (Ubuntu Bionic)
no longer affects: linux (Ubuntu Cosmic)
no longer affects: linux (Ubuntu Disco)
Changed in ipxe-qemu-256k-compat (Ubuntu Trusty):
status: New → Invalid
Changed in ipxe-qemu-256k-compat (Ubuntu Xenial):
status: New → Invalid
no longer affects: ipxe-qemu-256k-compat (Ubuntu Trusty)
no longer affects: ipxe-qemu-256k-compat (Ubuntu Xenial)
no longer affects: ipxe-qemu-256k-compat (Ubuntu Bionic)
no longer affects: ipxe-qemu-256k-compat (Ubuntu Cosmic)
no longer affects: ipxe-qemu-256k-compat (Ubuntu Disco)
Changed in ipxe-qemu-256k-compat (Ubuntu):
status: New → Won't Fix
Changed in ipxe (Ubuntu Trusty):
status: New → Won't Fix
Changed in ipxe (Ubuntu Xenial):
status: New → Won't Fix
Changed in ipxe-qemu-256k-compat (Ubuntu):
status: Won't Fix → Invalid
Changed in ipxe (Ubuntu Bionic):
status: New → Triaged
Changed in ipxe (Ubuntu Cosmic):
status: New → Triaged

Prepped for Bionic and Cosmic in a PPA [2] for Bileto ticket [1].
Dependent autopkgtests are queued.

I'll run usual virtualization regression checks on that over night and into tomorrow.

MPs are up for review at [3][4], but since Andres' change applies to all of these as-is, there isn't much difference.

The biggest blocker here is the lack of a more clearly outlined test case.
@Andres/@Vern - can you help to fill the testcase steps in the SRU template?

[1]: https://bileto.ubuntu.com/#/ticket/3560
[2]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3560/+packages
[3]: https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360678
[4]: https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360679

description: updated

FYI: virt stack regression tests started but will still take a while.

To make it very, very clear: this is incomplete until some path to test is provided.
Marking the bug that way, waiting on that.

Worst case (and only then) describe the test setup that you have on the customer site and volunteer to be willing and able to verify both Bionic and Cosmic on that setup.
While writing remember the intention is to make the SRU team feel confident about the change and the checks.

Changed in ipxe (Ubuntu Bionic):
status: Triaged → Incomplete
Changed in ipxe (Ubuntu Cosmic):
status: Triaged → Incomplete
Vern Hart (vhart) wrote :

The customer has bionic installed on 3 of the blades and I have installed MAAS 2.4.3 on them using the Foundation Cloud Engine. I don't have access to do the OS install myself. I could request a pair of blades installed with cosmic but I'm unsure if I need all 3 or if I can get by with just 2. Easiest would probably be all 3 so that I can be sure at least one is not running dhcpd.

The test would consist of:
1. Install MAAS on 3 nodes
2. Install ipxe to be tested on all three nodes
3. Configure subnet for PXE booting to have primary dhcp on first node and secondary on second node
4. Provision Pods on each maas node
5. Create and commission a VM on each pod
6. The VMs on first and second will commission successfully
7. The VM on the third node will fail DHCP/PXE
8. You can use virt-viewer to view the console of the VM to verify PXE failure
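
Since the failure depends on replies arriving with a VLAN 0 tag, it may help to confirm that first from a capture taken on the vnet interface (for example one written with tcpdump -w). The following is a rough Python sketch for that, under my own assumptions: it only understands the classic pcap file format with an Ethernet link type (not pcapng), count_vlan0_frames is a made-up name, and it was not used anywhere in this bug.

```python
import struct

# First four bytes of a classic pcap file, mapped to the struct byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}
LINKTYPE_ETHERNET = 1


def count_vlan0_frames(stream):
    """Return (total_frames, vlan0_tagged_frames) for a classic pcap stream."""
    header = stream.read(24)  # the global header is 24 bytes long
    if len(header) < 24 or header[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file")
    endian = PCAP_MAGICS[header[:4]]
    linktype = struct.unpack(endian + "I", header[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("capture is not Ethernet")
    total = vlan0 = 0
    while True:
        record = stream.read(16)  # per-packet header: ts, ts fraction, lengths
        if len(record) < 16:
            break
        _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = stream.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated file, stop counting
        total += 1
        if len(frame) >= 16:
            # Frame contents are always in network byte order.
            tpid, tci = struct.unpack("!HH", frame[12:16])
            if tpid == 0x8100 and (tci & 0x0FFF) == 0:
                vlan0 += 1
    return total, vlan0
```

Opening the capture in binary mode and passing the file object to count_vlan0_frames() gives the two counts; a non-zero second number on the node without dhcpd would match what I saw in the failing dump.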

virt stack tests are successful on 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 and 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 from the PPA.
This included various other tests, mostly related to migrations and upgrades between those versions.
Everything fine on that end ...

Thanks, Vern, for outlining your test case.

@Vern
Doesn't that also have to include some component that does QoS management adding the VLAN-0 tag?

I don't have the systems to verify this case on Bionic and Cosmic.
As I read it from Vern he can only test Bionic at the Customer and is unsure he can test Cosmic the same way.

@MAAS team - Would you have a MAAS test environment that you can re-use to verify this?
If so could you confirm that you can locally trigger the bug as it is today so that we can rely on it on SRU processing?

@Vern - if the above is NACKed by the MAAS team, could you ask whether you could get nodes to verify this on Cosmic as well? As I read your test description, I think it would be enough to bump the node that is supposed to start the guest to Cosmic+Proposed; there is no need to update all systems to Cosmic.

TL;DR:
- I really, really tried, but failed to recreate your case on a single system
- I need your real setup, as the VLAN-Tag-0 addition in your case seems to be different
- That makes my request that one of you commit to verifying this (and check that you are able to) even more important - see comment #41

Details (of a failed test approach)

# Simple iPXE (without dhcp/tftp/...)

# get some virtualization that gives us a bridge with dhcp
$ sudo apt install uvtool-libvirt apache2
# re-logon for permissions

# copy host kernel there to boot from
$ sudo cp -v /boot/vmlinuz-$(uname -r) /boot/initrd.img-$(uname -r) /var/www/html/
$ sudo chown www-data:www-data /var/www/html/*

# prep qemu to tap on the libvirt bridge
$ sudo mkdir -p /etc/qemu
$ echo "allow all" | sudo tee /etc/qemu/bridge.conf

# start qemu and right at the start press Ctrl+B to get to the iPXE prompt
$ sudo qemu-system-x86_64 -cpu host -net nic -net bridge,br=virbr0 -m 1024 -enable-kvm -curses -boot n

# in IPXE then
iPXE> dhcp
# check your dhcp config to work on the expected network
iPXE> show ip
# use your IPs and kernel versions for this
iPXE> kernel http://192.168.122.1/vmlinuz-4.15.0-42-generic
iPXE> initrd http://192.168.122.1/initrd.img-4.15.0-42-generic
iPXE> boot

You can do the same with a config, by putting an iPXE config file on your Apache server:
$ cat << EOF | sudo tee /var/www/html/ipxe.config
#!ipxe
kernel http://192.168.122.1/vmlinuz-4.15.0-42-generic
initrd http://192.168.122.1/initrd.img-4.15.0-42-generic
boot
EOF

And then boot with chainbooting:
iPXE> dhcp
iPXE> chain http://192.168.122.1/ipxe.config

# Try to use VLANs here
# Note: There would actually be a full VLAN feature which no one ever requested.
# It is off atm, but per https://ipxe.org/cmd/vcreate we could do something like
iPXE> vcreate --tag 42 net0
iPXE> set net0-42/ip 192.168.123.100
iPXE> set net0-42/netmask 255.255.255.0
iPXE> set net0-42/gateway 192.168.123.1

So I wonder about your case: we have a non-VLAN-aware iPXE that gets 0-tagged packets - is that correct? Most other network stacks would shrug the 0-tag off, but iPXE does not and behaves as if the traffic is not there (unless you'd configure it through vcreate maybe).
Let's try to "simulate" that ...

# Add a "normal" VLAN tag 0 interface to the host bridge
$ sudo ip link add name virbr0.0 link virbr0 type vlan id 0
$ sudo ip addr add 192.168.124.1/24 broadcast 192.168.124.255 dev virbr0.0

# On boot we configure iPXE to use that IP range, but intentionally ignoring any VLAN tagging
iPXE> set net0/ip 192.168.124.100
iPXE> set net0/netmask 255.255.255.0
iPXE> set net0/gateway 192.168.124.1
iPXE> set net0/dns 192.168.124.1
iPXE> ifopen net0
iPXE> chain http://192.168.124.1/ipxe.config

Without the fix this blocks because the server cannot be reached:
iPXE> chain http://192.168.124.1/ipxe.config
http://192.168.124.1/ipxe.config.................. Connection timed out (http://ipxe.org/4c0a6035)

Installed the 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 from proposed but it fails there as well.
Probably your VLAN-0-TAG case is slightly different to what I had assumed here, but atm I have no way to know where.
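
For reference, the behaviour discussed above (a stack that is not VLAN-aware shrugging off a 0-tag) can be sketched in a few lines of Python. This is only an illustration of the idea, not the patch carried in the package: the function name, the sample MAC addresses and the dummy payload are all made up for the example.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def strip_priority_tag(frame: bytes) -> bytes:
    """Return the frame without its 802.1Q tag if the tag carries VLAN ID 0.

    Hypothetical helper for illustration; frames that are untagged or that
    carry a real VLAN ID are returned unchanged.
    """
    if len(frame) < 18:
        return frame  # too short to hold a tagged Ethernet header
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q or (tci & 0x0FFF) != 0:
        return frame  # untagged, or tagged with a non-zero VLAN ID
    # Keep destination and source MAC (12 bytes) and drop the 4-byte tag.
    return frame[:12] + frame[16:]


# Made-up example frame: broadcast destination, invented source MAC,
# a tag with VLAN ID 0, then EtherType 0x0800 (IPv4) and a dummy payload.
dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("525400123456")
inner = struct.pack("!H", 0x0800) + b"payload"
tagged_vlan0 = dst + src + struct.pack("!HH", TPID_8021Q, 0x0000) + inner
tagged_vlan42 = dst + src + struct.pack("!HH", TPID_8021Q, 42) + inner

untagged = strip_priority_tag(tagged_vlan0)
```

With this helper a frame tagged with VLAN 0 comes out identical to the untagged frame, while a frame tagged with VLAN 42 is left alone.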

Vern Hart (vhart) wrote :

It seems an important component to the failure scenario is the hardware. The customer equipment is a Cisco UCS chassis and the MAAS nodes are blades in that chassis. Even though we cannot find anything in configuration that specifically adds the vlan-0 tag (or priority tag), traffic between the blades goes out one node untagged and shows up tagged on the other node.

Some bugs/discussions around vlan-0 and UCS:

  https://quickview.cloudapps.cisco.com/quickview/bug/CSCuu29425
  https://quickview.cloudapps.cisco.com/quickview/bug/CSCuz83183
  https://bugs.launchpad.net/opencontrail/+bug/1457805
  https://arstechnica.com/civis/viewtopic.php?f=10&t=1442797
  https://lists.linuxfoundation.org/pipermail/fds-dev/2017-May/000710.html
  http://lists.openstack.org/pipermail/openstack-operators/2013-April/002777.html
  https://linux.oracle.com/pls/apex/f?p=102:2:::NO::P2_VC_ID,P2_VERSION:606,1.0

As a note, Cisco seems to suggest it's a bug in Linux, citing these two old posts:

  https://lists.openwall.net/netdev/2013/09/10/30
  https://lists.linuxfoundation.org/pipermail/bridge/2015-July/009630.html

But I'm not convinced they are valid since this vlan-0 tag problem only shows up with this specific Cisco hardware. It seems like there are multiple network related software projects (like ipxe, vpp, probably others) that are forced to deal with the special case of vlan 0 (priority tagging) being added by Cisco UCS switches because Cisco's stance is that they're not adding the tags.
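
To make "vlan 0 (priority tagging)" concrete: the 16-bit Tag Control Information field of an 802.1Q tag holds a 3-bit priority (PCP), a 1-bit drop-eligible flag (DEI) and a 12-bit VLAN ID. A priority tag is one whose VLAN ID is 0, so only the priority bits carry information. The small Python sketch below decodes that field; the function names and the sample values are hypothetical and chosen only for illustration.

```python
def decode_tci(tci: int) -> dict:
    """Split a 16-bit 802.1Q Tag Control Information value into its fields."""
    return {
        "pcp": (tci >> 13) & 0x7,  # priority code point, top 3 bits
        "dei": (tci >> 12) & 0x1,  # drop eligible indicator, 1 bit
        "vid": tci & 0x0FFF,       # VLAN ID, low 12 bits
    }


def is_priority_tag(tci: int) -> bool:
    """True if the tag only carries priority, i.e. its VLAN ID is 0."""
    return decode_tci(tci)["vid"] == 0


# Sample values: priority 5 with VLAN ID 0, and priority 0 with VLAN ID 42.
priority_only = decode_tci(0xA000)
real_vlan = decode_tci(0x002A)
```

Here `priority_only` decodes to priority 5 with VLAN ID 0, which is the kind of tag a receiver is expected to treat as belonging to untagged traffic.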

Andres Rodriguez (andreserl) wrote :

FWIW, Cisco documentation here states that priority tagging is enabled by default:

https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html

"Default Settings
VLAN 0 priority tagging is enabled by default."

Thanks for the details, but that means that even though the MAAS team
might have enough systems, they would not have the right special HW.
Therefore @Vern could you do the pre-checks with your associated
customer that you can verify the bug on their setup before we push it
as SRU?
As I said, you only need to have Bionic/Cosmic on the target KVM Host
that is supposed to spawn the Pod and currently fails.
IMHO all other components can stay as-is.

Please let me know if that would be possible

I'll upload once I get a confirmation that it will be tested on both releases on the affected HW.

description: updated

No response yet. We really want to fix it but need some way to verify it.
Sorry to ask once again, but we need to get this unblocked.
@Vern - can you use the setup to verify the two planned uploads for Bionic and Cosmic?

Vern Hart (vhart) wrote :

I was able to verify the fix works for bionic using maas 2.4.3. Am about to install cosmic on the customer hardware to verify the fix there too.

I have run into a maybe-related issue with maas 2.5.0 filed separately here: https://bugs.launchpad.net/maas/+bug/1811021

Ok, once you know you'll be able to verify it on Cosmic as well (by successfully testing from the PPA) let me know.
We will then upload it as a real SRU which needs verification per [1].

[1]: https://wiki.ubuntu.com/StableReleaseUpdates

Vern Hart (vhart) wrote :

I have run a test on cosmic. The test involved MAAS 2.4.3 installed on bionic on 3 of the blades of the UCS chassis in the customer's data center. I installed cosmic (18.10) on a 4th blade, installed libvirt and qemu-kvm, and defined a VM similar to how maas defines VMs, with this xml: https://pastebin.ubuntu.com/p/yCTRGDjx2H/

    $ lsb_release -d
    Description: Ubuntu 18.10

The ipxe-qemu version installed from dist is:

    1.0.0+git-20180124.fbe8c52d-0ubuntu4

The attached screenshot is of the failed pxe boot of the testipxe vm.

Added the ppa:andreserl/maas apt repo and installed ipxe-qemu which gave me version:

    1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1

Note that I had to edit /etc/apt/sources.list.d/andreserl-ubuntu-maas-cosmic.list replacing "cosmic" with "bionic" because that repo doesn't have cosmic packages. And then I had to downgrade the ipxe-qemu because the cosmic version is greater than the one in the fix repo:

    # apt install ipxe-qemu=1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1

Once I jumped through those hoops, I booted the exact same testipxe vm that failed to pxe boot above and it succeeded in getting an IP and commissioning in MAAS.
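
The reason a downgrade was needed is the way Debian version strings are ordered: the revision 0ubuntu4 sorts higher than 0ubuntu2.2~18.04.1. The Python sketch below is a simplified port of those ordering rules, written only to illustrate the comparison. The function names are made up, and on a real system `dpkg --compare-versions` is the authoritative tool.

```python
def _order(ch: str) -> int:
    """Sort weight of one character; '' stands for end of string."""
    if ch == "" or ch.isdigit():
        return 0
    if ch == "~":
        return -1          # tilde sorts before everything, even end of string
    if ch.isalpha():
        return ord(ch)     # letters sort before other punctuation
    return ord(ch) + 256


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream or revision strings; <0, 0 or >0 like cmp()."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Compare the non-digit run character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the digit run numerically: skip leading zeros, longer wins.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        while i < len(a) and j < len(b) and a[i].isdigit() and b[j].isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i].isdigit():
            return 1
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split into (epoch, upstream, revision); epoch defaults to 0."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(v1: str, v2: str) -> int:
    """Return <0, 0 or >0 if v1 sorts before, equal to or after v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return e1 - e2
    return _compare_part(u1, u2) or _compare_part(r1, r2)


cosmic_archive = "1.0.0+git-20180124.fbe8c52d-0ubuntu4"
ppa_fix = "1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1"
archive_is_newer = compare_versions(cosmic_archive, ppa_fix) > 0
```

Under these rules `archive_is_newer` is true, so apt keeps the cosmic archive package unless the PPA version is requested explicitly, as in the `apt install ipxe-qemu=...` command above.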

Vern Hart (vhart) wrote :

After realizing there are packages in the ci build [1] I installed the following version from there:

    1.0.0+git-20180124.fbe8c52d-0ubuntu4.1

I redefined the testipxe vm from the above test, and it also successfully pxe booted.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3560/+packages

Ok, thanks for the precheck Vern.
Now that we know that you will be able to SRU-verify this I have uploaded it to the SRU queue.

Once accepted by the SRU Team there will be updates here asking for verification. Please do that for Bionic and Cosmic then - if you need any help let us know.
Eventually that will make the fix released into the Ubuntu Archive.

Hello Vern, or anyone else affected,

Accepted ipxe into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ipxe/1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ipxe (Ubuntu Cosmic):
status: Incomplete → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Brian Murray (brian-murray) wrote :

Hello Vern, or anyone else affected,

Accepted ipxe into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ipxe/1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ipxe (Ubuntu Bionic):
status: Incomplete → Fix Committed
tags: added: verification-needed-bionic

@Vern - if there is any ETA when you will get to the real SRU verifications that were requested by Brian a week ago let us know

@Vern - ping please test to unblock this from bionic-/cosmic-proposed

Vern Hart (vhart) wrote :

Yes, sorry. I will try to test tonight (in a few hours) when I'm back at the hotel. Not that I only have bionic and xenial to test with. I can try to upgrade one of those to cosmic.

Vern Hart (vhart) wrote :

s/Not/Note/
