VLAN SR-IOV regression for IXGBE driver

Bug #1658491 reported by Rafael David Tinoco
This bug affects 1 person
Affects          Status        Importance  Assigned to    Milestone
linux (Ubuntu)   Invalid       Undecided   Unassigned
  Xenial         Fix Released  High        Dan Streetman
  Yakkety        Invalid       Undecided   Unassigned
  Zesty          Invalid       Undecided   Unassigned

Bug Description

The ixgbe driver misbehaves with VLANs in SR-IOV setups: VFs assigned to different VLANs can still reach each other.

Description from affected user:

- Create 2 networks (SR-IOV, VLANs 100 and 102):

# neutron net-create --provider:physical_network=PHY0 --provider:network_type=vlan --provider:segmentation_id=100 PHY0_vlan_100
# neutron net-create --provider:physical_network=PHY0 --provider:network_type=vlan --provider:segmentation_id=102 PHY0_vlan_102

- Create the subnets:

# neutron subnet-create PHY0_vlan_100 192.168.50.0/24
# neutron subnet-create PHY0_vlan_102 192.168.60.0/24

- Create the neutron ports:

# neutron port-create e450757f-fec6-466e-bb21-a42a2019fe6b --name vlan_100_port1 --vnic-type direct
# neutron port-create 32c468ed-7e1e-4267-bbbf-ec72d33e4454 --name vlan_102_port1 --vnic-type direct

- Boot 2 VMs on 2 different hosts (add only 1 port to each of them + ovs dhcp network):

# nova boot --flavor 789 --image ubuntu --nic net-id=1cf2a512-8963-413d-a745-99e758789c2b --nic port-id=92cf2867-cc0a-4e0d-aa87-14a345cdd708 102_port1_compute6 --key-name mkey --config-drive true --availability-zone nova:compute-0-6.domain.tld --poll
# nova boot --flavor 789 --image ubuntu --nic net-id=1cf2a512-8963-413d-a745-99e758789c2b --nic port-id=baec6fd6-933d-4c58-94b6-44c50405d409 100_port1_compute5 --key-name mkey --config-drive true --availability-zone nova:compute-0-5.domain.tld --poll

- After the VMs booted, configure the VFs:

root@102-port1-compute6:~# ifconfig eth1 192.168.34.6 up
root@100-port1-compute5:~# ifconfig eth1 192.168.34.5 up

- Pinging between the two VMs works, but it should not, because the VMs' interfaces (host VFs) are in different VLANs:

root@compute-0-5:~# ip link show eth6
8: eth6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2140 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:3f:1a:64 brd ff:ff:ff:ff:ff:ff
vf 5 MAC fa:16:3e:f0:2c:e2, vlan 100, spoof checking on, link-state auto

root@compute-0-6:~# ip link show eth5
8: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2140 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:3f:20:88 brd ff:ff:ff:ff:ff:ff
vf 7 MAC fa:16:3e:ce:69:41, vlan 101, spoof checking on, link-state auto

Nevertheless, the two VMs can ping each other.
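
To compare the port VLAN of each VF across hosts without reading the `ip link` output by eye, something like the following could be used. It is only a sketch: `vf_vlans` is a hypothetical helper name, and it assumes the `vf N MAC ..., vlan M, ...` line layout shown above (other iproute2 versions may format VF lines differently).

```shell
#!/bin/sh
# Read `ip link show <pf>` output on stdin and print "<vf-index> <vlan-id>"
# for every VF line. A VF line without a "vlan N" field is reported as vlan 0.
# vf_vlans is a hypothetical helper, not part of iproute2.
vf_vlans() {
    awk '
        $1 == "vf" {
            vlan = 0
            for (i = 2; i < NF; i++) {
                if ($i == "vlan") {
                    vlan = $(i + 1)
                    sub(/,$/, "", vlan)   # drop the trailing comma
                }
            }
            print $2, vlan
        }
    '
}
```

On a compute host this would be run as `ip link show eth6 | vf_vlans`, printing one line per VF (for the VF shown above, `5 100`). Two VFs that report different VLAN IDs should not be able to reach each other.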

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

After doing kernel bisection with user:

** Problematic kernel range is in between Ubuntu-4.4.0-0.10..Ubuntu-4.4.0-21.37 **

BUG: https://bugs.launchpad.net/intel/+bug/1536473

Several ixgbe commits from upstream v4.5-rc1 were back-ported to Ubuntu-4.4.0-9.24 for that bug, among them:

 "ixgbe: Return error on failure to allocate mac_table"
 "ixgbe: Fix SR-IOV VLAN pool configuration"
 "ixgbe: Simplify definitions for regidx and bit in set_vfta"
 "ixgbe: Reduce VT code indent in set_vfta by introducing jump label"
 "ixgbe: Simplify configuration of setting VLVF and VLVFB"
 "ixgbe: Add support for adding/removing VLAN on PF bypassing the VLVF"
 "ixgbe: Reorder search to work from the top down instead of bottom up"
 "ixgbe: Add support for VLAN promiscuous with SR-IOV"
 "ixgbe: Fix VLAN promisc in relation to SR-IOV"
 "ixgbe: Clear stale pool mappings"
 "ixgbe: Clean stale VLANs when changing port VLAN or resetting"
 "ixgbe: Fix bugs in ixgbe_clear_vf_vlans()"

Compiling the latest Ubuntu-4.4.0 with these commits on top:

Revert "ixgbe: Return error on failure to allocate mac_table"
Revert "ixgbe: Fix SR-IOV VLAN pool configuration"
Revert "ixgbe: Simplify definitions for regidx and bit in set_vfta"
Revert "ixgbe: Reduce VT code indent in set_vfta by introducing jump label"
Revert "ixgbe: Simplify configuration of setting VLVF and VLVFB"
Revert "ixgbe: Add support for adding/removing VLAN on PF bypassing the VLVF"
Revert "ixgbe: Reorder search to work from the top down instead of bottom up"
Revert "ixgbe: Add support for VLAN promiscuous with SR-IOV"
Revert "ixgbe: Fix VLAN promisc in relation to SR-IOV"
Revert "ixgbe: Clear stale pool mappings"
Revert "ixgbe: Clean stale VLANs when changing port VLAN or resetting"
Revert "ixgbe: Fix bugs in ixgbe_clear_vf_vlans()"

made the problem go away.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

In the following bisect:

# bad: [913011e5d0c07a71f3e2f174341e7f45b15eaa65] Revert "ixgbe: Return error on failure to allocate mac_table"
# good: [4291ccca9ac6d20333c349fa6c6d4e9b79d4fabf] UBUNTU: Ubuntu-4.4.0-62.83

Where bad means "fixed" and good means "unfixed" (since I've reverted lots of ixgbe patches), this is the bisect log:

git bisect start '913011e5d' '4291ccca'
# bad: [cd559250820a01813656ef6edef3ee87551836fc] Revert "ixgbe: Reorder search to work from the top down instead of bottom up"
git bisect bad cd559250820a01813656ef6edef3ee87551836fc
# good: [3c7eb56223dd7c4a724384fbcd227036fbe1e349] Revert "ixgbe: Clear stale pool mappings"
git bisect good 3c7eb56223dd7c4a724384fbcd227036fbe1e349
# bad: [c209e5021886f9168d96d55ea7bf10e88813bce1] Revert "ixgbe: Add support for VLAN promiscuous with SR-IOV"
git bisect bad c209e5021886f9168d96d55ea7bf10e88813bce1
# good: [0da45985b746b23c29b754baedf13da4ec59bd77] Revert "ixgbe: Fix VLAN promisc in relation to SR-IOV"
git bisect good 0da45985b746b23c29b754baedf13da4ec59bd77

And this is the revert that made the Ubuntu 4.4 kernel work as expected:

# first bad commit: [c209e5021886f9168d96d55ea7bf10e88813bce1] Revert "ixgbe: Add support for VLAN promiscuous with SR-IOV"
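
Git can also rename the bisect terms, which avoids the inverted good/bad reading used above. The following is a minimal, self-contained sketch on a throwaway repository (not the Ubuntu kernel tree), assuming a Git version that supports `--term-old`/`--term-new`. The `demo_bisect` name, the `state` file and the four dummy commits are made up for illustration.

```shell
#!/bin/sh
# Throwaway demo: four commits, the bug is "fixed" from the third commit on.
# Prints the subject of the first commit that carries the "fixed" term.
demo_bisect() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    n=0
    for state in broken broken fixed fixed; do
        n=$((n + 1))
        echo "$state" > state
        echo "$n" > serial
        git add state serial
        git -c user.name=demo -c user.email=demo@example.invalid \
            commit -q -m "commit $n ($state)"
    done
    first=$(git rev-list --max-parents=0 HEAD)

    # Rename the terms: old = unfixed, new = fixed.
    git bisect start --term-old=unfixed --term-new=fixed >/dev/null
    git bisect fixed HEAD >/dev/null
    git bisect unfixed "$first" >/dev/null

    # Exit status 0 marks a commit with the old term (unfixed),
    # exit status 1 marks it with the new term (fixed).
    git bisect run sh -c 'test "$(cat state)" = broken' >/dev/null

    # refs/bisect/<new-term> points at the first commit with the new term.
    git log -1 --format=%s refs/bisect/fixed
    cd /
    rm -rf "$repo"
)
```

In a real session the `git bisect run` step would be replaced by building and testing each kernel, then marking it by hand with `git bisect fixed` or `git bisect unfixed`.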

=================

Meaning that this upstream commit:

commit e1d0a2af2b30f5f0cbce2e4dd438d4da2433b226
Author: Alexander Duyck <email address hidden>
Date: Mon Nov 2 17:10:19 2015 -0800

ixgbe: Fix VLAN promisc in relation to SR-IOV

This patch is a follow-on for enabling VLAN promiscuous and allowing the PF
to add VLANs without adding a VLVF entry. What this patch does is go
through and free the VLVF registers if they are not needed as the VLAN
belongs only to the PF which is the default pool.

Signed-off-by: Alexander Duyck <email address hidden>
Tested-by: Phil Schmitt <email address hidden>
Signed-off-by: Jeff Kirsher <email address hidden>

Backported into Ubuntu 4.4 kernel by:

commit ad740b71fba84e3d17bc0507d2cb696935cd944b
Author: Alexander Duyck <email address hidden>
Date: Mon Nov 2 17:10:19 2015 -0800

ixgbe: Fix VLAN promisc in relation to SR-IOV

BugLink: http://bugs.launchpad.net/bugs/1536473

This patch is a follow-on for enabling VLAN promiscuous and allowing the PF
to add VLANs without adding a VLVF entry. What this patch does is go
through and free the VLVF registers if they are not needed as the VLAN
belongs only to the PF which is the default pool.

Signed-off-by: Alexander Duyck <email address hidden>
Tested-by: Phil Schmitt <email address hidden>
Signed-off-by: Jeff Kirsher <email address hidden>
(cherry picked from commit e1d0a2af2b30f5f0cbce2e4dd438d4da2433b226)
Signed-off-by: Tim Gardner <email address hidden>

which has been present since Ubuntu-4.4.0-9.24 (and is still in Ubuntu-4.4.0-62.83)
because of https://bugs.launchpad.net/intel/+bug/1536473,
is the problematic one.

These patches were likely backported during the development phase.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Some important commits from upstream that might be related (as a fix):

bf4d67d94c842edf57e3cac2c4dff58a9ce7ac41 ixgbe: Reset interface after enabling SR-IOV
176621c964e9279c42c6b641688360e5cd0baedf ixgbe: fix error handling in TC cls_u32 offload routines
ebd83ad818d2d4502d5e343388000d5dc829b7a8 ixgbe: Fix cls_u32 offload support for fields with masks

18be4fce00fef206dc6f104a6a258b193e9871cf ixgbe: Do not allow PF to add VLVF entry unless it actually needs it
06bb1c39d8be0b2ee60b5bc9384fdac6e19bc270 ixgbe: Avoid adding VLAN 0 twice to VLVF and VFTA

Especially the last 2.

Will provide more details soon:

- Whether the upstream kernel suffers from the issue (if so, the problem would have to be solved upstream).
- Whether the last 2 commits above solve the issue.

What is certain right now is that commit ad740b71fb causes a regression in the SR-IOV functionality of the Ubuntu 4.4 ixgbe driver.

Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → High
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Upstream kernel "4.10.0-rc3" has been tested and it doesn't have the problem.

Testing if commits:

18be4fce00fef2 ixgbe: Do not allow PF to add VLVF entry unless it actually needs it
06bb1c39d8be0b ixgbe: Avoid adding VLAN 0 twice to VLVF and VFTA

Fix the issue.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

They don't.

The ixgbe driver fix is somewhere between 4.4.0-XXX and upstream.

I have prepared:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1658491

To be used as a "hotfix" while the issue is being worked on by @ddstreet.

Changed in linux (Ubuntu Zesty):
assignee: Rafael David Tinoco (inaddy) → Dan Streetman (ddstreet)
Revision history for this message
Dan Streetman (ddstreet) wrote :

Actually, the commit that introduces the problem is the one before ad740b71fba84e3d17bc0507d2cb696935cd944b. The actual offending commit is:

6c39aa938110688cbbd9ad8628f2dc388926e384 ("ixgbe: Add support for VLAN promiscuous with SR-IOV")

which is cherry-picked from upstream commit 16369564915a9777217244678ee6160f8f1acac7.

I've built a PPA with both commits reverted:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1658491

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

This issue may be fixed by this upstream commit:

commit f60439bc21e3337429838e477903214f5bd8277f
Author: Alexander Duyck <email address hidden>
Date: Thu Aug 11 14:51:56 2016 -0700

    ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

    When I was adding the code for enabling VLAN promiscuous mode with SR-IOV
    enabled I had inadvertently left the VLNCTRL.VFE bit unchanged as I has
    assumed there was code in another path that was setting it when we enabled
    SR-IOV. This wasn't the case and as a result we were just disabling VLAN
    filtering for all the VFs apparently.

    Also the previous patches were always clearing CFIEN which was always set
    to 0 by the hardware anyway so I am dropping the redundant bit clearing.

    Fixes: 16369564915a ("ixgbe: Add support for VLAN promiscuous with SR-IOV")
    Signed-off-by: Alexander Duyck <email address hidden>
    Tested-by: Andrew Bowers <email address hidden>
    Signed-off-by: Jeff Kirsher <email address hidden>

Revision history for this message
Dan Streetman (ddstreet) wrote :

> This issue may be fixed by this upstream commit:
> commit f60439bc21e3337429838e477903214f5bd8277f

I verified it is fixed by that commit, thanks Jay.

Changed in linux (Ubuntu Yakkety):
status: New → Invalid
Changed in linux (Ubuntu Xenial):
status: New → Invalid
status: Invalid → In Progress
Changed in linux (Ubuntu Zesty):
status: In Progress → Invalid
Changed in linux (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → High
Changed in linux (Ubuntu Zesty):
importance: High → Undecided
assignee: Dan Streetman (ddstreet) → nobody
Revision history for this message
Dan Streetman (ddstreet) wrote :

The commit that fixes this, f60439bc21e3337429838e477903214f5bd8277f, is already in yakkety and later; this is needed only for Xenial.

Additionally, since the problem commit that this fixes was not included in 4.4 - it was backported into the Xenial kernel via bug 1536473 - the commit that fixes this won't come into Xenial via the normal stable process; it needs to be applied directly.

Revision history for this message
Dan Streetman (ddstreet) wrote :

PPA containing build with commit f60439bc21e3337429838e477903214f5bd8277f backported to Xenial kernel:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1658491

tags: added: sts sts-sru
Revision history for this message
Dan Streetman (ddstreet) wrote :

fyi to those following, kernel-team list submission:
https://lists.ubuntu.com/archives/kernel-team/2017-January/082033.html

tags: added: sts-sponsor
tags: removed: sts-sponsor
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Dan Streetman (ddstreet) wrote :

Verified, with the 4.4.0-62 kernel, two guests with SRIOV interfaces configured for different vlans were able to communicate with each other (incorrect behavior). With kernel 4.4.0-63, the same guests were not able to communicate (correct behavior).
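
For anyone checking a fleet of Xenial hosts, the version boundary above can be tested with a small helper. This is a sketch under stated assumptions: `has_vfe_fix` is a hypothetical name, the threshold `4.4.0-63` is taken from the verification above, and the comparison is only meaningful for Xenial 4.4.0-N kernels (it says nothing about other kernel series).

```shell
#!/bin/sh
# Succeed (exit 0) when the given Xenial kernel version is at or above
# ABI 4.4.0-63, the first one verified as fixed in this report.
# Accepts `uname -r` output (4.4.0-63-generic) or a package version
# (4.4.0-63.84). Relies on `sort -V`, which is a GNU/BSD extension.
# has_vfe_fix is a hypothetical helper name.
has_vfe_fix() {
    fixed="4.4.0-63"
    # Keep only the "<major>.<minor>.<patch>-<abi>" prefix.
    abi=$(printf '%s\n' "$1" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+-[0-9]+).*/\1/')
    # The version is new enough when the threshold sorts first (or is equal).
    lowest=$(printf '%s\n%s\n' "$abi" "$fixed" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}
```

A typical call would be `has_vfe_fix "$(uname -r)" || echo "kernel predates the VLNCTRL.VFE fix"`.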

tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-63.84

---------------
linux (4.4.0-63.84) xenial; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1660704

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * [regression 4.8.0-14 -> 4.8.0-17] keyboard and touchscreen lost on Acer
    Chromebook R11 (LP: #1630238)
    - [Config] CONFIG_PINCTRL_CHERRYVIEW=y

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - SAUCE: HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * VLAN SR-IOV regression for IXGBE driver (LP: #1658491)
    - ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

  * "Out of memory" errors after upgrade to 4.4.0-59 (LP: #1655842)
    - mm, page_alloc: convert alloc_flags to unsigned
    - mm, compaction: change COMPACT_ constants into enum
    - mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED
    - mm, compaction: simplify __alloc_pages_direct_compact feedback interface
    - mm, compaction: distinguish between full and partial COMPACT_COMPLETE
    - mm, compaction: abstract compaction feedback to helpers
    - mm, oom: protect !costly allocations some more
    - mm: consider compaction feedback also for costly allocation
    - mm, oom, compaction: prevent from should_compact_retry looping for ever for
      costly orders
    - mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION
    - mm, oom: prevent premature OOM killer invocation for high order request

  * Backport 3 patches to fix bugs with AIX clients using IBMVSCSI Target Driver
    (LP: #1657194)
    - SAUCE: ibmvscsis: Fix max transfer length
    - SAUCE: ibmvscsis: fix sleeping in interrupt context
    - SAUCE: ibmvscsis: Fix srp_transfer_data fail return code

  * NVMe: adapter is missing after abnormal shutdown followed by quick reboot,
    quirk needed (LP: #1656913)
    - nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

  * Ubuntu 16.10 KVM SRIOV: if enable sriov while ping flood is running ping
    will stop working (LP: #1625318)
    - PCI: Do any VF BAR updates before enabling the BARs
    - PCI: Ignore BAR updates on virtual functions
    - PCI: Update BARs using property bits appropriate for type
    - PCI: Separate VF BAR updates from standard BAR updates
    - PCI: Don't update VF BARs while VF memory space is enabled
    - PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
    - PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
    - PCI: Add comments about ROM BAR updating

  * Linux rtc self test fails in a VM under xenial (LP: #1649718)
    - kvm: x86: Convert ioapic->rtc_status.dest_map to a struct
    - kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map
    - kvm: x86: Check dest_map->vector to match eoi signals for rtc

  * Xenial update to v4.4.44 stable releas...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Louis Bouchard (louis)
tags: added: sts-sru-done
removed: sts-sru