[SRU] hw-tc-offload not reliable with VXLAN with OpenStack

Bug #1853592 reported by James Page
This bug affects 1 person
Affects                  Status         Importance  Assigned to  Milestone
Ubuntu Cloud Archive     Fix Released   High        Unassigned
  Train                  Fix Committed  High        Unassigned
  Ussuri                 Fix Released   High        Unassigned
openvswitch (Ubuntu)     Fix Released   High        Unassigned
  Eoan                   Won't Fix      High        Unassigned
  Focal                  Fix Released   High        Unassigned

Bug Description

[Impact]
Hardware-offloaded VXLAN tunnel flows may not be offloaded correctly, which leaves them non-functional in OpenStack deployments.

[Test Case]
Deploy OpenStack configured with network cards suitable for hardware offload of VM interfaces.
Boot instances with hardware-offloaded interfaces (capability=switchdev).
Instances may or may not get configured via DHCP, depending on how OpenStack chooses to tag the flows associated with the VXLAN tunnel networking for the cloud.
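
A rough sketch of the second step, assuming the standard OpenStack client is installed and authenticated. The function name and its arguments (network and port name) are placeholders made up for illustration; they do not come from this report.

```shell
# Create a Neutron port that requests a hardware-offloaded (switchdev) VF.
# $1 = network name or ID (placeholder), $2 = port name (placeholder)
create_offloaded_port() {
    openstack port create --network "$1" --vnic-type direct \
        --binding-profile '{"capabilities": ["switchdev"]}' "$2"
}

# Usage (all names hypothetical):
#   create_offloaded_port private offload-port-1
#   openstack server create --flavor <flavor> --image <image> \
#       --nic port-id=offload-port-1 <instance-name>
```

Booting several such instances on the same VXLAN project network is what exposes the intermittent DHCP failures described in this report.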

[Regression Potential]
Low. The proposed patch has been tested with OpenStack Stein, has been peer reviewed, and has landed in all OVS stable branches back to v2.8. It will be included in the next 2.12.x release (which would target Eoan).

[Original Bug Report]
OpenStack Train
OVS 2.12

Linux 5.3 from hwe edge

Connect-X 5 running latest firmware from Mellanox.

Overlay networking configured with VXLAN for project networking.

Some instances successfully DHCP, some don't.

For those that don't:

compute:

11:51:24.926581 IP bond1.2925.node-laveran.maas.58901 > bond1.2925.node-lepaute.maas.4789: VXLAN, flags [I] (0x08), vni 1233
IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:e8:59:f3 (oui Unknown), length 300

gateway:

11:50:44.010864 IP bond1.2925.node-laveran.maas.58901 > bond1.2925.node-lepaute.maas.4789: VXLAN, flags [I] (0x08), vni 1233
IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:e8:59:f3 (oui Unknown), length 300
11:50:44.011196 IP bond1.2925.node-lepaute.maas.55067 > bond1.2925.node-laveran.maas.4789: VXLAN, flags [I] (0x08), vni 1233
IP 172.16.0.2.bootps > 172.16.2.236.bootpc: BOOTP/DHCP, Reply, length 328

The response is sent from the gateway unit to the compute unit, but is never seen in the tcpdump on the compute unit.
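
One way to compare the two captures is to count DHCP requests and replies in each. The helper below is only a sketch: the function name is made up, and it assumes tcpdump text output in the format shown above, saved to a file or piped in.

```shell
# Count BOOTP/DHCP requests and replies in tcpdump text output read from stdin.
dhcp_counts() {
    awk '
        /BOOTP\/DHCP, Request/ { req++ }
        /BOOTP\/DHCP, Reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Usage (file names are hypothetical):
#   dhcp_counts < compute.txt
#   dhcp_counts < gateway.txt
```

A capture that shows requests but no replies on the compute side, while the gateway side shows both, matches the symptom described here.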

If hardware offload (hw-tc-offload) is disabled on the two underlying ports of the card, traffic flows as expected and all instances configure correctly via DHCP on boot.

 sudo ethtool -K enp3s0f0 hw-tc-offload off
 sudo ethtool -K enp3s0f1 hw-tc-offload off
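
To confirm the setting took effect, the current state can be read back with `ethtool -k` (lowercase). The helper name below is made up for illustration, and the parsing assumes the usual `feature: state` line format of ethtool's feature listing.

```shell
# Print the hw-tc-offload state ("on" or "off") from `ethtool -k` output on stdin.
tc_offload_state() {
    awk '$1 == "hw-tc-offload:" { print $2 }'
}

# Usage:
#   ethtool -k enp3s0f0 | tc_offload_state
#   ethtool -k enp3s0f1 | tc_offload_state
```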

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-5.3.0-23-generic 5.3.0-23.25~18.04.1
ProcVersionSignature: Ubuntu 5.3.0-23.25~18.04.1-generic 5.3.7
Uname: Linux 5.3.0-23-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
Date: Fri Nov 22 12:29:20 2019
ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-hwe-edge
UpgradeStatus: No upgrade log present (probably fresh install)
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 19 16:45 seq
 crw-rw---- 1 root audio 116, 33 Nov 19 16:45 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
 Bus 002 Device 002: ID 8087:8002 Intel Corp.
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
 Bus 001 Device 002: ID 8087:800a Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R630
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.3.0-23-generic root=UUID=2ff5e234-ee62-4bce-8266-cd9aa78c532f ro intel_iommu=on iommu=pt probe_vf=0
ProcVersionSignature: Ubuntu 5.3.0-23.25~18.04.1-generic 5.3.7
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-23-generic N/A
 linux-backports-modules-5.3.0-23-generic N/A
 linux-firmware 1.173.12
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic uec-images
Uname: Linux 5.3.0-23-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 11/08/2016
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.3.4
dmi.board.name: 02C2CP
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.3.4:bd11/08/2016:svnDellInc.:pnPowerEdgeR630:pvr:rvnDellInc.:rn02C2CP:rvrA03:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R630
dmi.product.sku: SKU=NotProvided;ModelName=PowerEdge R630
dmi.sys.vendor: Dell Inc.

James Page (james-page) wrote :
affects: linux-signed-hwe-edge (Ubuntu) → linux (Ubuntu)
tags: added: apport-collected
description: updated
James Page (james-page) attached apport information: CRDA.txt, CurrentDmesg.txt, Lspci.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
James Page (james-page) wrote : Re: hw-tc-offload not reliable with VXLAN

This may be related to bug 1851819

James Page (james-page) wrote :

I re-deployed without bonding (i.e. a single port with VFs configured) and I still see the same issue, so bug 1851819 appears to be unrelated to this issue.

James Page (james-page) wrote :

Testing with VMs that are not using hardware-offloaded ports, i.e. the pure Open vSwitch datapath in the kernel.

James Page (james-page) wrote :

VMs without offloaded ports work fine with 'hw-tc-offload' enabled on the main PF port.

James Page (james-page)
tags: added: hwoffload
James Page (james-page) wrote :

Further Debugging:

tcpdump on the representor port for a VM:

12:09:41.933101 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:2d:b5:eb (oui Unknown), length 297
12:09:41.939887 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
12:09:42.042338 IP 172.16.0.2.bootps > 172.16.3.231.bootpc: BOOTP/DHCP, Reply, length 328
12:09:42.042673 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:2d:b5:eb (oui Unknown), length 309
12:09:42.155942 IP6 :: > ff02::1:ff2d:b5eb: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe2d:b5eb, length 32
12:09:42.207925 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
12:09:43.167942 IP6 fe80::f816:3eff:fe2d:b5eb > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
12:09:43.170786 IP6 fe80::f816:3eff:fe2d:b5eb > ip6-allrouters: ICMP6, router solicitation, length 16
12:09:43.202982 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:2d:b5:eb (oui Unknown), length 309
12:09:43.679928 IP6 fe80::f816:3eff:fe2d:b5eb > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
12:09:46.934638 IP6 fe80::f816:3eff:fe2d:b5eb > ip6-allrouters: ICMP6, router solicitation, length 16
12:09:47.575319 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:2d:b5:eb (oui Unknown), length 309

The REQUEST and REPLY are both seen; however, the REPLY evidently never reaches the VM, which keeps retransmitting the REQUEST.

As the REPLY is unicast, it gets pushed to an offloaded flow rule, so subsequent REPLY packets would not be seen on the hypervisor.
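
The offloaded rules themselves can be listed with `ovs-appctl dpctl/dump-flows type=offloaded` (and `type=ovs` for flows handled by the kernel datapath). As a sketch, the filter below narrows that dump to one VXLAN network. The function name is made up, and it assumes the dump prints the tunnel key in hex as `tun_id=0x...`, so VNI 1233 from the captures above appears as `tun_id=0x4d1`.

```shell
# Keep only datapath flow lines that reference the given VXLAN VNI.
# $1 = VNI in decimal; input is dump-flows output on stdin.
flows_for_vni() {
    # The trailing comma avoids matching longer IDs that share the same prefix.
    grep -F "$(printf 'tun_id=0x%x,' "$1")"
}

# Usage:
#   sudo ovs-appctl dpctl/dump-flows type=offloaded | flows_for_vni 1233
#   sudo ovs-appctl dpctl/dump-flows type=ovs | flows_for_vni 1233
```

Comparing the two dumps for an affected instance shows whether its tunnel traffic was pushed to hardware or stayed in the kernel datapath.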

James Page (james-page) wrote :

OpenStack related patch into OVS:

https://patchwork.ozlabs.org/patch/1224104/

I've picked this on top of the eoan packages and it appears to help resolve this issue.

James Page (james-page)
affects: linux (Ubuntu) → openvswitch (Ubuntu)
James Page (james-page)
Changed in openvswitch (Ubuntu Focal):
status: Confirmed → Fix Released
Changed in openvswitch (Ubuntu Eoan):
status: New → Triaged
importance: Undecided → High
Changed in openvswitch (Ubuntu Focal):
importance: Undecided → High
James Page (james-page)
description: updated
summary: - hw-tc-offload not reliable with VXLAN
+ [SRU] hw-tc-offload not reliable with VXLAN with OpenStack
description: updated
description: updated
James Page (james-page) wrote :

OVS with patch uploaded to eoan for SRU Team review.

Steve Langasek (vorlon) wrote : Please test proposed package

Hello James, or anyone else affected,

Accepted openvswitch into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openvswitch/2.12.0-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in openvswitch (Ubuntu Eoan):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-eoan
Corey Bryant (corey.bryant) wrote :

Hello James, or anyone else affected,

Accepted openvswitch into train-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:train-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-train-needed to verification-train-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-train-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-train-needed
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (openvswitch/2.12.0-0ubuntu1.1)

All autopkgtests for the newly accepted openvswitch (2.12.0-0ubuntu1.1) for eoan have finished running.
The following regressions have been reported in tests triggered by the package:

python-ovsdbapp/unknown (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#openvswitch

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Łukasz Zemczak (sil2100) wrote : Proposed package removed from archive

The version of openvswitch in the proposed pocket of Eoan that was purported to fix this bug report has been removed because the bugs that were to be fixed by the upload were not verified in a timely (105 days) fashion.

tags: removed: verification-needed-eoan
Changed in openvswitch (Ubuntu Eoan):
status: Fix Committed → Won't Fix
tags: removed: verification-needed
