s390x OSA bridging causes network issues

Bug #2060939 reported by Phoenyx Cameron
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ubuntu  | New    | Undecided  | Unassigned  |

Bug Description

Enabling Linux bridging via netplan whilst using an OSA card on the s390x architecture causes a large number of packets to be received on the sending interface. This behaviour is not seen with other types of network card on a Z, nor on any other server architecture (x86 / ppc64).

When performing a ping test, the following can be seen:

- Unsuccessful pings (around 60% of packets sent) are not received by the network switches
- These unsuccessful pings will trigger the following message in a dmesg output: "received packet on vlan2011 with own address as source address"
- tcpdump will also display the packet twice, confirming the dmesg output.
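
For anyone wanting to quantify the second symptom, the following is a minimal Python sketch that counts the quoted kernel warning per interface in saved dmesg output. The sample lines, timestamps and the bridge name br2011 are made up for illustration and are not taken from the attached logs; only the message text itself comes from this report.

```python
import re
from collections import Counter

# Matches the bridge warning quoted in this report and captures the
# interface name on which the packet was received.
PATTERN = re.compile(
    r"received packet on (\S+) with own address as source address"
)


def count_own_address_warnings(dmesg_text):
    """Return a Counter mapping interface name -> number of warnings."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample input (timestamps and "br2011" are invented).
SAMPLE = """\
[  101.000001] br2011: received packet on vlan2011 with own address as source address
[  102.000002] br2011: received packet on vlan2011 with own address as source address
[  103.000003] some unrelated kernel message
"""

if __name__ == "__main__":
    for interface, hits in count_own_address_warnings(SAMPLE).items():
        print(interface, hits)
```

In practice the text would come from a saved dmesg capture rather than the SAMPLE string; comparing the per-interface counts against the number of pings sent gives a rough failure rate.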

The behaviour is not seen during connections to other addresses on the same VLAN.
Successful pings occur directly after an ARP request is made; however, permanently setting the ARP entry results in 100% of packets failing to send.

We are using Ubuntu 22.04.4 LTS.

I've attached a zipped folder containing the outputs of the dmesg and tcpdump commands, as well as the output of a dbginfo command.

Phoenyx Cameron (phoenyx-c) wrote :

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/2060939/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-206037 severity-high targetmilestone-inin---
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-04-15 02:52 EDT-------
A reply I sent via email to Phoenyx Cameron and others:

I had a look at the data attached to the bugzilla.
I don't really understand the setup. Why do you have 30 bridges on top of 30 VLAN interfaces, all connected to the same OSA interface?
Why not 1 bridge with 30 VLAN bridge ports?

I am wondering whether this could be related to the infamous ARP flux issue.
Maybe you want to try setting
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
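
To confirm that the two suggested values are actually active on a running system, something like the following Python sketch could be used. It relies on the standard mapping of dotted sysctl keys to files under /proc/sys; the function names are hypothetical, and the dot-to-slash conversion is deliberately naive (it would mishandle interface names that themselves contain dots).

```python
from pathlib import Path

# The values suggested in the comment above.
SUGGESTED = {
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.all.arp_announce": "2",
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys (naive split)."""
    return Path(root, *key.split("."))


def mismatches(expected, root="/proc/sys"):
    """Return {key: current_value} for every key whose value differs.

    The current value is None when the file cannot be read.
    """
    result = {}
    for key, wanted in expected.items():
        try:
            current = sysctl_path(key, root).read_text().strip()
        except OSError:
            current = None
        if current != wanted:
            result[key] = current
    return result


if __name__ == "__main__":
    print(mismatches(SUGGESTED) or "all suggested values are active")
```

The root parameter exists only so the check can be pointed at a test directory; on a real host the default is what matters.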

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-04-29 05:15 EDT-------
I got the reply from Phoenyx:
Hi Sandy,

We've made the suggested changes to arp_ignore and arp_announce, but this does not seem to have had any effect on the issue, unfortunately. There has been no reduction in the number of packets giving the "received packet on vlan2011 with own address as source address" error.

We could certainly look at setting up 1 bridge with multiple bridge ports to test that configuration; it was set up this way to replicate our working Z14 environment. I'm not sure what the original decision was that led to that setup though.

Kind regards,
Phoenyx

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-05-13 05:09 EDT-------
More information from Phoenyx:
"Hi Sandy,

We've managed to find what was causing packets to be received on the source interface; however, we do still have a concern that the behaviour was only present on the OSA card.

We had a device on the network that was attempting to ping the firewall gateways on each VLAN, which were configured to drop ICMP packets. This was causing a steady stream of broadcast packets to be sent out on each VLAN (2 per second, per VLAN).

The firewalls have been reconfigured to accept ICMP traffic, and now that the broadcast packets are not being sent out, the OSA is working exactly as expected.

However, the OSA cards were the only ones to display issues while this was occurring; all other NICs were functioning despite this network traffic (x86, RoCE, etc.), and we are not sure why only the OSA was displaying that behaviour. Would you happen to have any insights on this at all?

Many thanks,
Nyx"

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-05-13 05:20 EDT-------
Hi Phoenyx,

I'm glad to hear that you were able to fix your issue.

I have to admit I don't get the full picture of your analysis:
The firewall gateways you mention, where are they? Inside KVM guests attached to your OSA card? Or external to IBM Z?
The pinging device, where is that? External, connected via a physical switch?
What exactly was the unexpected behaviour of the OSA cards?

In case you want us to investigate further, I would propose trying to recreate the issue and reducing it to the minimum failing scenario (just 2 endpoints; can we omit VLAN, bridge, firewall, ...? The less the better).
Then describe exactly what the expected behaviour is, what the unexpected behaviour is, and how to reproduce it.
