Incorrect ARP processing when enable_distributed_floating_ip=True

Bug #1905933 reported by Frode Nordahl
This bug affects 2 people
Affects        Status        Importance  Assigned to    Milestone
ovn (Ubuntu)   Fix Released  Undecided   Unassigned
  Focal        Fix Released  High        Frode Nordahl
  Groovy       Fix Released  Undecided   Unassigned
  Hirsute      Fix Released  Undecided   Unassigned

Bug Description

[Impact]
Enabling `enable-distributed-floating-ip` on a cloud with OVN 20.03 results in loss of external connectivity for instances with floating IPs.

[Test Case]
Launch two instances and assign floating IPs to them. Toggle the `enable-distributed-floating-ip` configuration option and attempt to access an IP address on the internet that is not reachable in the external network L2 broadcast domain.

Observe that the instances attempt to reach the IP by obtaining its MAC address directly through ARP resolution rather than by applying L3 routing.

The functional test gate of the neutron-api-plugin-ovn charm may be useful for verification.

[Regression Potential]
We have cherry-picked a patch from upstream that reverts, in its entirety, the change that introduced the erratic behavior. The optimization was later replaced by a new set of patches, which are available in newer versions. As such, the regression potential is minimal.

[Original Bug Report]
In a focal-ussuri deployment, when `enable-distributed-floating-ip` is enabled, traffic from instances with FIPs should exit the hypervisor (HV) directly and not go through a gateway chassis.

However, due to a bug, each HV attempts to do ARP processing locally even for IP addresses that are not in the external network CIDR.

This results in loss of connectivity for instances with FIPs.
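
The expected behavior can be illustrated with a minimal Python sketch of the on-link versus off-link decision. This is not OVN code; the function name is hypothetical, and the external CIDR 10.78.95.0/24 and gateway 10.78.95.1 are assumptions chosen for illustration, not values taken from the deployment.

```python
import ipaddress


def arp_target(dst_ip, external_cidr, gateway_ip):
    """Return the IP address whose MAC address should be resolved via ARP.

    Destinations inside the external network are on-link and can be
    resolved directly. Anything else must be sent to the gateway, so
    the gateway's address is the one to resolve.
    """
    network = ipaddress.ip_network(external_cidr)
    if ipaddress.ip_address(dst_ip) in network:
        return dst_ip
    return gateway_ip


# Assumed values for illustration only.
EXTERNAL_CIDR = "10.78.95.0/24"
GATEWAY_IP = "10.78.95.1"

# Off-link destination: ARP should be for the gateway.
print(arp_target("91.189.88.152", EXTERNAL_CIDR, GATEWAY_IP))  # 10.78.95.1
# On-link destination: ARP directly for the destination.
print(arp_target("10.78.95.200", EXTERNAL_CIDR, GATEWAY_IP))   # 10.78.95.200
```

With the bug present, the hypervisor behaves as if the first case were on-link and sends an ARP request for the off-link destination itself.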

The issue is not present in Groovy with OVN 20.06 and I suspect the issue is fixed by this commit:
https://github.com/ovn-org/ovn/commit/d9ed450713eda62af1bec5009694b2d206c9f435


Frode Nordahl (fnordahl)
Changed in ovn (Ubuntu):
status: New → In Progress
status: In Progress → Fix Released
importance: Undecided → High
assignee: nobody → Frode Nordahl (fnordahl)
Changed in ovn (Ubuntu Focal):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Frode Nordahl (fnordahl)
Changed in ovn (Ubuntu):
importance: High → Undecided
assignee: Frode Nordahl (fnordahl) → nobody
Frode Nordahl (fnordahl)
Changed in ovn (Ubuntu Groovy):
status: New → Fix Released
James Page (james-page)
description: updated
Frode Nordahl (fnordahl)
description: updated
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Frode, or anyone else affected,

Accepted ovn into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ovn/20.03.1-0ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ovn (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Frode Nordahl (fnordahl) wrote :

With a focal deployment, as deployed by the OpenStack Charms using the in-distro released packages, I can create an instance with a floating IP and confirm that it is able to reach the Ubuntu archive:

$ chmod 600 /tmp/zaza-467221d1afd6/id_rsa_zaza
$ ssh -i /tmp/zaza-467221d1afd6/id_rsa_zaza ubuntu@10.78.95.103
ubuntu@zaza-neutrontests-ins-1:~$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.142) 56(84) bytes of data.
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=1 ttl=60 time=74.3 ms
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=2 ttl=60 time=67.5 ms
^C
--- archive.ubuntu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 67.568/70.938/74.308/3.370 ms

If I then enable Distributed Floating IP for OVN by issuing the following command and wait for the deployment to settle:
$ juju config neutron-api-plugin-ovn enable-distributed-floating-ip=true

I can then repeat the attempt to reach the Ubuntu archive:
ubuntu@zaza-neutrontests-ins-1:~$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.152) 56(84) bytes of data.
^C
--- archive.ubuntu.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2032ms

Dumping packets on the external port of the instance's hypervisor, I can see an ARP request for the archive's address itself:
# tcpdump -nevvi enp6s0 arp
tcpdump: listening on enp6s0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:37:54.290577 fa:16:3e:fa:6a:11 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 91.189.88.152 tell 10.78.95.103, length 28
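
A capture like the one above can be checked mechanically. The following Python sketch flags ARP requests whose target lies outside the external network. The function name is hypothetical, the regular expression only covers the `who-has ... tell ...` form shown in this capture, and the external CIDR 10.78.95.0/24 is an assumption for illustration.

```python
import ipaddress
import re

# Matches the "Request who-has <target> tell <sender>" part of a tcpdump ARP line.
ARP_REQUEST = re.compile(r"Request who-has ([0-9.]+) tell ([0-9.]+)")


def off_link_arp(line, external_cidr):
    """Return (target, sender) if the line is an ARP request for an
    address outside external_cidr, otherwise None."""
    match = ARP_REQUEST.search(line)
    if match is None:
        return None
    target, sender = match.groups()
    if ipaddress.ip_address(target) in ipaddress.ip_network(external_cidr):
        return None
    return target, sender


line = (
    "16:37:54.290577 fa:16:3e:fa:6a:11 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 91.189.88.152 "
    "tell 10.78.95.103, length 28"
)
# The external CIDR is an assumed value.
print(off_link_arp(line, "10.78.95.0/24"))  # ('91.189.88.152', '10.78.95.103')
```

A non-empty result for traffic leaving the hypervisor is the symptom described in this bug: the sender is asking the local segment for the MAC address of a host that is only reachable through a router.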

Adding -proposed to ovn-central and hypervisor units:
juju run --application ovn-central 'sudo sh -c "echo deb http://archive.ubuntu.com/ubuntu focal-proposed multiverse restricted main universe >> /etc/apt/sources.list"'
juju run --application ovn-central 'apt update'
juju run --application ovn-central 'sudo apt -y install ovn-central ovn-common'
juju run --application neutron-api 'systemctl restart neutron-server'

for machine in 0 1 2; do juju run --machine $machine 'sudo sh -c "echo deb http://archive.ubuntu.com/ubuntu focal-proposed multiverse restricted main universe >> /etc/apt/sources.list"'&done
for machine in 0 1 2; do juju run --machine $machine 'sudo apt update'&done
for machine in 0 1 2; do juju run --machine $machine 'sudo apt -y install ovn-common ovn-host'&done
wait

I can now repeat the attempt to ping the Ubuntu archive from my instance:
ubuntu@zaza-neutrontests-ins-1:~$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.142) 56(84) bytes of data.
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=1 ttl=60 time=44.5 ms
64 bytes from aerodent.canonical.com (91.189.88.142): icmp_seq=2 ttl=60 time=43.8 ms
^C
--- archive.ubuntu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 43.883/44.197/44.512/0.378 ms

Success!

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Chris Halse Rogers (raof) wrote :

This package failed to build on groovy/armhf and groovy/arm64. Is anyone looking into those failures? This can't be released until they are resolved.

(It also failed on focal/riscv, but it's always failed on focal/riscv so that's not a blocker)

Frode Nordahl (fnordahl) wrote :

Hello Chris, thank you for reaching out.

I took the package for a spin on an arm64 host on bos01 and was able to build it successfully there; see [0].

Is there a way to retry the failed builds for the in-queue package without re-uploading?

0: https://pastebin.ubuntu.com/p/j67dCJmJTJ/

Frode Nordahl (fnordahl) wrote :

We have triggered rebuilds now and the arm builds appear to be successful.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ovn - 20.03.1-0ubuntu1.2

---------------
ovn (20.03.1-0ubuntu1.2) focal; urgency=medium

  * d/p/ovn-northd-revert-manage-arp-process-locally-dvr.patch: Cherry pick
    fix for incorrect ARP processing with DVR enabled (LP: #1905933).
  * d/p/ovn-ctl-cluster-db-upgrades.patch: Cherry pick fix for upgrading
    database schema of clustered databases on package upgrade (LP: #1907081)
  * d/p/ovn-ofctrl-predictable-resolution-conflicting-flow-actions-*: Cherry
    pick fixes for predictable resolution for conflicting flow actions.
    (LP: #1906922)

 -- Frode Nordahl <email address hidden> Tue, 12 Jan 2021 11:47:18 +0000

Changed in ovn (Ubuntu Focal):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for ovn has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
