ovs VXLAN over IPv6 conflicts with linux native VXLAN over IPv4 using standard port

Bug #1846507 reported by Radosław Piliszek on 2019-10-03
This bug affects 1 person
Affects: kolla-ansible (Importance: Undecided; Assigned to: Unassigned)
Affects: neutron (Importance: Undecided; Assigned to: Unassigned)

Bug Description

This was observed while testing IPv6 in CI.

ovs-agent tries to run VXLAN over IPv6 on the default port (4789).

Linux provides the network for CI hosts via native VXLAN (a kernel interface of the vxlan type) over IPv4, also on the standard port (4789).
The kernel is bound to the IPv4-only UDP socket 0.0.0.0:4789.

ovs-agent:

2019-10-02 19:42:55.073 6 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-103d9f69-76c7-46bc-aeac-81b22af3e6b2 - - - - -] Failed to set-up vxlan tunnel port to fd::3

vswitchd:

2019-10-02T19:42:55.017Z|00059|dpif_netlink|WARN|Failed to create vxlan-srtluqpm3 with rtnetlink: Address already in use
2019-10-02T19:42:55.053Z|00060|dpif|WARN|system@ovs-system: failed to add vxlan-srtluqpm3 as port: Address already in use
2019-10-02T19:42:55.053Z|00061|bridge|WARN|could not add network device vxlan-srtluqpm3 to ofproto (Address already in use)

For some reason this conflict does *not* arise when ovs is using IPv4 tunnels, which is somewhat counter-intuitive.

Worked around by using a different port. This has no real-life significance (IMHO) but is undoubtedly an interesting phenomenon.

Environment description:
Distro: Ubuntu 18.04 (Bionic)
Kernel: 4.15.0-65.74 (generic on x86_64)
Neutron release: current master
OVS: 2.11.0-0ubuntu2~cloud0 (from UCA - Ubuntu Cloud Archive for Train on Bionic)

Radosław Piliszek (yoctozepto) wrote :

Notifying neutron so that they may shed some light on this observation.

Changed in kolla-ansible:
status: Triaged → Opinion
importance: Wishlist → Undecided
description: updated
Ryan Tidwell (ryan-tidwell) wrote :

If it's available could you maybe share some netstat output snippets showing the UDP sockets in use, linux bridge settings, OVS version, and any relevant neutron config snippets?

Radosław Piliszek (yoctozepto) wrote :

I added an environment description to the bug description and will attach the relevant configs and command outputs.

The Linux kernel VXLAN interface was created as follows (addresses given as an example):
ip link add vxlan0 type vxlan id 10001 local 10.0.0.1 dstport 4789

^ to reiterate: changing that 4789 to e.g. 4790 fixes the IPv6 tunnel; the IPv4 tunnel is never broken

description: updated
summary: - ovs VXLAN over IPv6 conflicts with linux bridge VXLAN over IPv4 using
+ ovs VXLAN over IPv6 conflicts with linux native VXLAN over IPv4 using
standard port
description: updated
Radosław Piliszek (yoctozepto) wrote :

Launchpad hid the file names, but the files download just fine. These two config files are fed to the agent.

Radosław Piliszek (yoctozepto) wrote :

Just noticed I left an IPv4 multicast group address in there. I will check without it for completeness. I get no warnings about it, so it probably was not even used here.

Ryan Tidwell (ryan-tidwell) wrote :

Is it possible to check the value of /proc/sys/net/ipv6/bindv6only? I can see the initial VTEP bound to UDP 0.0.0.0:4789. I'm wondering if OVS is attempting to use [::]:4789, which could conflict when /proc/sys/net/ipv6/bindv6only is 0. When bindv6only is 0, I believe that tells the system to use bind to open a "dual-stack" socket if you will, meaning we could be getting a conflict with the IPv4 aspect of the listening socket OVS is requesting. I'm wondering whether simply setting /proc/sys/net/ipv6/bindv6only to 1 would allow this setup to work.
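The dual-stack hypothesis above can be illustrated with plain userspace sockets. The following is only a minimal sketch: the helper names `try_bind_v6` and `demo` are made up for this illustration, an ephemeral port stands in for 4789, and it says nothing about what OVS or the kernel VXLAN driver actually do when they create their sockets. The per-socket IPV6_V6ONLY option is used here because it overrides the bindv6only sysctl default for that one socket.

```python
import errno
import socket


def try_bind_v6(port, v6only):
    """Try to bind a UDP socket to [::]:port.

    Returns None on success, or the errno of the failure.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s6:
            # Per-socket equivalent of the net.ipv6.bindv6only default.
            s6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY,
                          1 if v6only else 0)
            s6.bind(("::", port))
        return None
    except OSError as exc:
        return exc.errno


def demo():
    """Hold an IPv4 wildcard UDP port, then try the IPv6 wildcard on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s4:
        s4.bind(("0.0.0.0", 0))  # ephemeral port instead of 4789
        port = s4.getsockname()[1]
        return {
            "dual_stack": try_bind_v6(port, v6only=False),
            "v6only": try_bind_v6(port, v6only=True),
        }


if __name__ == "__main__":
    for mode, err in demo().items():
        status = "ok" if err is None else errno.errorcode.get(err, str(err))
        print(mode, status)
```

On a Linux host one would expect the dual-stack attempt to fail with EADDRINUSE and the v6only attempt to succeed, but this is worth confirming on the affected kernel rather than taking for granted.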

Ryan Tidwell (ryan-tidwell) wrote :

"When bindv6only is 0, I believe that tells the system to use bind to open a "dual-stack" socket if you will"

should read:

"When bindv6only is 0, I believe that tells the system to bind to a "dual-stack" socket if you will...."

(changed kolla-ansible params so that launchpad does not hide it from me)

Changed in kolla-ansible:
status: Opinion → Triaged
importance: Undecided → Wishlist

Ryan, sure. I log every detail that comes to mind; here it is:
net.ipv6.bindv6only = 0
(as expected)

I will try setting this to 1. I believe OVS should set the socket to v6only, but most likely it does not. ;-)

Still, this does not explain why IPv4 tunnels work. Does neutron/ovs see the conflict upfront and modify the port? Or is it able to reuse the kernel IPv4 binding?

Ryan Tidwell (ryan-tidwell) wrote :

From a running compute node I have access to:

# ss -lun | grep 4789
UNCONN 0 0 *:4789 *:*
UNCONN 0 0 :::4789 :::*

I see what appears to be OVS listening on both IPv4 and IPv6, even in a single-stack IPv4 deployment. I need to spend a little more time with this to really get a handle on what is happening.
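For hosts where ss or netstat is not at hand, the same information can be read from /proc/net/udp and /proc/net/udp6. The sketch below is a hypothetical helper (the name `sockets_on_port` is not from any tool mentioned in this report) and assumes the usual Linux layout of those files, where the second column is the local address as hex `ADDR:PORT`.

```python
import os


def sockets_on_port(proc_net_text, port):
    """Return the hex local addresses of sockets bound to `port`.

    `proc_net_text` is the content of /proc/net/udp or /proc/net/udp6.
    """
    matches = []
    for line in proc_net_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 2 or ":" not in fields[1]:
            continue
        addr_hex, _, port_hex = fields[1].rpartition(":")
        if int(port_hex, 16) == port:
            matches.append(addr_hex)
    return matches


if __name__ == "__main__":
    for path in ("/proc/net/udp", "/proc/net/udp6"):
        if os.path.exists(path):
            with open(path) as handle:
                print(path, sockets_on_port(handle.read(), 4789))
```

An all-zero address in the result corresponds to a wildcard bind, i.e. the `*:4789` and `:::4789` entries in the ss output above.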

Neither change fixed IPv6: not removing the bogus multicast address (as I initially expected), and not setting bindv6only to 1 (as I expected after your last comment).

So the mystery is still waiting to be uncovered. :-)

tags: added: ipv6
Changed in neutron:
status: New → Opinion

It turns out we are also hitting this in IPv4 jobs, though for some reason not on all nodes...

Changed in kolla-ansible:
status: Triaged → Opinion
importance: Wishlist → Undecided
milestone: 9.0.0 → none
assignee: Radosław Piliszek (yoctozepto) → nobody