OVS plugin tunnel bridges never learn

Bug #1011467 reported by Darragh O'Reilly
Affects   Status         Importance   Assigned to   Milestone
neutron   Fix Released   High         Aaron Rosen

Bug Description

The tunnel bridges never learn which ports the remote VM MACs are on, so they flood every frame onto every GRE port. As a result, every VM frame destined for a VM on another physical node is actually sent to every other physical node in the mesh.

See diagram: https://docs.google.com/drawings/d/1Bsd9myLAfilCzsYIPOYKl5XaeDdILPJIRR0WxOaeuKo/edit
Running the latest build with devstack on Ubuntu 12.04. There are 3 compute nodes. folsom1 is also the controller node running nova-network. There are 2 running VMs: vm1 (10.0.0.4) on folsom1 (172.241.0.41) and vm2 (10.0.0.5) on folsom2 (172.241.0.42).

Every packet from vm1->vm2 moves on the tunnel folsom1->folsom2 but is also seen on tunnel folsom1->folsom3. Similarly every packet from vm2->vm1 moves on the tunnel from folsom2->folsom1, but is also seen on folsom2->folsom3.
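To make the symptom concrete, here is a minimal sketch of a toy learning bridge. It is not OVS code: the class, the port names and the `learn_on` parameter are all hypothetical, and the model only assumes that frames arriving on the GRE ports are never used for MAC learning, which is what the fdb output further down suggests.

```python
class LearningBridge:
    """Toy L2 bridge: learns source MACs, floods unknown destinations."""

    def __init__(self, ports, learn_on):
        self.ports = list(ports)
        self.learn_on = set(learn_on)  # ports whose ingress frames are learned
        self.fdb = {}                  # MAC -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out of."""
        if in_port in self.learn_on:
            self.fdb[src] = in_port
        if dst in self.fdb:
            out = self.fdb[dst]
            return [] if out == in_port else [out]
        # Unknown destination: flood everywhere except the ingress port.
        return [p for p in self.ports if p != in_port]


# Hypothetical port names for folsom1's tunnel bridge.
PORTS = ["patch-int", "gre-to-folsom2", "gre-to-folsom3"]
VM1, VM2 = "vm1-mac", "vm2-mac"


def run(learn_on):
    br = LearningBridge(PORTS, learn_on)
    br.receive("patch-int", VM1, VM2)         # vm1 -> vm2, first frame
    br.receive("gre-to-folsom2", VM2, VM1)    # reply arrives over the tunnel
    return br.receive("patch-int", VM1, VM2)  # vm1 -> vm2, second frame


broken = run(learn_on=["patch-int"])  # tunnel ingress is never learned
fixed = run(learn_on=PORTS)           # every port takes part in learning
print("broken:", broken)
print("fixed: ", fixed)
```

In the "broken" run the second frame is still flooded to both tunnels, which matches what folsom3 sees. In the "fixed" run it goes only to the tunnel towards folsom2.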

From an ssh session running on vm1 to vm2, pressing the enter key causes this traffic on folsom3 (172.241.0.43):

u1@folsom3:~$ sudo tcpdump -n -i eth1 proto GRE
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:34:34.827337 IP 172.241.0.41 > 172.241.0.43: GREv0, key=0x1, length 118: IP 10.0.0.4.49919 > 10.0.0.5.22: Flags [P.], seq 3279504606:3279504650, ack 418478006, win 8372, options [nop,nop,TS val 1987313 ecr 1911275], length 44
16:34:34.872076 IP 172.241.0.42 > 172.241.0.43: GREv0, key=0x1, length 74: IP 10.0.0.5.22 > 10.0.0.4.49919: Flags [.], ack 44, win 7776, options [nop,nop,TS val 1916631 ecr 1987313], length 0
16:34:34.932079 IP 172.241.0.42 > 172.241.0.43: GREv0, key=0x1, length 118: IP 10.0.0.5.22 > 10.0.0.4.49919: Flags [P.], seq 1:45, ack 44, win 7776, options [nop,nop,TS val 1916648 ecr 1987313], length 44
16:34:34.935827 IP 172.241.0.41 > 172.241.0.43: GREv0, key=0x1, length 74: IP 10.0.0.4.49919 > 10.0.0.5.22: Flags [.], ack 45, win 8372, options [nop,nop,TS val 1987334 ecr 1916648], length 0
16:34:34.965302 IP 172.241.0.42 > 172.241.0.43: GREv0, key=0x1, length 118: IP 10.0.0.5.22 > 10.0.0.4.49919: Flags [P.], seq 45:89, ack 44, win 7776, options [nop,nop,TS val 1916653 ecr 1987334], length 44
16:34:34.973799 IP 172.241.0.41 > 172.241.0.43: GREv0, key=0x1, length 74: IP 10.0.0.4.49919 > 10.0.0.5.22: Flags [.], ack 89, win 8372, options [nop,nop,TS val 1987341 ecr 1916653], length 0
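A capture like the one above can be checked mechanically. The following is a rough sketch, assuming the VM placement stated in this report; the regular expression, the `VM_HOST` table and the `stray_frames` helper are made up for illustration, and the two sample lines are abridged from the capture (everything after the TCP flags is cut).

```python
import re

# VM placement as stated in the report: inner VM IP -> hosting node IP.
VM_HOST = {"10.0.0.4": "172.241.0.41", "10.0.0.5": "172.241.0.42"}

# Matches the outer GRE endpoints and the inner IPv4 addresses (ports dropped).
GRE_LINE = re.compile(
    r"IP (?P<outer_src>[\d.]+) > (?P<outer_dst>[\d.]+): GREv0, key=\S+ "
    r"length \d+: IP (?P<inner_src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<inner_dst>\d+\.\d+\.\d+\.\d+)\.\d+:"
)

# Two lines abridged from the tcpdump output on folsom3.
CAPTURE = [
    "16:34:34.827337 IP 172.241.0.41 > 172.241.0.43: GREv0, key=0x1, "
    "length 118: IP 10.0.0.4.49919 > 10.0.0.5.22: Flags [P.]",
    "16:34:34.872076 IP 172.241.0.42 > 172.241.0.43: GREv0, key=0x1, "
    "length 74: IP 10.0.0.5.22 > 10.0.0.4.49919: Flags [.]",
]


def stray_frames(lines):
    """Return (outer_src, outer_dst, inner_dst) for frames that were
    tunnelled to a node that does not host the inner destination VM."""
    result = []
    for line in lines:
        m = GRE_LINE.search(line)
        if not m:
            continue
        if VM_HOST.get(m["inner_dst"]) != m["outer_dst"]:
            result.append((m["outer_src"], m["outer_dst"], m["inner_dst"]))
    return result


stray = stray_frames(CAPTURE)
for outer_src, outer_dst, inner_dst in stray:
    print(f"{outer_src} sent a frame for {inner_dst} to {outer_dst}")
```

Both sample frames are reported as stray, since neither 10.0.0.4 nor 10.0.0.5 lives on 172.241.0.43.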

u1@folsom1:~$ sudo ovs-vsctl show
b5df6d74-8378-4e50-8b75-969ba3a7469f
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.241.0.42"}
        Port "gre-0"
            Interface "gre-0"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.241.0.43"}
    Bridge br-int
        Port "gw-2d6158fb-55"
            tag: 4
            Interface "gw-2d6158fb-55"
                type: internal
        Port "tap4a4df867-57"
            tag: 4
            Interface "tap4a4df867-57"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "1.4.0+build0"

u1@folsom1:~$ sudo ovs-ofctl show br-tun
OFPT_FEATURES_REPLY (xid=0x1): ver:0x1, dpid:00009e9e10bf184e
n_tables:255, n_buffers:256
features: capabilities:0xc7, actions:0xfff
 1(patch-int): addr:9e:f3:56:b9:e2:0b
     config: 0
     state: 0
 2(gre-0): addr:d6:a3:5c:a2:0e:c3
     config: 0
     state: 0
 3(gre-1): addr:92:8e:df:30:b9:86
     config: 0
     state: 0
 LOCAL(br-tun): addr:9e:9e:10:bf:18:4e
     config: PORT_DOWN
     state: LINK_DOWN
OFPT_GET_CONFIG_REPLY (xid=0x3): frags=normal miss_send_len=0

u1@folsom1:~$ sudo ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=2977.562s, table=0, n_packets=198, n_bytes=42502, priority=3,tun_id=0x1 actions=mod_vlan_vid:4,output:1
 cookie=0x0, duration=2977.703s, table=0, n_packets=199, n_bytes=40780, priority=4,in_port=1,dl_vlan=4 actions=strip_vlan,set_tunnel:0x1,NORMAL
 cookie=0x0, duration=6234.7s, table=0, n_packets=3, n_bytes=970, priority=1 actions=drop
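One way to read the flow table above: only the flow for traffic coming from patch-int ends in NORMAL, while the flow for traffic arriving with tun_id=0x1 sends frames straight to port 1. My reading, not something the report states outright, is that frames handled without NORMAL never reach the bridge's MAC learning. The sketch below picks such flows out of the dump; `parse_flow` and `flows_without_normal` are hypothetical helpers written only for this output format.

```python
# The three flows from folsom1's br-tun, copied from the dump above.
DUMP = [
    " cookie=0x0, duration=2977.562s, table=0, n_packets=198, n_bytes=42502, "
    "priority=3,tun_id=0x1 actions=mod_vlan_vid:4,output:1",
    " cookie=0x0, duration=2977.703s, table=0, n_packets=199, n_bytes=40780, "
    "priority=4,in_port=1,dl_vlan=4 actions=strip_vlan,set_tunnel:0x1,NORMAL",
    " cookie=0x0, duration=6234.7s, table=0, n_packets=3, n_bytes=970, "
    "priority=1 actions=drop",
]


def parse_flow(line):
    """Split one dump-flows line into (match string, list of actions)."""
    body, actions = line.strip().split(" actions=")
    match = body.split(", ")[-1]  # last ", " field holds priority + match
    return match, actions.split(",")


def flows_without_normal(lines):
    """Matches of forwarding flows that bypass the NORMAL action."""
    result = []
    for line in lines:
        match, actions = parse_flow(line)
        if actions != ["drop"] and "NORMAL" not in actions:
            result.append(match)
    return result


bypassing = flows_without_normal(DUMP)
print(bypassing)
```

The only flow reported is the tun_id=0x1 one, the path taken by frames coming in from the GRE tunnels.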

The MAC table has just the MACs for vm1 and the gateway tap from the integration bridge, both on the patch-int port. It never gets an entry for vm2.

u1@folsom1:~$ sudo ovs-appctl fdb/show br-tun
 port VLAN MAC Age
    1 0 fa:16:3e:50:2c:8f 30
    1 0 fa:16:3e:3d:ff:74 16

------------------------------------------------

u1@folsom2:~$ sudo ovs-vsctl show
3eee2d04-0d51-48e1-825f-092ca12c0ff6
    Bridge br-int
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "tapafca0da5-30"
            tag: 5
            Interface "tapafca0da5-30"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-0"
            Interface "gre-0"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.241.0.41"}
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.241.0.43"}
    ovs_version: "1.4.0+build0"

u1@folsom2:~$ sudo ovs-ofctl show br-tun
OFPT_FEATURES_REPLY (xid=0x1): ver:0x1, dpid:00008286d3a57842
n_tables:255, n_buffers:256
features: capabilities:0xc7, actions:0xfff
 1(patch-int): addr:36:1b:25:17:31:0b
     config: 0
     state: 0
 2(gre-0): addr:1a:ed:78:ed:d9:00
     config: 0
     state: 0
 3(gre-1): addr:62:1e:71:9f:82:f7
     config: 0
     state: 0
 LOCAL(br-tun): addr:82:86:d3:a5:78:42
     config: PORT_DOWN
     state: LINK_DOWN
OFPT_GET_CONFIG_REPLY (xid=0x3): frags=normal miss_send_len=0

u1@folsom2:~$ sudo ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=3178.517s, table=0, n_packets=207, n_bytes=42252, priority=3,tun_id=0x1 actions=mod_vlan_vid:5,output:1
 cookie=0x0, duration=3178.923s, table=0, n_packets=206, n_bytes=45370, priority=4,in_port=1,dl_vlan=5 actions=strip_vlan,set_tunnel:0x1,NORMAL
 cookie=0x0, duration=6952.312s, table=0, n_packets=15, n_bytes=3106, priority=1 actions=drop

u1@folsom2:~$ sudo ovs-appctl fdb/show br-tun
 port VLAN MAC Age
    1 0 fa:16:3e:65:77:b5 26

---------------------------------------------------

u1@folsom3:~$ sudo ovs-vsctl show
293333a5-98b8-470f-8805-a13e7cd7d6f8
    Bridge br-int
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.241.0.42"}
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-0"
            Interface "gre-0"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.241.0.41"}
    ovs_version: "1.4.0+build0"

u1@folsom3:~$ sudo ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=4285.438s, table=0, n_packets=530, n_bytes=113476, priority=1 actions=drop

dan wendlandt (danwent)
Changed in quantum:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → dan wendlandt (danwent)
dan wendlandt (danwent) wrote :

Thanks for the detailed report. I've confirmed this in our current setup, and think I understand why this started happening, but I'm not totally sure, as my initial attempt at fixing it did not work as expected either :)

Will keep looking.

dan wendlandt (danwent) wrote :

Ok, I've tested a basic fix for this. The problem is that it interacts with an OVS bug that is not fixed in Precise, so I need to find a good way to work around that.

dan wendlandt (danwent) wrote :

Aaron will be handling this fix. Hoping to get it in for F-2.

Changed in quantum:
assignee: dan wendlandt (danwent) → Aaron Rosen (arosen)
milestone: none → folsom-2
dan wendlandt (danwent)
Changed in quantum:
status: Confirmed → In Progress
dan wendlandt (danwent) wrote :

Aaron, let's post this patch soon.

We're holding off on merging, as the change risks triggering an OVS bug, the fix for which still hasn't been pulled into Ubuntu 12.04.

Changed in quantum:
milestone: folsom-2 → folsom-3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/9416

dan wendlandt (danwent) wrote :

Still waiting on the Ubuntu folks, I guess. We'll have to move this out of F3 and into RC1.

Changed in quantum:
milestone: folsom-3 → folsom-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to quantum (master)

Reviewed: https://review.openstack.org/9416
Committed: http://github.com/openstack/quantum/commit/d2b58ba48641139e3aa9b9e51bcd5396aca09510
Submitter: Jenkins
Branch: master

commit d2b58ba48641139e3aa9b9e51bcd5396aca09510
Author: Aaron Rosen <email address hidden>
Date: Thu Jul 5 19:31:51 2012 -0400

    OVS plugin tunnel bridges never learn

    This patch installs a flow_mod to handle each vm
    with the normal action which allows OVS to do mac learning.
    Fixes bug 1011467

    Change-Id: Ib6500813d4111ae42675459fac64dfb2e9c40d91
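The commit message only says that a flow with the normal action is installed per VM. The actual diff is not shown in this report. As a rough illustration of what such a flow could look like, the sketch below builds an `ovs-ofctl add-flow` style flow string. Matching on `dl_dst` is my assumption about what "handle each vm" means, and `per_vm_inbound_flow` is a hypothetical helper, not a function from the plugin. The sample values (tunnel id 0x1, local VLAN 4, one of the MACs from folsom1's fdb output) are taken from the report.

```python
def per_vm_inbound_flow(tun_id, vm_mac, local_vlan, priority=3):
    """Build a flow string for tunnel traffic addressed to one local VM.

    Assumption: the flow matches the tunnel id plus the VM's MAC as
    destination, tags the frame with the local VLAN, and then hands it
    to NORMAL so the bridge can learn the source MAC on the GRE port.
    """
    return (
        f"priority={priority},tun_id={tun_id:#x},dl_dst={vm_mac},"
        f"actions=mod_vlan_vid:{local_vlan},NORMAL"
    )


flow = per_vm_inbound_flow(0x1, "fa:16:3e:50:2c:8f", 4)
print(flow)
```

A string like this would be passed to `ovs-ofctl add-flow br-tun "<flow>"`. Compared with the original `tun_id=0x1 actions=mod_vlan_vid:4,output:1` flow, the difference that matters is ending in NORMAL instead of a fixed output port.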

Changed in quantum:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in quantum:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in quantum:
milestone: folsom-rc1 → 2012.2