IP reassembly issue on the Linux bridges in Openstack

Bug #1542032 reported by Claude LeFrancois
This bug affects 5 people
Affects: neutron
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

Sorry for the text diagram. It may not render well on this screen. If so, please copy and paste it into a fixed-width text editor.

Thanks,

Claude.

Title: IP reassembly issue on the Linux bridges in Openstack
------------------------------------------------------------

Summary: When the security groups and the Neutron firewall are active in OpenStack, every VM virtual network interface (VNIC) is isolated in its own Linux bridge, and IP reassembly must be performed to allow the firewall to inspect the traffic. The reassembled traffic sometimes exceeds the capacity of the physical interfaces and is not forwarded properly.

Linux bridge diagram:
---------------------

+-----------+                                     +------------------+
|    VM     |                                     |       OVS        |
|  -------  |    --------------      -------      |  -------  -----  |    ------------    -------
|  | TAP |--|----| QBR bridge |------| QVB |------|--| QVO |  | P |--|----| FW-ADMIN |----| PHY |
|  -------  |    --------------      -------      |  -------  -----  |    ------------    -------
|           |                                     |                  |
+-----------+                                     +------------------+

Introduction:
-------------

In OpenStack, virtual machines (VMs) commonly use Open vSwitch (OVS) for networking. This setup is not mandatory, but it is a common one.

When the Neutron firewall and the security groups are active, each VM VNIC, also called a tap interface, is connected to a Linux bridge. This is the QBR bridge. The QVB interface enables the network communication with OVS. The QVB interface interacts with the QVO interface in OVS.

Security analysis is performed on the Linux bridge. To inspect the traffic adequately, fragmented traffic has to be reassembled. The traffic is then forwarded according to the Maximum Transmission Unit (MTU) of the interfaces in the bridge.

The MTU on all of these interfaces is set to 65000 bytes. This is part of the problem observed with NFV applications.

Analysis:
---------

As a real life example, the NFV application uses NFS between VMs. NFS is a well known feature in Unix environments. This feature provides network file systems. This is the equivalent of a network drive in the Windows world.

NFS is known to produce large frames. In this example, VM1 (169.254.4.242) sends a large NFS write instruction to VM2 (169.254.1.13). The example below shows a 5 KB packet. The traffic is fragmented into several packets as instructed by the VM1 VNIC. This is the desired behavior.

root@node-11:~# tcpdump -e -n -i tap3e79842d-eb host 169.254.1.13

23:46:48.938255 00:80:37:0e:0f:12 > 00:80:37:0e:0b:12, ethertype IPv4 (0x0800), length 1514: 169.254.4.242.3015988240 > 169.254.1.13.2049: 1472 write fh Unknown/01000601B1198A1CB3CC4E1EA3AB0B26017B0AD653620700D59B28C700000000 4863 (4863) bytes @ 229376
23:46:48.938271 00:80:37:0e:0f:12 > 00:80:37:0e:0b:12, ethertype IPv4 (0x0800), length 1514: 169.254.4.242 > 169.254.1.13: ip-proto-17
23:46:48.938279 00:80:37:0e:0f:12 > 00:80:37:0e:0b:12, ethertype IPv4 (0x0800), length 1514: 169.254.4.242 > 169.254.1.13: ip-proto-17
23:46:48.938287 00:80:37:0e:0f:12 > 00:80:37:0e:0b:12, ethertype IPv4 (0x0800), length 590: 169.254.4.242 > 169.254.1.13: ip-proto-17

The same packet is found on the QVB interface in one large frame.

root@node-11:~# tcpdump -e -n -i qvb3e79842d-eb host 169.254.1.13

23:46:48.938322 00:80:37:0e:0f:12 > 00:80:37:0e:0b:12, ethertype IPv4 (0x0800), length 5030: 169.254.4.242.3015988240 > 169.254.1.13.2049: 4988 write fh Unknown/01000601B1198A1CB3CC4E1EA3AB0B26017B0AD653620700D59B28C700000000 4863 (4863) bytes @ 229376
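
The frame lengths in the two captures above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming plain Ethernet II framing (14 bytes), an IPv4 header without options (20 bytes), an 8-byte UDP header, and that the "4988" printed by tcpdump is the UDP payload length. The function names are illustrative only.

```python
ETH_HEADER = 14  # Ethernet II header, no VLAN tag (assumption)
IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8


def fragment_frame_lengths(udp_payload, mtu=1500):
    """On-wire frame lengths of one UDP datagram fragmented at the given MTU."""
    remaining = UDP_HEADER + udp_payload
    # Every fragment except the last carries a multiple of 8 bytes of IP payload.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    lengths = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        lengths.append(ETH_HEADER + IP_HEADER + chunk)
        remaining -= chunk
    return lengths


def reassembled_frame_length(udp_payload):
    """On-wire frame length of the same datagram once reassembled."""
    return ETH_HEADER + IP_HEADER + UDP_HEADER + udp_payload


print(fragment_frame_lengths(4988))    # [1514, 1514, 1514, 590]
print(reassembled_frame_length(4988))  # 5030
```

Under these assumptions the result matches both captures: three 1514-byte frames plus one 590-byte frame on the tap interface, and a single 5030-byte frame on the QVB interface.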

Such large packets cannot cross physical interfaces without being fragmented again if jumbo frames support is not active in the network. Even with jumbo frames, the NFS frame size can easily cross the 9K barrier. NFS frame size up to 32 KB can be observed with NFS over UDP.

For some reason, this traffic does not seem to be transmitted properly between compute hosts in OpenStack.

Further investigation has revealed that the large frames leave the OVS internal bridge (br-int) in the direction of the private bridge (br-prv) through a patch interface in OVS. Once the traffic has reached this point, it uses the "P" interface (i.e. p_eeee51a2-0) to reach another Linux bridge (br-fw-admin), to which the physical interface is connected. The "P" interface has its MTU set to 65000, while the physical interface and the Linux bridge are both set to 1500. A tcpdump analysis reveals that the large frames reach the "P" interface and the Linux bridge. However, the traffic is not observed on the physical interface. The traffic does not use the DF bit.

This is the reason why the VNF application works fine when all the VMs are located on the same compute host while the NFS application does not work properly when the VMs are using multiple compute hosts. Somehow, when a large frame needs to be sent over to another compute host, either the Linux bridge or the physical interface does not fragment the packet again properly. The information is dropped and lost.
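
The MTU mismatch along the forwarding path can be located mechanically. The sketch below uses the interface names and MTU values from the ifconfig printouts later in this report. The helper name first_mtu_drop is hypothetical. On a live host, the values could instead be read from /sys/class/net/<interface>/mtu.

```python
def first_mtu_drop(path):
    """Return the first adjacent pair of interfaces where the MTU decreases.

    `path` is a list of (interface, mtu) tuples in forwarding order.
    Returns None when the MTU never decreases along the path.
    """
    for (prev_name, prev_mtu), (next_name, next_mtu) in zip(path, path[1:]):
        if next_mtu < prev_mtu:
            return prev_name, next_name
    return None


# Values taken from the ifconfig output in this report (node-11).
PATH = [
    ("tap3e79842d-eb", 65000),
    ("qbr3e79842d-eb", 65000),
    ("qvb3e79842d-eb", 65000),
    ("qvo3e79842d-eb", 65000),
    ("p_eeee51a2-0", 65000),
    ("br-fw-admin", 1500),
    ("eth0", 1500),
]

print(first_mtu_drop(PATH))  # ('p_eeee51a2-0', 'br-fw-admin')
```

The drop from 65000 to 1500 sits exactly where the large frames are last seen in the tcpdump analysis.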

Remedy:
-------

As a workaround, the bridge-nf-call-iptables kernel parameter can be used to disable the bridge netfilter feature. The traffic is then no longer reassembled and the NFV application works like a charm. However, the traffic is no longer inspected by the firewall, so the security group functions of the other VNFs/VMs are affected. This is a compute-host-wide setting, not a per-bridge setting.

The modification can be applied in real time but all the other Linux bridges on the compute host are affected.

root@node-11:~# cat /proc/sys/net/bridge/bridge-nf-call-iptables
1

root@node-11:~# echo "0" > /proc/sys/net/bridge/bridge-nf-call-iptables

root@node-11:~# cat /proc/sys/net/bridge/bridge-nf-call-iptables
0

The sysctl command can also be used to control the bridge-nf-call-iptables kernel parameter.
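
For reference, the sysctl name of this parameter is net.bridge.bridge-nf-call-iptables, so "sysctl net.bridge.bridge-nf-call-iptables" reads it and "sysctl -w net.bridge.bridge-nf-call-iptables=0" sets it to 0. The sketch below only illustrates how a sysctl key maps onto the /proc/sys path used above. Both helper names are hypothetical, and the simple dot-to-slash mapping assumes a key with no dots inside a path component.

```python
def sysctl_key_to_proc_path(key):
    """Map a dotted sysctl key to its /proc/sys path (simple keys only)."""
    return "/proc/sys/" + key.replace(".", "/")


def bridge_netfilter_enabled(raw_value):
    """Interpret the file content: "0" means bridged traffic bypasses iptables."""
    return raw_value.strip() != "0"


print(sysctl_key_to_proc_path("net.bridge.bridge-nf-call-iptables"))
# /proc/sys/net/bridge/bridge-nf-call-iptables
```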

Attachments:
------------

Traffic capture traces showing a 22 KB NFS write operation (nfs-fragment-1frame.cap & nfs-reassembly-1frame.cap)

Expectations:
-------------

- Find why the traffic is not re-fragmented before leaving the compute host
- Fix the issue
- Provide configuration remedy if applicable

Note: Setting ML2 port-security to False does not help. The anti-spoofing rules are removed, but IP reassembly is still performed, even though firewall inspection is not needed when this feature is used.

Printouts on the compute host (Openstack Kilo):
-----------------------------------------------

root@node-12:~# nova show VM-1.15
+--------------------------------------+---------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+---------------------------------------------------------------------------+
| Internal-1 network | 169.254.4.242 |
| Internal-2 network | 30.30.102.4 |
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | node-11.domain.tld |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-11.domain.tld |
| OS-EXT-SRV-ATTR:instance_name | instance-000000cc |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2016-01-13T21:14:36.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | True |
| created | 2016-01-13T21:13:58Z |
| flavor | 2vcpu_2048MBmem_1GBdisk (f0083761-fdb1-48bc-8dfd-86fd894d6832) |
| hostId | dab453da6b0bd05902f3d80f6df83d108cfe9704e3d3c0cc903e7628 |
| id | b515db00-067d-4d9a-86be-9dea03c14d03 |
| image | pxeboot_cxp9025898_2r5b03 (0b67c2b1-2370-4b23-91f1-04236b5bba8e) |
| key_name | - |
| metadata | {} |
| name | VM-1.15 |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | ACTIVE |
| tenant_id | 36d1650d2c7f47d4be35a46f3bb6a28e |
| updated | 2016-01-13T21:14:37Z |
| user_id | 928a6b5ff95341f5857c5161df7b6ca1 |
+--------------------------------------+---------------------------------------------------------------------------+

root@node-11:~# brctl show
bridge name bridge id STP enabled interfaces
br-ex 8000.2c44fd7c96cc no eth0.35
       p_ff798dba-0
br-fw-admin 8000.2c44fd7c96cc no eth0
       p_eeee51a2-0
br-mgmt 8000.2c44fd7c96cc no eth0.1526
br-storage 8000.2c44fd7c96cc no eth0.1525
qbr07abdc1e-38 8000.0e00e0133aec no qvb07abdc1e-38
       tap07abdc1e-38
qbr101a4853-a9 8000.66349b3bf77d no qvb101a4853-a9
       tap101a4853-a9
qbr1e3b62fd-80 8000.d6c7c2e452ac no qvb1e3b62fd-80
       tap1e3b62fd-80
qbr26379086-40 8000.1a87ae64580e no qvb26379086-40
       tap26379086-40
qbr2871b06a-fb 8000.b638f3116d76 no qvb2871b06a-fb
       tap2871b06a-fb
qbr29c06538-34 8000.ba1c5aac2726 no qvb29c06538-34
       tap29c06538-34
qbr2efbc02d-33 8000.32e23aa5404e no qvb2efbc02d-33
       tap2efbc02d-33
qbr3298eeb5-a1 8000.667029f958ec no qvb3298eeb5-a1
       tap3298eeb5-a1
qbr3e79842d-eb 8000.e2d3c6aea326 no qvb3e79842d-eb
       tap3e79842d-eb
qbr4805182f-0b 8000.9e3bf559e7c1 no qvb4805182f-0b
       tap4805182f-0b
qbr5160349f-e7 8000.d263b9e4f324 no qvb5160349f-e7
       tap5160349f-e7
qbr534c601a-0c 8000.ca0079ee8e55 no qvb534c601a-0c
       tap534c601a-0c
qbr622ef3b6-a0 8000.625bd7a53dd5 no qvb622ef3b6-a0
       tap622ef3b6-a0
qbr960d7784-82 8000.0642984683ea no qvb960d7784-82
       tap960d7784-82
qbr99faeb13-17 8000.a6476340bb75 no qvb99faeb13-17
       tap99faeb13-17
qbra80a8610-ef 8000.3af49b35beff no qvba80a8610-ef
       tapa80a8610-ef
qbrab3661cd-b2 8000.d6dcaee6a0e7 no qvbab3661cd-b2
       tapab3661cd-b2
qbrabbfad8e-05 8000.4e0f384dbfde no qvbabbfad8e-05
       tapabbfad8e-05
qbrb9bd0dcd-0c 8000.2a4cf0aac6ca no qvbb9bd0dcd-0c
       tapb9bd0dcd-0c
qbrc3a88d15-08 8000.da9fcf716879 no qvbc3a88d15-08
       tapc3a88d15-08
qbrcf4d2014-ea 8000.063f92ac020e no qvbcf4d2014-ea
       tapcf4d2014-ea
qbrd15b94e7-05 8000.5a8a3d70a79d no qvbd15b94e7-05
       tapd15b94e7-05
qbrd3c76f84-6f 8000.66039e089f00 no qvbd3c76f84-6f
       tapd3c76f84-6f
qbrd9d1a7c6-e2 8000.02f220117f85 no qvbd9d1a7c6-e2
       tapd9d1a7c6-e2
qbrdd069c93-ad 8000.a6e25b3b1a82 no qvbdd069c93-ad
       tapdd069c93-ad
qbre3ea8b73-13 8000.0e963b47dbc9 no qvbe3ea8b73-13
       tape3ea8b73-13
qbree5d29b2-75 8000.d257b819b97a no qvbee5d29b2-75
       tapee5d29b2-75
qbrfdd2d84e-e4 8000.02c712bd61bb no qvbfdd2d84e-e4
       tapfdd2d84e-e4
root@node-11:~# virsh dumpxml instance-000000cc
<domain type='kvm' id='131'>
  <name>instance-000000cc</name>
  <uuid>b515db00-067d-4d9a-86be-9dea03c14d03</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="2015.1.1"/>
      <nova:name>VM-1.15</nova:name>
      <nova:creationTime>2016-01-13 21:14:29</nova:creationTime>
      <nova:flavor name="2vcpu_2048MBmem_1GBdisk">
        <nova:memory>2048</nova:memory>
        <nova:disk>1</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>2</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="928a6b5ff95341f5857c5161df7b6ca1">vepc</nova:user>
        <nova:project uuid="36d1650d2c7f47d4be35a46f3bb6a28e">vEPC</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="0b67c2b1-2370-4b23-91f1-04236b5bba8e"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <shares>2048</shares>
  </cputune>
    <sysinfo type='smbios'>
      <system>
        <entry name='manufacturer'>OpenStack Foundation</entry>
        <entry name='product'>OpenStack Nova</entry>
        <entry name='version'>2015.1.1</entry>
        <entry name='serial'>99fa98c8-e7ff-4ece-9155-3a0480f50bfd</entry>
        <entry name='uuid'>b515db00-067d-4d9a-86be-9dea03c14d03</entry>
      </system>
    </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='2' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/b515db00-067d-4d9a-86be-9dea03c14d03/disk'/>
      <backingStore type='file' index='1'>
        <format type='raw'/>
        <source file='/var/lib/nova/instances/_base/5bea60e3738cbc5c2604ec84ce6a1ec6e1debfe6'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/nova/instances/b515db00-067d-4d9a-86be-9dea03c14d03/disk.config'/>
      <backingStore/>
      <target dev='vdz' bus='virtio'/>
      <alias name='virtio-disk25'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:80:37:0e:0f:12'/>
      <source bridge='qbr3e79842d-eb'/>
      <target dev='tap3e79842d-eb'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='00:80:37:0e:0f:12'/>
      <source bridge='qbr960d7784-82'/>
      <target dev='tap960d7784-82'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/b515db00-067d-4d9a-86be-9dea03c14d03/console.log'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <serial type='pty'>
      <source path='/dev/pts/6'/>
      <target port='1'/>
      <alias name='serial1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/b515db00-067d-4d9a-86be-9dea03c14d03/console.log'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5902' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
      <stats period='10'/>
    </memballoon>
  </devices>
</domain>

root@node-11:~# ifconfig qbr3e79842d-eb
qbr3e79842d-eb Link encap:Ethernet HWaddr e2:d3:c6:ae:a3:26
          inet6 addr: fe80::897:aeff:fee6:5e1b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:65000 Metric:1
          RX packets:52495 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2529458 (2.5 MB) TX bytes:648 (648.0 B)

root@node-11:~# ifconfig qvb3e79842d-eb
qvb3e79842d-eb Link encap:Ethernet HWaddr e2:d3:c6:ae:a3:26
          inet6 addr: fe80::e0d3:c6ff:feae:a326/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:65000 Metric:1
          RX packets:1028373 errors:0 dropped:0 overruns:0 frame:0
          TX packets:929673 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:600674132 (600.6 MB) TX bytes:429962708 (429.9 MB)

root@node-11:~# ifconfig tap3e79842d-eb
tap3e79842d-eb Link encap:Ethernet HWaddr fe:80:37:0e:0f:12
          inet6 addr: fe80::fc80:37ff:fe0e:f12/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:65000 Metric:1
          RX packets:967910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1028334 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:431302055 (431.3 MB) TX bytes:600737400 (600.7 MB)

root@node-11:~# brctl show qbr3e79842d-eb
bridge name bridge id STP enabled interfaces
qbr3e79842d-eb 8000.e2d3c6aea326 no qvb3e79842d-eb
       tap3e79842d-eb
root@node-11:~# ovs-vsctl show
cd41c9a1-d476-4b48-9d5c-e4c5f18afba5
    Bridge br-floating
        Port "p_ff798dba-0"
            Interface "p_ff798dba-0"
                type: internal
        Port br-floating
            Interface br-floating
                type: internal
    Bridge br-int
        fail_mode: secure
        Port "qvocf4d2014-ea"
            tag: 122
            Interface "qvocf4d2014-ea"
        Port "qvo99faeb13-17"
            tag: 124
            Interface "qvo99faeb13-17"
        Port "qvo29c06538-34"
            tag: 123
            Interface "qvo29c06538-34"
        Port "qvoabbfad8e-05"
            tag: 123
            Interface "qvoabbfad8e-05"
        Port "qvoab3661cd-b2"
            tag: 113
            Interface "qvoab3661cd-b2"
        Port "qvo534c601a-0c"
            tag: 112
            Interface "qvo534c601a-0c"
        Port "qvo07abdc1e-38"
            tag: 112
            Interface "qvo07abdc1e-38"
        Port "qvo622ef3b6-a0"
            tag: 112
            Interface "qvo622ef3b6-a0"
        Port "qvodd069c93-ad"
            tag: 121
            Interface "qvodd069c93-ad"
        Port "qvob9bd0dcd-0c"
            tag: 113
            Interface "qvob9bd0dcd-0c"
        Port "qvo101a4853-a9"
            tag: 113
            Interface "qvo101a4853-a9"
        Port "qvofdd2d84e-e4"
            tag: 115
            Interface "qvofdd2d84e-e4"
        Port "qvo3e79842d-eb"
            tag: 112
            Interface "qvo3e79842d-eb"
        Port "qvod3c76f84-6f"
            tag: 113
            Interface "qvod3c76f84-6f"
        Port "qvod9d1a7c6-e2"
            tag: 121
            Interface "qvod9d1a7c6-e2"
        Port "qvo1e3b62fd-80"
            tag: 113
            Interface "qvo1e3b62fd-80"
        Port "qvoc3a88d15-08"
            tag: 114
            Interface "qvoc3a88d15-08"
        Port "qvo26379086-40"
            tag: 114
            Interface "qvo26379086-40"
        Port "qvo2efbc02d-33"
            tag: 113
            Interface "qvo2efbc02d-33"
        Port "qvo4805182f-0b"
            tag: 115
            Interface "qvo4805182f-0b"
        Port "qvo960d7784-82"
            tag: 113
            Interface "qvo960d7784-82"
        Port br-int
            Interface br-int
                type: internal
        Port "qvoa80a8610-ef"
            tag: 113
            Interface "qvoa80a8610-ef"
        Port "qvod15b94e7-05"
            tag: 112
            Interface "qvod15b94e7-05"
        Port int-br-prv
            Interface int-br-prv
                type: patch
                options: {peer=phy-br-prv}
        Port "qvo5160349f-e7"
            tag: 122
            Interface "qvo5160349f-e7"
        Port "qvo3298eeb5-a1"
            tag: 124
            Interface "qvo3298eeb5-a1"
        Port "qvoee5d29b2-75"
            tag: 112
            Interface "qvoee5d29b2-75"
        Port "qvoe3ea8b73-13"
            tag: 112
            Interface "qvoe3ea8b73-13"
        Port "qvo2871b06a-fb"
            tag: 112
            Interface "qvo2871b06a-fb"
    Bridge br-prv
        Port br-prv
            Interface br-prv
                type: internal
        Port phy-br-prv
            Interface phy-br-prv
                type: patch
                options: {peer=int-br-prv}
        Port "p_eeee51a2-0"
            Interface "p_eeee51a2-0"
                type: internal
    ovs_version: "2.3.1"
root@node-11:~# ifconfig qvo3e79842d-eb
qvo3e79842d-eb Link encap:Ethernet HWaddr da:e1:98:c1:6e:cf
          inet6 addr: fe80::d8e1:98ff:fec1:6ecf/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:65000 Metric:1
          RX packets:931164 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1030766 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:430267581 (430.2 MB) TX bytes:601031366 (601.0 MB)

root@node-11:~# ifconfig p_eeee51a2-0
p_eeee51a2-0 Link encap:Ethernet HWaddr 6e:9d:56:fb:62:a5
          inet6 addr: fe80::6c9d:56ff:fefb:62a5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:65000 Metric:1
          RX packets:86297635 errors:0 dropped:0 overruns:0 frame:0
          TX packets:143277215 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:66475322925 (66.4 GB) TX bytes:35894211276 (35.8 GB)

root@node-11:~# ifconfig br-fw-admin
br-fw-admin Link encap:Ethernet HWaddr 2c:44:fd:7c:9a:a4
          inet addr:10.111.158.103 Bcast:10.111.158.111 Mask:255.255.255.240
          inet6 addr: fe80::2e44:fdff:fe7c:9aa4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:61629535 errors:0 dropped:2958811 overruns:0 frame:0
          TX packets:842703 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7658578172 (7.6 GB) TX bytes:313894760 (313.8 MB)

root@node-11:~# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 2c:44:fd:7c:9a:a4
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:184932186 errors:88320 dropped:29585 overruns:0 frame:88323
          TX packets:123054385 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:71762107044 (71.7 GB) TX bytes:69565856487 (69.5 GB)
          Interrupt:32

root@node-12:~# nova-manage --version
2015.1.1
root@node-12:~# uname -a
Linux node-12.domain.tld 3.13.0-65-generic #105-Ubuntu SMP Mon Sep 21 18:50:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@node-12:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty

Tags: ovs sg-fw
Claude LeFrancois (lefranco) wrote :

More decent diagram in attachment

Changed in neutron:
importance: Undecided → High
status: New → Confirmed
tags: added: linuxbridge
tags: added: needs-attention
tags: added: ovs sg-fw
removed: linuxbridge
Sean M. Collins (scollins) wrote :

Thanks for the detailed bug report!

Changed in neutron:
assignee: nobody → Mohammed Ashraf (mohammed-asharaf)
status: Confirmed → In Progress
Changed in neutron:
assignee: Mohammed Ashraf (mohammed-asharaf) → nobody
status: In Progress → Incomplete
importance: High → Undecided
status: Incomplete → New
Anseela M M (anseela-m00) wrote :

I have created test cases for verifying the bug. Please execute these test cases once the fix is done.

Test case id: OPNST_Neutron Bug:1542032_IP reassembly issue on the Linux bridges in Openstack_1
Metric: 30 minutes
Test purpose: Verify that IP segmentation and reassembly of packets larger than the PMTU happens on the compute node so that security analysis can be performed. This should be transparent to the endpoints (VMs) and there should NOT be any packet loss. A TCP packet size (1300 bytes) smaller than the PMTU (1500 bytes) is used for testing.
Configuration: The setup contains 2 compute nodes and 1 controller node. Enable the Neutron firewall and security groups on the compute nodes. Physical interface and Linux bridge MTU: 1500 for all the VMs. MTU of all remaining interfaces: 65000 for all the VMs.
Test tool: Tempest
References:
Applicability: The test can be configured by adding and updating the security groups and the Neutron firewall
Pre-test conditions:
1. OpenStack setup (at least 2 compute nodes, 1 controller and 1 network)
2. Launch 2 VMs, one on each compute node
Test Description:
1. Enable the Neutron firewall and security groups on the compute nodes.
2. Set the physical interface and Linux bridge MTU to 1500 for all the VMs.
3. Set the MTU of all remaining interfaces to 65000 for all the VMs.
4. Send a 1300-byte TCP packet from VM1 to VM2 and vice versa using packet-gen or an NFS tool.
Result:
1. The user should be able to create one VM on each compute node
2. VM1 receives the packet (if sent from VM2)
3. VM2 receives the packet (if sent from VM1)

Test verdict: The test case passes only if the packet from VM1 is received by VM2 and vice versa; the test case fails if there is any packet loss.

Test case id: OPNST_Neutron Bug:1542032_IP reassembly issue on the Linux bridges in Openstack_2
Metric: 30 minutes
Test purpose: Verify that IP segmentation and reassembly of packets larger than the PMTU happens on the compute node so that security analysis can be performed. This should be transparent to the endpoints (VMs) and there should NOT be any packet loss. A TCP packet size (5000 bytes) larger than the PMTU (1500 bytes) is used for testing.
Configuration: The setup contains 2 compute nodes and 1 controller node. Enable the Neutron firewall and security groups on the compute nodes. Physical interface and Linux bridge MTU: 1500 for all the VMs. MTU of all remaining interfaces: 65000 for all the VMs.
Test tool: Tempest
References:
Applicability: The test can be configured by adding and updating the security groups and the Neutron firewall
Pre-test conditions:
1. OpenStack setup (at least 2 compute nodes, 1 controller and 1 network)
2. Launch 2 VMs, one on each compute node
Test Description:
1. Enable the Neutron firewall and security groups on the compute nodes.
2. Set the physical interface and Linux bridge MTU to 1500.
3. Set the MTU of all remaining interfaces to 65000.
4. Send a 5000-byte TCP packet from VM1 to VM2 and vice versa using packet-gen or an NFS tool.
Result:
1. The user should be able to create one VM on each compute node
2. VM1 receives the packet (if sent from VM2)
3. VM2 receives the packet (if sent from VM1)
Test verdict: The test case passes only if the packet from VM1 is received by VM2 and vice versa; the test case fails if there is any packet loss.

Test case id: OPNST_Neutron Bug:1542032_IP reassembly issue on the Linux bridges in Openstack_3
Metric: 30 minutes
Test pu...

Kevin Benton (kevinbenton) wrote :

The cause of this is the 65000 MTU setting. The iptables filtering bridge thinks it is safe to transmit packets that large because of that MTU value.

I assume you set that using network_device_mtu or one of the other MTU options? Please try adjusting the MTU configuration option down to the correct size that matches your physical network. If you are using network_device_mtu, it does not account for encapsulation overhead, so if you are using vxlan, be sure to subtract 50 bytes from whatever your network device is.
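
As a worked example of the subtraction mentioned above, here is a minimal sketch that assumes VXLAN over IPv4 with no IP options. The 50 bytes then break down into the outer IPv4, outer UDP and VXLAN headers plus the inner Ethernet header. The function name instance_mtu is hypothetical.

```python
# Per-packet overhead for VXLAN over IPv4 (assumed: no IP options):
# outer IPv4 (20) + outer UDP (8) + VXLAN header (8) + inner Ethernet (14).
VXLAN_OVERHEAD_IPV4 = 20 + 8 + 8 + 14


def instance_mtu(physical_mtu, overhead=0):
    """Largest MTU usable by instances for a given physical network MTU."""
    return physical_mtu - overhead


print(instance_mtu(1500))                       # 1500 (flat or VLAN network)
print(instance_mtu(1500, VXLAN_OVERHEAD_IPV4))  # 1450
print(instance_mtu(9000, VXLAN_OVERHEAD_IPV4))  # 8950
```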

Claude LeFrancois (lefranco) wrote :

Hi,

I'm really sorry for the lack of follow up on this bug report. I have been involved in many other business activities. As an update, I can confirm Kevin is right. The main recommendation is to make sure all the MTU values are aligned in the cloud and in the network. I should have learned that already based on all my experience in networking! I eventually figured out the MTU was actually a nova setting (network_device_mtu). Due to the fact the cloud I used was in "production", I had to wait for a brand new cloud installation. We aligned the MTU on that new cloud using the network_device_mtu setting and the problem observed in this bug report does not appear any more. We could conclude a misconfiguration in the cloud is responsible for this issue. However, I think the documentation should be aligned to expose this situation clearly and use the right configuration steps. In our case, the cloud vendor was "faulty". The cloud came with such MTU=65000 default setting. I think a documentation section talking about the jumbo frame configuration in Openstack would be appropriate.

Based on my experience, jumbo frames in OpenStack require:

- Switching environment must support jumbo frames
- Physical interfaces on the compute hosts must be configured for jumbo frames
- Nova must be configured with an appropriate network_device_mtu

Also, the network_device_mtu setting is deprecated in Kilo. What is the replacement mechanism? neutron net-create --mtu <value> <net-name>? That does not work in Kilo. What is the status of network_device_mtu with Liberty/Mitaka?

Thanks to all for your attention, involvement and support.

Matt Kassawara (ionosphere80) wrote :
Armando Migliaccio (armando-migliaccio) wrote :

This bug is > 180 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in neutron:
status: New → Incomplete
tags: removed: needs-attention
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
norman shen (jshen28)
Changed in neutron:
status: Expired → Invalid
uchenily (uchenily)
Changed in neutron:
status: Invalid → Confirmed
norman shen (jshen28)
Changed in neutron:
status: Confirmed → Invalid